{"componentChunkName":"component---src-templates-post-js","path":"/the-cat-sandwich-gif-that-wasnt/","result":{"data":{"markdownRemark":{"html":"<h1>Bambi The Cat, the Sandwich, and the GIF That Wasn't</h1>\n<p><em>Or: Simply make me a GIF, Claude.</em></p>\n<p><img src=\"images/bambi/bambi1.jpg\" alt=\"Bambi materializes on the couch\"></p>\n<p>I have seven photos of my orange cat, Bambi, staging a slow-motion heist on an Ike's sandwich. The arc is perfect: she materializes on the couch, creeps onto the coffee table, leans in, and finally squares up nose-to-bread with the loot. It's begging to be a GIF.</p>\n<p><img src=\"images/bambi/bambi2.jpg\" alt=\"Bambi creeps onto the coffee table\"></p>\n<p>So I asked Claude Cowork to make one. What followed was a quietly fascinating failure — the kind that tells you more about how these tools actually work than any polished demo could.</p>\n<p><img src=\"images/bambi/bambi3.jpg\" alt=\"Bambi leans in\"></p>\n<h2>The prompt</h2>\n<blockquote>\n<p><strong>Me:</strong> could you create a gif from these images</p>\n</blockquote>\n<p>Seven photos attached. Clear intent. The kind of task a ten-line Python script with Pillow could knock out in under a second.</p>\n<p><img src=\"images/bambi/bambi4.jpg\" alt=\"Bambi squares up nose-to-bread\"></p>\n<h2>The first wall</h2>\n<p>Claude — running inside Cowork with a full Linux sandbox, file tools, Python, ffmpeg, the works — went looking for the images and came back empty-handed:</p>\n<blockquote>\n<p><strong>Claude:</strong> The images appear in our chat but weren't attached as files I can access on disk… Could you re-upload the 7 photos as attachments (drag them into the chat)? Once they land in my uploads folder I'll stitch them into a GIF.</p>\n</blockquote>\n<p>This was already weird to me. <em>You're literally looking at the photos right now.</em> They were rendered in the conversation. 
Claude was describing them back to me (\"cat spotting sandwich → approaching → on table → settled\"). How can you see them and not have them?</p>\n<h2>The debug loop</h2>\n<p>I re-sent the photos. Same response.</p>\n<blockquote>\n<p><strong>Me:</strong> i did both of those thingd</p>\n<p><strong>Claude:</strong> I hear you, but my uploads folder is genuinely still empty on my end — <code>/mnt/uploads/</code> has no files… The images render inline in our chat so I can see them, but I can't extract the pixel data from that view; I need them as actual attached files.</p>\n</blockquote>\n<p>That sentence is the whole story: <strong>\"I can see them, but I can't extract the pixel data from that view.\"</strong></p>\n<p>It sounds like a contradiction. It isn't. It's architecture.</p>\n<h2>Why this happens</h2>\n<p>When you drop an image into a chat with Claude, two very different things <em>can</em> happen:</p>\n<ol>\n<li><strong>The image is encoded as vision tokens and streamed into the model's context.</strong> Claude \"sees\" it the way you see a memory — as a perception, not a file. There are no bytes to grab, no PNG header to parse, no array of pixels to hand to Pillow. It's been pre-digested into the embedding space.</li>\n<li><strong>The image is saved as a file on the sandbox disk</strong> (e.g., <code>/mnt/uploads/cat1.jpg</code>). <em>Now</em> tools like Python, ffmpeg, and ImageMagick can touch it.</li>\n</ol>\n<p>Creating a GIF requires path #2. The image has to exist as bytes that code can read, resize, and encode. Vision tokens can't be piped into ffmpeg — they're a different substance entirely.</p>\n<p>In our conversation, path #1 was happening and path #2 wasn't. Claude kept running <code>ls /mnt/uploads/</code> and getting:</p>\n<blockquote>\n<pre><code>total 4\ndr-x------ 2 loving-zealous-fermi ... .\ndr-xr-x 8 loving-zealous-fermi ... ..\n</code></pre>\n</blockquote>\n<p>An empty folder, every time. 
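</p>\n<p>For what it's worth, the encoding step Claude never got to run is genuinely trivial. Here is a minimal sketch of the happy path with Pillow, assuming the photos had actually landed on disk; the function name, folder, and frame timing are illustrative, not what Claude ran:</p>\n<pre><code>from pathlib import Path\nfrom PIL import Image\n\ndef make_gif(frames_dir, out_path, ms_per_frame=500):\n    # Path #2 in action: the frames must exist as bytes on disk.\n    paths = sorted(Path(frames_dir).glob('*.jpg'))\n    if not paths:  # the state Claude kept reporting\n        raise FileNotFoundError(f'no frames in {frames_dir}')\n    frames = [Image.open(p).convert('RGB') for p in paths]\n    # save_all + append_images writes every frame; loop=0 loops forever\n    frames[0].save(out_path, save_all=True, append_images=frames[1:],\n                   duration=ms_per_frame, loop=0)\n</code></pre>\n<p>A dozen lines, well under a second of runtime, and all of it useless until the bytes show up.</p>\n<p>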
It had a gallery in its mind's eye and a bare cupboard in its hands.</p>\n<h2>The \"seemingly simple prompt\" problem</h2>\n<p>This is what makes the example useful. \"Make a GIF from these images\" sounds like the most basic request in the world — the kind of thing an intern does between coffee sips. And Claude is <em>wildly</em> capable. It can write the encoder from scratch, reason about frame timing, even draft this very blog post. What it couldn't do was the one thing the task actually needed: touch the bytes.</p>\n<p>The gap between what an AI appears to be able to do and what it can actually reach through its tools is a recurring theme in agentic systems. You see glossy demos of AIs \"using a computer,\" and then in real life you hit a mundane plumbing issue — an upload path that doesn't connect to the sandbox mount, or a file handler that sends images down the vision pipeline instead of the filesystem one — and the whole thing grinds to a halt.</p>\n<p>The <em>intelligence</em> isn't the bottleneck. The <em>connective tissue</em> is.</p>\n<h2>The debug conversation, preserved</h2>\n<p>For posterity, the exchange that gets to the heart of it:</p>\n<blockquote>\n<p><strong>Me:</strong> could you create a gif from these images</p>\n<p><strong>Claude:</strong> [checks <code>/mnt/uploads/</code>, finds nothing] The images appear in our chat but weren't attached as files I can access on disk.</p>\n<p><strong>Me:</strong> here [re-uploads]</p>\n<p><strong>Claude:</strong> The uploads folder is still empty on my side — <code>/mnt/uploads/</code> has no files. The images render inline in our chat so I can see them, but I can't extract the pixel data from that view; I need them as actual attached files.</p>\n<p><strong>Me:</strong> i did both of those thingd</p>\n<p><strong>Claude:</strong> [still empty] I hear you, but my uploads folder is genuinely still empty on my end.</p>\n</blockquote>\n<p>Three rounds. Same answer. 
Not because Claude was being stubborn — because the file genuinely wasn't there. It was a <em>correctly-functioning</em> tool honestly reporting its environment. The failure was upstream: somewhere between my upload button and the sandbox mount, the handoff wasn't happening.</p>\n<h2>What I learned</h2>\n<p>A few takeaways from this absurd little saga:</p>\n<ul>\n<li><strong>Vision and filesystem are different worlds.</strong> An AI \"seeing\" your image and an AI being able to <em>process</em> your image are not the same capability. If you want bytes-level work — GIFs, resizing, OCR, metadata extraction — the file has to live on disk.</li>\n<li><strong>\"Simple\" prompts are often load-bearing on invisible infrastructure.</strong> The difficulty of a task for an AI agent isn't correlated with how simple it sounds to a human.</li>\n<li><strong>Honest failure is a feature.</strong> Claude didn't hallucinate a GIF. It didn't pretend. It kept saying \"the folder is empty\" and suggesting workarounds. That's the right behavior, even when it's frustrating.</li>\n<li><strong>Debugging an AI is debugging a system.</strong> The model is one component. The sandbox, the tool bindings, the upload pipeline, and the UI are all equally capable of being the broken link.</li>\n</ul>\n<h2>The cat, meanwhile</h2>\n<p><img src=\"images/bambi/bambi5.jpg\" alt=\"Bambi unbothered\"></p>\n<p><img src=\"images/bambi/bambi6.jpg\" alt=\"Bambi unbothered, part 2\"></p>\n<p><img src=\"images/bambi/bambi7.jpg\" alt=\"Bambi unbothered, part 3\"></p>\n<p>…is still staring at me from across the coffee table, entirely unbothered that her highlight reel remains un-animated. Maybe that's the real moral: the only party in this whole affair who's actually succeeded at anything is the cat.</p>\n<p>I'll try the GIF again tomorrow. If it works, I'll post it below. 
If it doesn't, at least I've got this essay.</p>","frontmatter":{"title":"Bambi The Cat, the Sandwich, and the GIF That Wasn't","date":"01/01/2024","excerpt":"Asking an AI to make a GIF sounds simple. Why did it take seven photos of my cat, three failed attempts, and an essay to figure out what went wrong?"}}},"pageContext":{"slug":"/the-cat-sandwich-gif-that-wasnt/"}},"staticQueryHashes":["3943644438"]}