Why I Put Claude in Jail and Let Him Code Anyway

From vibe coding chaos to production-ready apps: how containerized orchestration makes Claude AI reliable.

Aug 28, 2025

From Snippets to Scalpel

Early Encounters with AI CodeGen

After my first real use of AI for code generation, I knew I was hooked. It was early days for public LLM usage and one of my first sessions was with ChatGPT 3.5. I wanted to see how well it could generate something complicated - a TCP/IP server that could handle thousands of concurrent users.

This is an arbitrary problem, but it’s complex. There are multiple threads, tightly controlled memory constraints and a lot of synchronization. There is a big gap between accepting a handful of concurrent connections and managing a service of this scale. I knew it could easily generate a “Hello World”, but could it solve real-world problems?

In short, no.

It understood the problem. It wrote some code. But the problem was too big for GPT 3.5’s context window. It would forget variables mid-stream and start hallucinating references that didn’t exist.

It was great at code snippets, even complex ones, but not at putting systems together.

Nonetheless, it was a glimpse into the capabilities – complex code writing was in there somewhere.

Over the next year I continued to watch the evolution of code generation.

By late 2024, AI could write production-ready chunks – if you babysat it like a nervous intern. A huge leap from 18 months earlier, but still miles from the “just tell it what you want” dream.

AI Coding in my Workflow

I’ve started to use AI in my own workflows.

I use Claude CLI as I’ve found it to be the most consistently reliable flow for how I work with it. It’s quick, works well and my natural language structures seem to align well with prompting styles that get the most out of it.

I rarely let it write code for me. I feed it ideas, work through the ones I need to refine, and then I go away and write the code myself. I do trust it enough to review my code and help with bugs, but that’s it.

Why don’t I let it code for me? Claude is great at writing code, but I’m still faster in the kind of complex codebases I work in. If I were a junior-to-mid developer working on a simple SaaS app, it could handle the job fine. But writing a compiler? Building AI orchestration tools? That’s a different game.

Despite this, the rate of improvement in AI code generation has been staggering over the last 18 months. Claude AI has come up with some genuinely novel approaches to problems that I’m not sure many humans could have solved.

Here’s one. How do you terminate child processes that were created by a process spawned earlier and now dead? It needs to work on Windows and Linux. You can’t run with elevated permissions, and you can’t just terminate everything owned by the user. You don’t know what processes will be spawned and you don’t know what other processes should be left running.

Claude AI and I worked on this for a little bit, and the solution it eventually proposed was genuinely novel. Set an environment variable in the parent; by default, every process it spawns, directly or indirectly, inherits it. When you want to tidy up, find the processes that inherited that variable and kill them. On Windows you read each process’s environment block via its PEB; on Linux you just enumerate the /proc filesystem. Claude even wrote some pretty good Windows code that handled both 64-bit and 32-bit processes.
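The Linux half of that idea fits in a few lines of Python. This is a minimal sketch, not the code Claude wrote: the variable name `CLEANUP_MARKER`, the function names and the `proc_root` parameter are all made up for illustration. It relies on `/proc/<pid>/environ` holding a process's environment as it was at launch, and it can only see processes whose `environ` file the current user is allowed to read.

```python
import os
import signal

MARKER_NAME = "CLEANUP_MARKER"  # hypothetical variable name, not from the post


def find_marked_pids(marker_value, proc_root="/proc"):
    """Return PIDs whose launch environment contains MARKER_NAME=marker_value."""
    needle = f"{MARKER_NAME}={marker_value}".encode()
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid = int(entry)
        if pid == os.getpid():
            continue  # never report ourselves
        try:
            with open(os.path.join(proc_root, entry, "environ"), "rb") as f:
                # environ is a NUL-separated list of NAME=value entries
                env_entries = f.read().split(b"\0")
        except OSError:
            continue  # process already exited, or we may not read it
        if needle in env_entries:
            found.append(pid)
    return found


def terminate_marked(marker_value):
    """Send SIGTERM to every process that inherited the marker."""
    pids = find_marked_pids(marker_value)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            pass  # gone already, or not ours to signal
    return pids
```

In use, the parent would put a unique value (a UUID, say) into `CLEANUP_MARKER` before spawning anything, then call `terminate_marked` with that value later. Matching on the exact `NAME=value` entry, rather than the name alone, is what stops one session from killing another session's processes.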

This blew my mind. Maybe AI coding was finally better than I thought it was?

Vibe Coding and the Descent into Chaos

The Vibe Coding Experience

It was 2am. No sleep, high on recent process termination wins with Claude AI and far too much Monster Ultra. Naturally, I decided to make a point-and-click adventure game from scratch with Claude.

So… I did the unthinkable. I thought why not try vibe coding rather than just use Claude as a tool. Little did I know the chaos that would ensue.

I’ve never developed a game, so I thought this would be a fun experience and a useful test. I couldn’t guide it. I knew exactly what I wanted, but I had no idea how it would implement the code. I wanted to create a Monkey Island-style game based on my own storyline and assets.

I loaded Claude AI.

We talked for a bit. I wanted a game. SCRUM style. It was going to be awesome. Maybe the best game ever. I gave it the story, character and key outlines.

We were hyped. United in vision. This was going to be amazing.

“Let’s do this! Get going!” I commanded, fuelled on Monster Ultra.

We took some sample assets I dropped into a folder. I gave it a quick console app it could call to generate more assets as required.

We were hyped, united in vision.

It coded, for quite a while. It was so excited. Emojis galore.

I was excited to see what came out.

An HTML page with a yellow rectangle.

“NO!” I shouted. “This isn’t a game! Where are the visuals?”

I probably should have said I wanted it to be created in an actual game engine with real assets and not just an HTML canvas, but I was so excited by the idea that the technical details didn’t even cross my mind.

I installed Unity and off we went again with new directions. It wouldn’t work. No keyboard input. Why? I had no idea – I’ve never used Unity before and I’m vibe coding here.

“FIX IT”, I’d shout. Claude AI would dutifully tell me it had absolutely fixed it. It had not.

We looped here, a lot. It kept creating scenes, and it kept failing. Unity was not Claude’s thing.

I gave up, until I moved onto GDI in C# – then I could help a little.

The context was compacted dozens of times; all the details were lost to the ether. Its own ability to generate images forgotten, the core mission long lost, dozens of failed projects. I could have guided its C# attempts, but that wasn’t the point.

Probably two hours elapsed from start to finish before I gave up.

My vibe coding journey finished in the early hours of the morning, and I went to bed feeling dejected.

What I Learned from Vibe Coding

A common topic on Reddit is “what I learned from vibe coding” – usually followed by an AI-generated post that fundamentally says, “it can’t do it unless you know how to tell it exactly what to do.”

The reality though? Claude is exceptional. When given direction.

I’ve coded almost every day for 34 years, since I was 7 and using my Commodore 64. I have a huge amount of experience to draw on. Claude has a huge amount of knowledge to draw on. Those are different things.

When directed like a scalpel, Claude is an amazing tool. Maybe the greatest tool I’ve used in years.

When given free rein, it was absolute, insane chaos.

AI-assisted code generation has come a long way since 2023, but without that targeted direction, it cannot deliver a shippable, production-ready product by itself.

Other, less grandiose vibe-coding tests have resulted in backends with hard-coded secrets, no use of configuration files, and database layers sitting directly in service tiers. It becomes a mess real fast. Authentication? Eek.

High-level requirements like “make me a secure website” do not translate into small, low-level actions that Claude can work out by itself. But if you give it that detail, it can.

This got me thinking.

What if Claude isn’t the problem? What if all I needed to do was come up with something that could automate the part of my brain that turns those high-level requirements into the low-level prompts that Claude excels at?

Why I Put Claude in Jail

Sorry Claude, it’s Time for Prison

I did the only sensible thing. I put Claude in a container, stripped it of freedom and built an AI orchestration pipeline around it to imitate how I prompt it for better outputs.

In short: I put him in jail.

I needed a pipeline to get the best from Claude AI, so I built an automated orchestration system around it. Since we’re building a massive AI orchestration platform internally, this felt like a great opportunity to test it.

I started by creating a container that the platform could spin up upon request that contained Claude and allowed our AI workflows to interact with it. The container included a full Linux development environment and the ability to attach a filesystem that includes the codebase. Within the container, the orchestrator could prompt Claude step-by-step – much like I do when using it as a tool.

First, Claude is asked to describe the codebase in huge detail for an agent that doesn’t have access to it.

The orchestrator then provides that description to an OpenAI chain-of-thought agent tasked with breaking down the top-level problem into smaller actionable chunks.

“Create a login page” turns into token generation, configuration files, roles, claims, browser handling and caching.

Once broken down, a new Claude container spins up, analyzes the codebase, and walks through each task in an analyze-code-fix cycle. We finish up with a final double-check step that follows the same three-phase mechanism.

If the analysis agent decides this was 20 small steps, that means we’re prompting Claude through 64 prompts to get the output instead of just one.
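The post doesn't spell out how that 64 is counted. One reading that produces it, and this is an assumption, is one codebase-description prompt, three prompts (analyze, code, fix) per task, and three more for the final double check. The sketch below counts prompts that way. `PromptLog` and `run_story` are hypothetical stand-ins, and nothing here talks to a real model.

```python
from dataclasses import dataclass, field

PHASES = ("analyze", "code", "fix")


@dataclass
class PromptLog:
    """Stand-in for the containerized Claude session: it only records prompts."""
    prompts: list = field(default_factory=list)

    def send(self, text):
        self.prompts.append(text)


def run_story(tasks, log):
    """Walk one story through the pipeline and return the number of prompts sent."""
    # Step 1: describe the codebase for the agent that has no access to it.
    log.send("Describe the codebase in detail")
    # Step 2: each task from the breakdown goes through all three phases.
    for task in tasks:
        for phase in PHASES:
            log.send(f"{phase}: {task}")
    # Step 3: a final double check using the same three phases.
    for phase in PHASES:
        log.send(f"{phase}: final double check")
    return len(log.prompts)


tasks = [f"task {i}" for i in range(1, 21)]
print(run_story(tasks, PromptLog()))  # 1 + 20*3 + 3 = 64
```

The breakdown itself is not counted because it goes to a different agent, and neither is the later pull-request step.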

When it’s finished, we then create a third and final Claude container and get that instance to analyze the codebase and output a detailed pull request in JSON. The orchestrator then translates this into a GitHub pull request for user review. We can then either repeat the cycle from PR feedback (including human-in-the-loop), or we can take the code as is.
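The translation step might look something like the sketch below. The JSON shape (`title`, `summary`, `changes`) is a hypothetical schema, since the post doesn't describe the real one. The output fields `title`, `head`, `base` and `body` are the ones GitHub's REST "create a pull request" endpoint accepts. The sketch only builds the payload and makes no network call.

```python
import json

REQUIRED_FIELDS = ("title", "summary", "changes")  # hypothetical schema


def to_pull_request_payload(raw_json, head, base="main"):
    """Validate the model's JSON and shape it into a GitHub PR payload."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        # Reject incomplete output rather than opening a half-described PR.
        raise ValueError(f"PR description is missing fields: {missing}")
    lines = [data["summary"], "", "Changes:"]
    lines += [f"- {c['file']}: {c['description']}" for c in data["changes"]]
    return {
        "title": data["title"],
        "head": head,   # branch holding the generated code
        "base": base,   # branch the PR targets
        "body": "\n".join(lines),
    }
```

Validating before anything reaches GitHub keeps the model's output as plain data. The orchestrator holds the credentials and decides what actually gets opened.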

This is a prison for Claude.

No scope creep. No surprise ideas. Just step-by-step execution.

There are no Git credentials attached to let push-chaos reign supreme.

There is no unapproved network connectivity.

Once the session is finished, the entire container is thrown away and Claude’s chaos is gone with it.
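The post doesn't say how the containers are launched. As a rough sketch that assumes the plain Docker CLI, the three rules above map onto a handful of `docker run` flags. The image name, network name, paths and function name below are all hypothetical.

```python
import shlex


def build_run_command(image, workspace, network, env_allowlist=()):
    """Build a `docker run` argv for a throwaway, locked-down session."""
    cmd = [
        "docker", "run",
        "--rm",                                 # container is deleted when the session exits
        "--network", network,                   # attach only the pre-approved network
        "--volume", f"{workspace}:/workspace",  # the codebase is the only host path mounted
        "--workdir", "/workspace",
    ]
    for name in env_allowlist:
        # `--env NAME` with no value forwards the host's value for NAME.
        # No host variable outside this list (Git tokens, for example)
        # is forwarded into the container.
        cmd += ["--env", name]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_run_command(
        image="claude-sandbox:latest",      # hypothetical image
        workspace="/srv/checkouts/story-42",  # hypothetical checkout path
        network="approved-egress",          # hypothetical pre-created network
        env_allowlist=("ANTHROPIC_API_KEY",),
    )))
```

The allowlist is the important design choice: credentials stay out because nothing is passed through unless it is named explicitly.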

Clean, focused, secure. Free of chaos.

And surprisingly effective.

It Works!

This works surprisingly well. What started as a little test has ended up being able to deliver high quality, secure code.

We use this to develop user stories. No vibe coding “create me a global SaaS portfolio” – but clearly defined and well written requirements that can be broken down. This is, ultimately, how human developers work best too.

In my tests, it takes 15-20 minutes of Claude AI’s time to develop an average story. That’s longer than if I just used a vibe coding agent – but the difference is that the code is good enough to use. A longer first interaction leads to a much shorter overall time to deliver real value.

In short – can you create production-quality code with Claude in 2025?

Yes. All it takes is an entire AI orchestration platform, a series of Docker containers, a significant Kubernetes architecture to run it on, multiple AI models across several providers, GitHub integration and a team of business analysts to feed it well-written stories.

We’ve been working on building a full end-to-end suite of tools to help our own development teams in-house and we’ve baked the Claude support within our own cycles. If you’re interested in testing it out, let me know as we’ll be sharing it in a limited beta in a few weeks’ time.

Claude’s not your co-founder. He’s your overconfident intern with a somewhat poor memory. Give him clear steps. Supervise him closely. And never ever let him near production unsupervised. Remember that and you’ll ship.

Guy’s Substack