When Claude Hits Its Limits: Building an AI-to-AI Escalation System

I hit a wall debugging a distributed system race condition. Claude Code had analyzed 30 files, but the bug spanned microservices with gigabytes of traces. Claude is excellent at surgical code edits. Correlating thousands of trace spans across services requires a model that can hold all that context at once.

So I built an escalation system. Claude calls Gemini when it needs the 1M token context window. Gemini returns structured analysis. Claude uses it to make the fix.

The Architecture

The Model Context Protocol (MCP) made this straightforward. An MCP server sits between Claude Code and Gemini. When Claude recognizes it needs help -- too many files, too much trace data, cross-service correlation -- it calls the escalation tool. The server packages relevant context and routes it to Gemini. Gemini returns structured findings. Claude acts on them.

The interesting addition is conversational tools. Instead of one-shot analysis, Claude and Gemini engage in multi-turn dialogues. Claude asks follow-ups based on Gemini's responses. The models build on each other's insights in ways single-shot analysis can't achieve.

What I Learned

Context preparation is 80% of the work. The hardest part isn't the API calls. It's deciding which files, traces, and logs are actually relevant. Too little context and the analysis fails. Too much and you waste tokens and confuse the model. Most development time went into intelligent context selection.

The escalation heuristic matters. Not every problem needs Gemini. The MCP server uses thresholds: file count over 50, trace size over 100MB, more than 3 services involved, more than 5 failed approaches. Getting this wrong in either direction is expensive -- unnecessary escalations waste money, missed escalations waste developer hours.

Multi-turn beats single-shot. Single-shot analysis misses nuances. The conversational approach lets models build iteratively, leading to discoveries neither would make alone. This was the biggest surprise -- multi-turn capability was more valuable than raw context window size.

It's not magic. Sometimes the analysis is wrong. Sometimes the context preparation misses the relevant data. Sometimes the bug is in the one file you didn't include. This is a tool that improves your odds, not a guarantee.

The Bigger Point

Treat LLMs like heterogeneous compute resources. Route tasks to the model best equipped to handle them, the same way you'd pick the right database for the right query pattern. Claude for code edits and reasoning. Gemini for massive context analysis. Smaller models for simple classification tasks.

In two years, developers will orchestrate multiple specialized models routinely. The question is whether you're building the orchestration layer now or waiting for someone else to build it for you.

The deep-code-reasoning-mcp server is open source.

The Architecture

What I Learned

The Bigger Point

In two years, developers will orchestrate multiple specialized models routinely. The question is whether you're building the orchestration layer now or waiting for someone else to build it for you.