
Beyond Simple Prompts: Production-Grade LLM Techniques with DSPy

June 25, 2025 · 2 min read

The best AI companies don't write prompts by hand. They generate them programmatically, test them systematically, and optimize them continuously.

#ai #llm #dspy #prompting #open-source


Manual prompt engineering is like writing assembly. It works, but it doesn't scale. When you're handling thousands of edge cases across multiple models, you need a compiler. DSPy is that compiler: you define what you want (signatures), and the framework optimizes how to get it.
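The signature idea can be sketched in a few lines of plain Python. This is a toy stand-in, not DSPy's actual API: you declare inputs and outputs, and a compile step decides how the prompt is phrased.

```python
from dataclasses import dataclass

@dataclass
class Signature:
    # Declarative spec: what goes in, what comes out.
    instructions: str
    inputs: list
    outputs: list

def compile_prompt(sig: Signature, **values) -> str:
    # The "compiler" turns the declarative spec into a concrete prompt.
    lines = [sig.instructions]
    for name in sig.inputs:
        lines.append(f"{name}: {values[name]}")
    lines.append("Respond with: " + ", ".join(sig.outputs))
    return "\n".join(lines)

qa = Signature("Answer the question concisely.", ["question"], ["answer"])
prompt = compile_prompt(qa, question="What does DSPy optimize?")
```

Because the prompt text is generated, the framework is free to rewrite it during optimization; your code only ever touches the signature.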

After building dspy-advanced-prompting -- implementations of 11 prompting strategies -- here's what actually matters in production.

The Techniques That Work

Manager-style prompts. Top startups aren't giving LLMs one-line instructions. They're writing 6-page onboarding documents -- role definitions, stakeholder relationships, decision-making frameworks, performance expectations. The more context upfront, the less correction later.
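In practice a manager-style prompt is less a string than a structured document you assemble. A minimal sketch (the section names and contents here are my own illustration, not from the repo):

```python
# Hypothetical onboarding sections for a support agent; real ones
# run to pages, not sentences.
SECTIONS = {
    "Role": "You are a senior support engineer for Acme's billing API.",
    "Stakeholders": "You report to the support lead; escalate refunds over $500.",
    "Decision framework": "Prefer documented fixes; never guess account state.",
    "Performance expectations": "Cite the relevant doc section in every answer.",
}

def manager_prompt(sections: dict) -> str:
    # Render the structured document as one system prompt.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections.items())
```

Keeping the sections as data means each one can be versioned and tested independently, like any other config.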

Escape hatches. Hallucination isn't a bug you fix. It's what happens when you don't give the model an exit route. Building explicit uncertainty handling into every prompt -- confidence scoring, domain-specific disclaimers, graceful degradation -- dramatically reduces confident-but-wrong answers.
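One way to wire this up, as a sketch: the prompt explicitly licenses uncertainty, and the caller degrades gracefully below a confidence threshold. The reply format and threshold here are illustrative assumptions.

```python
# The escape hatch lives in the prompt itself: the model is told how to bail out.
ESCAPE_HATCH = (
    "If you are not confident, reply exactly: UNSURE. "
    "Otherwise answer, then append 'confidence: <0-1>'."
)

def handle(raw_reply: str, threshold: float = 0.7) -> str:
    # Graceful degradation: uncertainty or low confidence routes to a human.
    if raw_reply.strip() == "UNSURE":
        return "Escalating to a human reviewer."
    answer, sep, conf = raw_reply.rpartition("confidence:")
    if not sep or float(conf) < threshold:
        return "Escalating to a human reviewer."
    return answer.strip()
```

The key property: there is no code path where a low-confidence answer reaches the user as if it were certain.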

Meta-prompting. Instead of you optimizing prompts, the LLM optimizes its own prompts based on performance metrics. Using genetic algorithms and fitness functions, prompts evolve over iterations. This has the highest leverage of any technique -- set up the optimization loop once and it compounds.
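The optimization loop can be sketched as a toy genetic algorithm. The fitness function below is a stand-in; in a real pipeline each prompt variant is scored against your eval suite.

```python
import random

PHRASES = ["Be concise.", "Cite sources.", "Think step by step.", "Answer in JSON."]

def fitness(prompt: tuple) -> int:
    # Stand-in fitness: pretend the evals reward these two behaviors.
    return sum(p in prompt for p in ("Cite sources.", "Think step by step."))

def mutate(prompt: tuple, rng: random.Random) -> tuple:
    # Swap one random slot for a random phrase.
    new = list(prompt)
    new[rng.randrange(len(new))] = rng.choice(PHRASES)
    return tuple(new)

def evolve(generations: int = 50, pop_size: int = 8, seed: int = 0) -> tuple:
    rng = random.Random(seed)
    pop = [tuple(rng.choices(PHRASES, k=3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]  # elitism: the best half always survives
        pop = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return max(pop, key=fitness)
```

Elitism guarantees fitness never regresses across generations, which is why the loop compounds once it's set up.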

Dynamic few-shot selection. Everyone knows about few-shot prompting. What most teams miss: the best companies maintain libraries of thousands of examples, dynamically selected based on semantic similarity to the input. The right examples matter more than the right instructions.
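A minimal sketch of the selection step: rank the library by similarity to the incoming query and take the top k. Production systems use embedding vectors; a bag-of-words cosine stands in here so the idea is runnable.

```python
import math
from collections import Counter

def vec(text: str) -> Counter:
    # Toy embedding: word counts (a Counter returns 0 for missing words).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_examples(query: str, library: list, k: int = 2) -> list:
    # Pick the k examples most similar to the query.
    return sorted(library, key=lambda ex: cosine(vec(query), vec(ex["input"])), reverse=True)[:k]

library = [
    {"input": "refund my subscription", "output": "Issue refund via billing portal."},
    {"input": "reset my password", "output": "Send reset link."},
    {"input": "cancel my subscription", "output": "Cancel in account settings."},
]
```

Swap `vec`/`cosine` for a real embedding model and the rest of the pipeline is unchanged.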

The Actual Production Lesson

After implementing all 11 techniques, the biggest finding: your test suite is more valuable than your prompts.

Every technique includes evaluation because "it seems to work" isn't production-grade. You need test cases that define expected behavior, must-include criteria, and failure conditions. Without that, you're optimizing blind.
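A test case for a prompt can be as plain as this sketch (the field names are my own, not the repo's schema): expected behavior as must-include terms, failure conditions as forbidden strings.

```python
def run_case(reply: str, must_include: list, must_not_include: list) -> dict:
    # Case-insensitive checks: which required terms are missing,
    # which forbidden terms appeared.
    missing = [t for t in must_include if t.lower() not in reply.lower()]
    violations = [t for t in must_not_include if t.lower() in reply.lower()]
    return {"passed": not missing and not violations,
            "missing": missing, "violations": violations}
```

Crude as string matching is, a suite of these runs on every prompt change and turns "it seems to work" into a pass rate you can track.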

The other production secret: model distillation. Develop and validate on a frontier model (Claude, GPT-4), then deploy on smaller, cheaper models. This cuts inference costs by 10-50x while maintaining quality -- but only if you have the test suite to verify the distilled version still performs.
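The verification gate can be sketched as a single comparison over the eval suite; the 2% tolerance below is an illustrative choice, not a recommendation.

```python
def pass_rate(results: list) -> float:
    # results: one bool per eval case.
    return sum(results) / len(results)

def approve_distilled(frontier: list, distilled: list, max_drop: float = 0.02) -> bool:
    # Ship the cheaper model only if quality stays within tolerance
    # of the frontier model on the same suite.
    return pass_rate(frontier) - pass_rate(distilled) <= max_drop
```

The gate is only as trustworthy as the suite behind it, which is the point of the previous section.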

What This Means

The companies getting strong results from AI aren't prompt engineering wizards. They're software engineers who treat prompts like code -- with version control, testing, optimization pipelines, and observability.

The code is at dspy-advanced-prompting. Start treating your prompts like production software.


