ai systems.
my approach to building production AI systems, from DSPy frameworks to multi-model orchestration. these aren't just experiments—they're battle-tested tools solving real problems.
projects
Comprehensive 0-to-1 guide for building self-improving LLM applications
Complete guide (154+ stars) that teaches the DSPy framework, from basics to production deployment
Production-grade LLM techniques using the DSPy framework
Comprehensive implementation of 11 state-of-the-art prompting techniques with 200+ test cases
Multi-agent system for detecting and resolving cognitive dissonance
Advanced LLM system (263 stars) that identifies contradictions in reasoning and attempts to resolve them
Minimal agent runtime built with DSPy modules
Lightweight agent framework with CLI, FastAPI server, and eval harness supporting OpenAI/Ollama
Blazing-fast security scanner for AI/LLM usage in codebases
Rust-based scanner that detects vulnerabilities, enforces budgets, and audits AI implementations
Composable code review engine for automated diff analysis
Rust-based engine (10 stars) that provides intelligent code review insights
Ultra-fast code debt detection library and CLI
Rust CLI tool (9 stars) for detecting technical debt patterns
Complete framework for LLM evaluation and benchmarking
Production-ready evaluation framework with custom benchmarks and model-graded evaluation capabilities
Advanced LLM evaluation with multi-critic deliberation protocols
Framework with OWASP LLM Top 10 assessment and synthetic test generation for AI safety
Circuit breaker pattern for LLM output monitoring
Output monitoring with budgets, verifiers, and DSPy integration for production LLM systems
High-performance microVM sandbox for untrusted code execution
Secure sandbox using Firecracker microVMs with 125ms boot time for running LLM-generated code
Lightweight, secure sandboxed command execution for AI agents
Rust-based sandbox specifically designed for secure AI agent command execution
Rust-native CLI assistant for understanding codebases in natural language
Command-line tool (3 stars) that helps developers navigate and understand large codebases
Multi-model AI system with Claude-to-Gemini escalation
MCP server enabling Claude to escalate complex analysis tasks to Gemini 2.5 Pro for deeper reasoning
MCP server for comprehensive visual design analysis
Analyzes composition, color harmony, typography, and accessibility compliance (3 stars)
DSPy-powered email optimization for startup founders
AI system (30 stars) that optimizes outreach emails by learning from successful examples
Brutally honest startup advisor you can text or run from CLI
YC-style advisor (10 stars) that provides opinionated advice and financial tools for founders
AI-powered email management with TUI dashboard and multi-agent system
Complete email automation with Gmail integration, rule-based processing, and Docker deployment
AI-powered life logging through intelligent email summaries
Captures daily moments using Limitless API and generates meaningful email summaries of life events
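a rough sketch of the circuit breaker idea listed above (output monitoring with budgets and verifiers). this is not the project's actual API: the class name, the `check` method, the consecutive-failure threshold, and the whitespace-based token estimate are all assumptions made for illustration.

```python
from typing import Callable, List


class CircuitOpen(RuntimeError):
    """Raised when the breaker refuses to pass any further outputs."""


class OutputCircuitBreaker:
    """Hypothetical breaker: trips on repeated verifier failures or a spent budget."""

    def __init__(
        self,
        verifiers: List[Callable[[str], bool]],
        max_failures: int = 3,
        token_budget: int = 1000,
    ) -> None:
        self.verifiers = verifiers
        self.max_failures = max_failures
        self.token_budget = token_budget
        self.failures = 0      # consecutive rejected outputs
        self.tokens_used = 0   # running estimate of tokens seen

    @property
    def is_open(self) -> bool:
        # Open means "stop calling the model": too many failures or no budget left.
        return (
            self.failures >= self.max_failures
            or self.tokens_used >= self.token_budget
        )

    def check(self, output: str) -> str:
        if self.is_open:
            raise CircuitOpen("breaker is open; refusing further outputs")
        # Assumption: a whitespace split is a good-enough token estimate here.
        self.tokens_used += len(output.split())
        if all(verify(output) for verify in self.verifiers):
            self.failures = 0  # a verified output resets the failure streak
            return output
        self.failures += 1
        raise ValueError("output rejected by a verifier")
```

in use, each model response would pass through `check` before reaching the caller. a `ValueError` signals one bad output that can be retried, while `CircuitOpen` signals that the caller should stop and fall back. note that the output which pushes usage over the budget is still returned; the breaker only trips on the next call.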
techniques
Manager-Style Hyper-Specific Prompts
Detailed, context-rich prompts that eliminate ambiguity
Chain of Thought with Self-Consistency
Multi-path reasoning with consensus-based answers
Tree of Thoughts
Structured exploration of solution spaces
ReAct (Reasoning + Acting)
Interleaved reasoning and action for complex tasks
Few-Shot with Dynamic Examples
Context-aware example selection for better performance
Model-Graded Evaluation
Using LLMs to evaluate LLM outputs with custom rubrics
Multi-Model Orchestration
Coordinating different models for optimal results
Semantic Code Analysis
Understanding code meaning beyond syntax patterns
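to make one of the techniques above concrete, here is a minimal sketch of chain of thought with self-consistency: sample several reasoning paths and keep the answer most of them agree on. the `sample_answer` callable stands in for a real sampled model call, and it, `make_canned_sampler`, and the lowercase/strip normalization are assumptions for illustration only.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def self_consistent_answer(
    sample_answer: Callable[[str], str],
    question: str,
    n_paths: int = 5,
) -> Tuple[str, float]:
    """Return the majority answer and the fraction of paths that agreed with it."""
    if n_paths < 1:
        raise ValueError("n_paths must be at least 1")
    # Each call is assumed to run one independent reasoning path and
    # return only its final answer.
    answers: List[str] = [
        sample_answer(question).strip().lower() for _ in range(n_paths)
    ]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_paths


def make_canned_sampler(canned: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays fixed answers in order."""
    remaining = iter(canned)

    def sampler(_question: str) -> str:
        return next(remaining)

    return sampler
```

the agreement fraction doubles as a cheap confidence signal: a low value suggests the paths disagree and the question may deserve a stronger model or a human look.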
development stack
Claude Code
AI coding assistant with file system access for complex implementations and architecture decisions
My go-to for surgical code edits, creative problem-solving, and understanding existing codebases
Cursor
VS Code-based IDE with codebase-aware AI completions and semantic search
Daily coding with AI pair programming and intelligent code suggestions
DSPy Framework
Programmatic approach to prompt engineering with automatic optimization
Building production LLM systems that scale beyond manual prompt crafting
Multi-Model Systems
Using different models for their strengths (Claude for code, Gemini for analysis)
Tasks that need different capabilities are routed to the model best suited to each
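a small sketch of the routing idea behind multi-model systems, assuming a keyword-based classifier and plain callables in place of real model clients. the function names, the marker list, and the two task kinds are hypothetical; a production router would more likely use a learned or model-based classifier.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]

# Assumption: a handful of keywords is enough to spot code-related prompts.
CODE_MARKERS = ("def ", "class ", "refactor", "bug", "stack trace")


def classify_task(prompt: str) -> str:
    """Label a prompt as 'code' or 'analysis' using simple keyword matching."""
    lowered = prompt.lower()
    return "code" if any(marker in lowered for marker in CODE_MARKERS) else "analysis"


def route(prompt: str, models: Dict[str, ModelFn]) -> str:
    """Send the prompt to whichever model is registered for its task kind."""
    kind = classify_task(prompt)
    return models[kind](prompt)


# Stand-ins for real clients; each just tags the prompt with a model name.
models: Dict[str, ModelFn] = {
    "code": lambda p: "claude: " + p,
    "analysis": lambda p: "gemini: " + p,
}
```

keeping the classifier separate from the dispatch table means either one can be swapped out without touching the other.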
philosophy
evaluation first—you can't improve what you can't measure. every AI system needs rigorous evaluation frameworks, custom benchmarks, and real-world testing before deployment.
production-ready from day one. demos that work in isolation rarely work at scale. i build with reliability, error handling, and monitoring as first-class concerns.
systematic optimization through frameworks like DSPy. manual prompt tweaking doesn't scale—algorithmic optimization makes systems that improve automatically.
security by design. AI systems handle sensitive data and generate code. security isn't an add-on—it's built into the architecture from the ground up.
related writing
why manual prompt engineering is fundamentally wrong and how DSPy changes everything
11 battle-tested techniques for building LLM systems that actually work in production
how to create evaluation frameworks that actually predict real-world performance