AI Systems

Notes on how I build production AI systems, from DSPy pipelines to multi-model orchestration. The focus is practical: tools and patterns that hold up in real use.

projects

dspy-0to1-guide

Comprehensive 0-to-1 guide for building self-improving LLM applications

Complete guide (154+ stars) that teaches the DSPy framework from basics to production deployment

dspy-advanced-prompting

Production-grade LLM techniques using the DSPy framework

Comprehensive implementation of 11 state-of-the-art prompting techniques with 200+ test cases

cognitive-dissonance-dspy

Multi-agent system for detecting and resolving cognitive dissonance

Advanced LLM system (263 stars) that identifies contradictions in reasoning and attempts to resolve them

dspy-micro-agent

Minimal agent runtime built with DSPy modules

Lightweight agent framework with CLI, FastAPI server, and eval harness supporting OpenAI/Ollama

aiscan

Blazing-fast security scanner for AI/LLM usage in codebases

Rust-based scanner that detects vulnerabilities, enforces budgets, and audits AI implementations

diffscope

Composable code review engine for automated diff analysis

Rust-based engine that provides intelligent code review insights (10 stars)

codedebt

Ultra-fast code debt detection library and CLI

Rust CLI tool for detecting technical debt patterns (9 stars)

ai-eval-toolkit

Complete framework for LLM evaluation and benchmarking

Production-ready evaluation framework with custom benchmarks and model-graded evaluation capabilities

llm-tribunal

Advanced LLM evaluation with multi-critic deliberation protocols

Framework with OWASP LLM Top 10 assessment and synthetic test generation for AI safety

circuit-breaker-llm

Circuit breaker pattern for LLM output monitoring

Output monitoring with budgets, verifiers, and DSPy integration for production LLM systems

Fission

High-performance microVM sandbox for untrusted code execution

Secure sandbox using Firecracker microVMs with 125ms boot time for running LLM-generated code

capsule-run

Lightweight, secure sandboxed command execution for AI agents

Rust-based sandbox specifically designed for secure AI agent command execution

buildli

Rust-native CLI assistant for understanding codebases in natural language

Command-line tool (3 stars) that helps developers navigate and understand large codebases

deep-code-reasoning-mcp

Multi-model AI system with Claude-to-Gemini escalation

MCP server enabling Claude to escalate complex analysis tasks to Gemini 2.5 Pro for deeper reasoning

design-critique-mcp

MCP server for comprehensive visual design analysis

Analyzes composition, color harmony, typography, and accessibility compliance (3 stars)

founder-email-optimizer

DSPy-powered email optimization for startup founders

AI system (30 stars) that optimizes outreach emails by learning from successful examples

orbit-agent

Brutally honest startup advisor you can text or run from CLI

YC-style advisor (10 stars) that provides opinionated advice and financial tools for founders

email-agent

AI-powered email management with TUI dashboard and multi-agent system

Complete email automation with Gmail integration, rule-based processing, and Docker deployment

lifelog-email

AI-powered life logging through intelligent email summaries

Captures daily moments using Limitless API and generates meaningful email summaries of life events

techniques

Manager-Style Hyper-Specific Prompts

Detailed, context-rich prompts that eliminate ambiguity

Chain of Thought with Self-Consistency

Multi-path reasoning with consensus-based answers
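
The voting step can be sketched in a few lines. This is a minimal illustration, not code from any of the projects above: `sample_answer` is a hypothetical callable standing in for one sampled chain-of-thought completion that returns only the final answer.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(
    sample_answer: Callable[[str], str],
    question: str,
    n_paths: int = 5,
) -> tuple[str, float]:
    """Sample several reasoning paths and return the majority answer.

    `sample_answer` is a hypothetical stand-in for one sampled
    chain-of-thought completion; it returns only the final answer.
    """
    if n_paths < 1:
        raise ValueError("n_paths must be at least 1")
    # Normalize so trivially different spellings count as the same vote.
    votes = Counter(
        sample_answer(question).strip().lower() for _ in range(n_paths)
    )
    answer, count = votes.most_common(1)[0]
    # The agreement ratio is a rough signal, not a calibrated probability.
    return answer, count / n_paths
```

A low agreement ratio is a reasonable trigger for sampling more paths or escalating to a stronger model.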

Tree of Thoughts

Structured exploration of solution spaces
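
One common way to structure that exploration is a beam search over partial solutions. The sketch below assumes two hypothetical callables: `expand`, which proposes next thoughts for a state, and `score`, which rates a state. In a real system both would likely be LLM calls.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def tree_of_thoughts(
    root: State,
    expand: Callable[[State], list[State]],
    score: Callable[[State], float],
    beam_width: int = 2,
    depth: int = 3,
) -> State:
    """Beam search over partial solutions ("thoughts").

    `expand` and `score` are hypothetical stand-ins for model calls that
    propose next thoughts and evaluate how promising a state looks.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break  # nothing left to explore; keep the current frontier
        # Keep only the most promising states for the next level.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```

The beam width trades cost for coverage: a width of 1 collapses to greedy chain-of-thought, while wider beams keep alternatives alive longer.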

ReAct (Reasoning + Acting)

Interleaved reasoning and action for complex tasks
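
A rough sketch of the loop, under stated assumptions: `policy` is a hypothetical callable that plays the role of the model and returns either a tool call or a final answer, and `tools` maps tool names to plain Python functions. None of these names come from the projects above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str] = None        # tool name, or None when finishing
    action_input: str = ""
    final_answer: Optional[str] = None


def react_loop(
    policy: Callable[[str, list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> str:
    """Alternate reasoning steps and tool calls until an answer appears."""
    transcript: list[str] = []
    for _ in range(max_steps):
        step = policy(task, transcript)
        transcript.append(f"Thought: {step.thought}")
        if step.final_answer is not None:
            return step.final_answer
        tool = tools.get(step.action or "")
        # Unknown tools become observations so the policy can recover.
        observation = (
            tool(step.action_input) if tool else f"unknown tool: {step.action}"
        )
        transcript.append(f"Action: {step.action}[{step.action_input}]")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no final answer within max_steps")
```

The `max_steps` cap matters in production: without it, a policy that never commits to an answer loops indefinitely and burns budget.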

Few-Shot with Dynamic Examples

Context-aware example selection for better performance
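
As an illustration, example selection can be as simple as ranking a pool of (question, answer) pairs by similarity to the incoming query. The sketch below uses Jaccard token overlap as a cheap stand-in for embedding similarity; the function names and prompt layout are assumptions made for this example.

```python
def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(
    query: str, pool: list[tuple[str, str]], k: int = 2
) -> list[tuple[str, str]]:
    """Return the k (question, answer) pairs most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble a few-shot prompt from the selected examples plus the query."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in select_examples(query, pool, k)]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)
```

Swapping `jaccard` for a cosine similarity over embeddings changes only the ranking key; the prompt assembly stays the same.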

Model-Graded Evaluation

Using LLMs to evaluate LLM outputs with custom rubrics
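
A minimal sketch of rubric-based grading, assuming a hypothetical `judge` callable that wraps the grading model and returns a score between 0 and 1 for one criterion. The `Criterion` type and the weighting scheme are illustrative choices, not the API of any toolkit listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    description: str
    weight: float = 1.0


def grade(
    output: str,
    rubric: list[Criterion],
    judge: Callable[[str, str], float],
) -> tuple[dict[str, float], float]:
    """Score an output per criterion and return (scores, weighted overall).

    `judge(description, output)` is a hypothetical wrapper around the
    grading model; it should return a score in [0, 1].
    """
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric needs a positive total weight")
    scores: dict[str, float] = {}
    for criterion in rubric:
        raw = judge(criterion.description, output)
        # Clamp, since a model-backed judge can drift outside the range.
        scores[criterion.name] = min(1.0, max(0.0, raw))
    overall = sum(scores[c.name] * c.weight for c in rubric) / total_weight
    return scores, overall
```

Keeping per-criterion scores alongside the overall number makes regressions easier to diagnose than a single aggregate would.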

Multi-Model Orchestration

Coordinating different models for optimal results
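
At its simplest, coordination is a routing table from task type to model. The sketch below is an assumption-laden illustration: `Router` and the task-type strings are hypothetical, and each model is represented as a plain callable rather than a real client.

```python
from typing import Callable

# A model is represented as a plain callable: prompt in, text out.
Model = Callable[[str], str]


class Router:
    """Send each task type to a registered model, with a default fallback."""

    def __init__(self, default: Model) -> None:
        self._default = default
        self._routes: dict[str, Model] = {}

    def register(self, task_type: str, model: Model) -> None:
        self._routes[task_type] = model

    def run(self, task_type: str, prompt: str) -> str:
        # Unregistered task types fall through to the default model.
        model = self._routes.get(task_type, self._default)
        return model(prompt)
```

In practice the callables would wrap real API clients, and the routing key could come from a lightweight classifier instead of being passed in by the caller.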

Semantic Code Analysis

Understanding code meaning beyond syntax patterns

development stack

Claude Code

AI coding assistant with file system access for complex implementations and architecture decisions

My go-to for surgical code edits, creative problem-solving, and understanding existing codebases

Cursor

VS Code-based IDE with codebase-aware AI completions and semantic search

Daily coding with AI pair programming and intelligent code suggestions

DSPy Framework

Programmatic approach to prompt engineering with automatic optimization

Building production LLM systems that scale beyond manual prompt crafting

Multi-Model Systems

Using different models for their strengths (Claude for code, Gemini for analysis)

Complex tasks requiring different capabilities get routed to optimal models

philosophy

evaluation first—you can't improve what you can't measure. every AI system needs rigorous evaluation frameworks, custom benchmarks, and real-world testing before deployment.

production-ready from day one. demos that work in isolation rarely work at scale. i build with reliability, error handling, and monitoring as first-class concerns.

systematic optimization through frameworks like DSPy. manual prompt tweaking doesn't scale—algorithmic optimization makes systems that improve automatically.

security by design. AI systems handle sensitive data and generate code. security isn't an add-on—it's built into the architecture from the ground up.

related writing

why manual prompt engineering is fundamentally wrong and how DSPy changes everything

Beyond Simple Prompts: Production-Grade LLM Techniques

11 battle-tested techniques for building LLM systems that actually work in production

Building Better AI Evals: A Practical Guide

how to create evaluation frameworks that actually predict real-world performance

