reviews.


thoughts on books and papers that have shaped my thinking on AI, evaluation, and building reliable systems.

papers

Constitutional AI: Harmlessness from AI Feedback

Anthropic, 2022

groundbreaking approach to training helpful, harmless AI without humans labeling every response. the constitutional training process is elegant: the model critiques and revises its own outputs against a written set of principles, and the revisions (plus AI-generated preference labels) become the training signal.

what's brilliant is how this scales beyond safety to any domain where you can articulate principles. i've been experimenting with constitutional approaches for code review and technical writing evaluation.
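to make that concrete, here's a minimal sketch of the critique-and-revise loop applied to review comments rather than safety training. `call_model` is a stand-in for whatever LLM client you use, and the principles are made up for illustration; none of this is code from the paper.

```python
# minimal constitutional critique-and-revise loop (illustrative sketch).
# call_model is a placeholder; the principles below are hypothetical examples.

PRINCIPLES = [
    "flag any code that silently swallows exceptions",
    "prefer explicit names over abbreviations in public APIs",
]

def call_model(prompt: str) -> str:
    # wire up your actual LLM client here
    raise NotImplementedError

def constitutional_review(draft: str, principles: list[str]) -> str:
    revised = draft
    for principle in principles:
        # step 1: ask the model to critique the text against one principle
        critique = call_model(
            f"Principle: {principle}\n\n"
            f"Point out any violations of this principle in the text below:\n\n{revised}"
        )
        # step 2: ask the model to revise the text to address its own critique
        revised = call_model(
            f"Revise the text to address the critique, changing nothing else.\n\n"
            f"Critique: {critique}\n\nText:\n{revised}"
        )
    return revised
```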

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Khattab et al., Stanford, 2023

the paper that changed how i think about LLM applications. instead of prompt engineering by hand, DSPy treats prompts as learnable parameters that can be optimized automatically.

the key insight: separate the program logic (what you want to do) from the prompting strategy (how to get the model to do it). this abstraction enables systematic optimization and makes LLM applications actually maintainable.
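a rough sketch of what that separation looks like in dspy itself. i'm writing the signature/module API from memory, so treat the exact names as approximate and check the docs; it also assumes you've already configured a language model.

```python
# sketch of the DSPy idea: declare *what* a step does (the signature) and let
# the framework decide *how* to prompt for it. API details may not match the
# current dspy release exactly.
import dspy

# assumes an LM has been configured, e.g. dspy.settings.configure(lm=...)

class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question = dspy.InputField()
    answer = dspy.OutputField()

# the prompting strategy (here: chain of thought) is swappable and optimizable,
# while the signature stays the same
qa = dspy.ChainOfThought(AnswerQuestion)

result = qa(question="what does DSPy treat as a learnable parameter?")
print(result.answer)
```

the payoff is that the same module can later be compiled against a metric with one of dspy's optimizers instead of hand-tuning the prompt text.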

Evaluating Large Language Models Trained on Code

Chen et al., OpenAI, 2021

thorough evaluation methodology for code generation models. introduces the HumanEval benchmark and the pass@k metric, making the case for measuring functional correctness (does the generated code pass the tests?) rather than text-similarity scores.

particularly valuable for understanding how to design evaluations that actually predict real-world performance. the focus on functional correctness while considering edge cases is something i apply to all evaluation work.
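for reference, the unbiased pass@k estimator from the paper (generate n samples per problem, count the c that pass the tests) looks roughly like this; it's my transcription, so check the paper's appendix for the exact code.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """unbiased, numerically stable estimate of pass@k
    given n samples per problem, c of which pass the tests."""
    if n - c < k:
        return 1.0
    # equivalent to 1 - C(n-c, k) / C(n, k), computed without large factorials
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 200 samples, 42 passing, estimate pass@10
print(pass_at_k(n=200, c=42, k=10))
```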

books

Thinking, Fast and Slow by Daniel Kahneman

essential reading for anyone building AI systems. kahneman's framework of system 1 (fast, intuitive) vs system 2 (slow, deliberative) thinking maps surprisingly well onto current LLM behavior: single-pass generation looks a lot like system 1, while chain-of-thought and tool use are attempts to bolt on system 2.

the cognitive biases he documents show up constantly in LLM outputs. understanding these patterns helps design better evaluations and build systems that compensate for predictable failure modes.

The Alignment Problem by Brian Christian

best overview of AI alignment challenges written for practitioners. christian bridges academic research and real-world concerns better than most technical papers.

particularly strong on the measurement problem: how do you evaluate whether an AI system is doing what you actually want? this question drives most of my current research.

Accelerate by Nicole Forsgren, Jez Humble, Gene Kim

data-driven approach to software delivery performance. not AI-specific, but the methodology for measuring engineering effectiveness applies directly to ML systems.

the four key metrics (lead time, deployment frequency, change failure rate, time to restore) have analogs in AI systems. measuring these properly is crucial for scaling reliable AI applications.
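a toy sketch of tracking those analogs for model or prompt deployments. the DeployRecord fields and the choice of what counts as a "failure" are my own assumptions, not anything the book prescribes.

```python
# toy DORA-style metrics over deployment records (assumes a non-empty list).
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DeployRecord:
    merged_at: datetime       # when the change (prompt, model, eval) was merged
    deployed_at: datetime     # when it reached production
    failed: bool              # did it cause a regression or incident?
    restored_at: datetime | None = None  # when service recovered, if it failed

def dora_metrics(records: list[DeployRecord], window_days: int = 30) -> dict:
    lead_times = sorted(r.deployed_at - r.merged_at for r in records)
    failures = [r for r in records if r.failed]
    restores = sorted(r.restored_at - r.deployed_at for r in failures if r.restored_at)
    return {
        "deploys_per_day": len(records) / window_days,
        "median_lead_time": lead_times[len(lead_times) // 2],
        "change_failure_rate": len(failures) / len(records),
        "median_time_to_restore": restores[len(restores) // 2] if restores else None,
    }
```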

Release It! by Michael Nygard

classic on building production systems that survive contact with reality. every pattern in this book applies to AI systems: circuit breakers, bulkheads, timeouts, monitoring.

ai systems fail in all the same ways as traditional systems, plus new failure modes unique to ML. nygard's stability patterns are essential infrastructure for any serious AI application.
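as one example, here's a bare-bones circuit breaker you could wrap around a model call so a flaky provider fails fast instead of piling up timeouts. the thresholds and recovery policy are illustrative, not nygard's exact prescription.

```python
import time

class CircuitBreaker:
    """fail fast after repeated errors; retry after a cool-down (illustrative)."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        # if the breaker is open, refuse calls until the cool-down has passed
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: skipping call")
            # half-open: allow one trial call through
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

# usage: breaker = CircuitBreaker(); breaker.call(client.generate, prompt)
```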