Agents and evals

Agent systems, code-review loops, evaluation infrastructure, and the work required to make AI reliable outside a demo.

The AI work I keep returning to: orchestration, feedback loops, measurable behavior, and the points where autonomy breaks down.

Start with operating the agents, then move into review, evals, and local context.

Read first

Everything else