now.
updated september 2025
this page is the running log of where my time goes. if you're curious about working together—or just want to compare notes—you can see what's actually on the bench right now.
home base
san francisco ↔ new york
currently
shipping evaluation guardrails at evalops
availability
one advisory slot open for q4 2025
what i'm shipping
the work on my calendar this month. most of it blends research, evaluation, and operations.
- embedding with product teams to turn flaky evaluation suites into reliable guardrails.
- shipping a toolkit that watches staged rollouts and flags behavior regressions in real time.
- maintaining an open-source rust scanner that catches auth and data-leak risks in AI pipelines.
what i'm reading
the ideas shaping how i build right now. i keep a longer list on the reading page.
The Dream Machine
re-reading it to stay grounded in why we build tools for other people, not just ourselves.
Recent Anthropic red-teaming papers
pulling ideas for automated behavioral probes we can adapt for production workloads.
Engineering Management for the Rest of Us
using the hiring and feedback chapters with founders i advise.
what i'm chewing on
- how to make eval tooling feel like CI/CD: fast, trustworthy, and boring in the best way.
- ways to close the loop between human analysts and automated probes so each one sharpens the other.
- whether DSPy-style self-improving evaluators can trigger rollbacks before humans notice regressions.
want the longer arc? i post monthly logs on the blog and share in-progress prototypes on github.
inspired by derek sivers' now page movement. last edit: september 2025.