i build ai security guardrails.

i built EvalOps, the ai security and evaluation platform that turns testing into release contracts. we gate your ci pipeline so regressions, jailbreaks, and data leaks never touch production.

evalops couples policy-driven access control, continuous eval monitoring, and forensic audit trails so security, ml, and compliance leaders share the same source of truth.

the platform aligns with NIST AI RMF, MITRE ATLAS, and OWASP guidance out of the box so boards see provable controls even while generative models evolve every sprint.

before this i shipped code at Snap, Carta, and DoorDash, then built ThreatKey. most days you can find me pairing with teams that want fewer surprises in production and ai programs the ciso can defend in front of the board.

how evalops hardens ai deployments

dynamic access controls

release contracts wire runtime access to eval verdicts so prompt injection payloads never inherit production trust.

continuous monitoring

every deploy carries live probes that benchmark jailbreak resistance, latency, and hallucination drift around the clock.

forensic audit trails

verdicts, transcripts, and rollout decisions stay immutable for regulators and boards measuring controls against policy.
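to make the release-contract idea above concrete, here's a minimal sketch of a ci gate over eval verdicts. the suite names, thresholds, and hard-coded verdict values are illustrative assumptions for this sketch, not the actual evalops api or config.

```python
# illustrative sketch only: the suite names, thresholds, and verdict values
# below are hypothetical and do not reflect the real EvalOps API or config.
import sys

# release contract: minimum pass rate each eval suite must hit before deploy
REQUIRED_SUITES = {
    "jailbreak_resistance": 0.98,
    "prompt_injection": 0.99,
    "hallucination_drift": 0.95,
}


def gate_release(verdicts: dict) -> bool:
    """Return True only if every required suite meets its contract threshold."""
    failures = {
        suite: score
        for suite, minimum in REQUIRED_SUITES.items()
        if (score := verdicts.get(suite, 0.0)) < minimum
    }
    for suite, score in failures.items():
        print(
            f"release contract violated: {suite} scored {score:.2f} "
            f"(needs {REQUIRED_SUITES[suite]:.2f})"
        )
    return not failures


if __name__ == "__main__":
    # in ci these scores would come from the eval run for the candidate build;
    # hard-coded here so the sketch runs on its own.
    candidate_verdicts = {
        "jailbreak_resistance": 0.99,
        "prompt_injection": 0.97,  # below threshold -> pipeline exits nonzero
        "hallucination_drift": 0.96,
    }
    sys.exit(0 if gate_release(candidate_verdicts) else 1)
```

a failing suite exits nonzero, which is what lets an ordinary ci step treat the eval verdict as a hard deploy gate rather than a dashboard to check later.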

recent writing

Grief in the Loop: When AI Won’t Let Us Let Go

Marcus built a memorial chatbot because staying close to loss felt safer than silence. The rest of us keep repeating the same prompt, hoping the ending changes.

#ai #ethics #product

I Tested 5 Embedding Models on 10K Developer Questions

Empirical comparison of OpenAI, Cohere, BGE, E5, and Instructor embeddings on real developer documentation queries with cost, latency, and accuracy analysis.

#ai #research #embeddings

The Complete Guide to Developer Experience

A comprehensive synthesis of 21 posts on DX: patterns, principles, and practices for building exceptional developer tools and experiences.

#developer-experience #engineering #product


projects i'm proud of

EvalOps ai security brief

how release contracts, probes, and playbooks harden model-driven apps.

EvalOps lab

applied research shop pressure-testing evaluation guardrails with real teams.

security engineering series

field notes on hardening production systems before they fall apart.

cognitive dissonance detection

multi-agent probes that flag conflicting model behavior before users see it.

dspy 0-to-1 guide

hands-on playbook for shipping self-improving LLM apps without guesswork.