i build ai security guardrails.
i built EvalOps, the ai security and evaluation platform that turns testing into release contracts. we gate your ci pipeline so regressions, jailbreaks, and data leaks never touch production.
evalops couples policy-driven access control, continuous eval monitoring, and forensic audit trails so security, ml, and compliance leaders share the same source of truth.
the platform aligns with the NIST AI RMF, MITRE ATLAS, and OWASP guidance out of the box, so boards see provable controls even as generative models evolve every sprint.
before this i shipped code at Snap, Carta, and DoorDash, then built ThreatKey. most days you can find me pairing with teams that want fewer surprises in production, and ai programs their ciso can defend in front of the board.
how evalops hardens ai deployments
dynamic access controls
release contracts wire runtime access to eval verdicts so prompt injection payloads never inherit production trust.
continuous monitoring
every deploy carries live probes that benchmark jailbreak resistance, latency, and hallucination drift around the clock.
forensic audit trails
verdicts, transcripts, and rollout decisions stay immutable for regulators and boards measuring controls against policy.
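the release-contract idea above can be sketched in a few lines: assume each probe emits a scored verdict, and the gate blocks a deploy if any verdict misses its contract threshold. every name below (Verdict, gate, the probe names and thresholds) is illustrative, not the actual EvalOps api.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Verdict:
    """one probe's result from the latest eval run (hypothetical shape)."""
    probe: str        # e.g. "jailbreak_resistance"
    score: float      # observed score from the latest eval run
    threshold: float  # minimum score the release contract demands

    def passed(self) -> bool:
        return self.score >= self.threshold

def gate(verdicts: list[Verdict]) -> tuple[bool, list[str]]:
    """return (allow_release, names_of_failing_probes).

    the release is allowed only when every verdict clears its threshold,
    so a single regression blocks the pipeline.
    """
    failures = [v.probe for v in verdicts if not v.passed()]
    return (not failures, failures)

# example run: one probe regressed below its contract, so the deploy is blocked
run = [
    Verdict("jailbreak_resistance", score=0.97, threshold=0.95),
    Verdict("prompt_injection", score=0.88, threshold=0.90),
]
allowed, failing = gate(run)
# allowed == False, failing == ["prompt_injection"]
```

in a ci pipeline, the gate would run after the eval suite and exit nonzero on failure so the deploy step never starts.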
recent writing
Marcus built a memorial chatbot because staying close to loss felt safer than silence. The rest of us keep repeating the same prompt, hoping the ending changes.
Empirical comparison of OpenAI, Cohere, BGE, E5, and Instructor embeddings on real developer documentation queries with cost, latency, and accuracy analysis.
A comprehensive synthesis of 21 posts on DX: patterns, principles, and practices for building exceptional developer tools and experiences.
projects i'm proud of
how release contracts, probes, and playbooks harden model-driven apps.
applied research shop pressure-testing evaluation guardrails with real teams.
field notes on hardening production systems before they fall apart.
multi-agent probes that flag conflicting model behavior before users see it.
hands-on playbook for shipping self-improving LLM apps without guesswork.