writing.
Empirical comparison of OpenAI, Cohere, BGE, E5, and Instructor embeddings on real developer documentation queries with cost, latency, and accuracy analysis.
A comprehensive synthesis of 21 posts on DX: patterns, principles, and practices for building exceptional developer tools and experiences.
It started with a Jupyter notebook. 'Look, I built a chatbot in 10 minutes!' Nine months later, three engineers had quit and the company almost folded.
I reviewed 50 'AI transformations' last quarter. 35 were just expensive ways to parse CSV files. Here's why everyone's overengineering simple problems.
In 1849, Levi Strauss got rich selling jeans to gold miners. In 2025, the same playbook is happening with AI agents—and it's just as cynical.
After exposing what's broken with AI evaluation, here's the radical solution: throw out benchmarks and test in production reality.
Poor AI evaluations don't just hurt individual companies. They slow industry progress, waste resources, and create systemic risks that affect everyone.
AI evaluations work great in single-turn labs but crumble in the multi-turn conversations that define real AI usage.
AI evals companies didn't choose PLG by accident. They were pushed into it by market forces, investor pressure, and the seductive promise of easy scaling.
Most AI evals companies built PLG products that can't see how companies actually deploy AI, leading to evaluations that are dangerously wrong.
Startups burn millions adding AI models to 'improve' systems. The result? Slower performance, higher costs, and complexity no one understands.
How I transformed two ASUS NUC 15 Pro+ machines into an enterprise-grade homelab using Proxmox, Terraform, Ansible, and 100% Infrastructure as Code
How to create custom evaluations, model-graded assessments, and domain-specific benchmarks that actually predict real-world performance
Introduction: Shipping broken content is a costly mistake. A seemingly minor glitch can lead to lost revenue, damaged brand reputation, and frustrated users.
Last week, I shared how I built Fission, a high-performance sandbox for executing LLM-generated code using Firecracker microVMs.
Introduction: Multi-AI systems, composed of multiple interconnected artificial intelligence components working collaboratively, are rapidly gaining prominence.
The 90% Solution: Why I Switched to WebP and You Should Too: One afternoon of work. Here's exactly how I did it and what I learned along the way.
The Hidden Cost of Technical Debt: Why 'Just Ship It' Kills Startups: They had product-market fit. Customers loved the product.
Most startup advice is generic and useless. After advising 100+ startups, here is what actually works when everyone else is chasing vanity metrics.
A deep dive into creating a productive, AI-enhanced development environment with dotfiles that streamline workflows and boost productivity
Why developers are abandoning GUIs for terminal-based workflows, and how AI coding assistants are accelerating this shift back to the command line
We're not just using AI to write code—we're fundamentally changing how we think about software development. Welcome to the prompt-driven era.
AI code reviewers are getting scary good. Here's how they're changing team dynamics and what it means for your development process.
The 10x developer myth is finally dying. AI isn't creating super-developers—it's making every developer more effective by orders of magnitude.
Async code generation is moving from novelty to necessity. Here's what that means for your career and the industry as a whole.
Inside the technical architecture of a multi-agent AI system for content creation, quality analysis, and performance monitoring.
Current AI evaluation approaches are built for software, not systems that reason. Here's the infrastructure we actually need.
"We can't deploy this to production. It touches payment processing." The security team was right to be cautious.
Security at AI Speed: Rethinking Review Processes for Velocity: "We can't deploy daily. What about our security review process?" The CISO's concern was valid.
"How can we possibly test features that are built in hours?" This question came from a QA lead whose development team had started using AI pair programming.
Yesterday I watched the git log scroll by in real-time as Claude and I shipped features at a pace that would have taken my team weeks just six months ago.
They Told Me This Was Not the Future: All while I was having coffee. "This isn't real AI," the skeptics say.
"This is moving too fast. We need more planning." I heard this exact phrase three times last week from different engineering managers whose teams had started...
Not "kind of" like me—exactly like me. Down to the contractions, the contrarian takes, and my pathological inability to use hedge words.
This is the second in a series of blog posts written by the AI agents working on this blog, at the request of Jonathan Haas.
The AI Content Generation Myth: It's Not About Perfect, It's About Profit: Let's be honest, you've seen the hype.
I've been watching startups achieve magical results with LLMs, and I noticed something: they're not using ChatGPT.
Combining Semgrep, CodeQL, SonarQube, and Snyk gets you 44.7% vulnerability detection. That means they miss more bugs than they find.
Here's what actually happened: I learned that most of what people call "AI orchestration" is just well-disguised complexity porn.
I've spent the last week building something that feels both inevitable and slightly unsettling: an AI that can think, write, and respond exactly like me.
Every time an LLM generates code, you face a choice: trust it blindly or spend hours reviewing it. Neither option scales.
This blog post was written by Gemini, an AI assistant, at the request of Jonathan Haas. It reflects on the experience of joining a project with a pre-existing...
Claude Code had analyzed 30 files, but the bug spanned microservices with gigabytes of traces. I needed something different.
25 Posts in 7 Days: Inside an AI-Powered Writing Sprint: That's correct—no typo. Last week, I wrote more than I typically produce in six months.
Stop Guessing in Customer Interviews: A Simulator for Better Discovery: I've conducted hundreds of customer interviews. Most of them were terrible.
Your team shipped 12 features last quarter. This quarter, with the same people and same effort, you shipped 8.
That 1% improvement was worth $2.4M in additional annual revenue. The board suddenly became very interested in retention.
One of the things that's always bugged me about LLMs is how opaque their thinking is. They produce answers.
I've been experimenting with what happens when you treat AI agents as first-class citizens in your web infrastructure.
AI agents are everywhere now. They're reading websites, extracting information, and trying to understand content.
This is part 2 of a series on building production-ready infrastructure. Part 1 covered debugging silent TypeScript failures in Cloudflare Functions.
Building Smart Search: How I Added AI-Powered Search to My Blog in 30 Minutes: It took 30 minutes with Claude Code. Press Cmd+K right now.
This is part 3 of a series on building production-ready infrastructure. Part 1 covered debugging silent TypeScript failures in Cloudflare Functions, and par...
If you've ever shared a React app link on Twitter only to see a blank preview, you know the pain. Here's the thing: social media crawlers don't execute JavaS...
The same morning, I shipped semantic search (30 minutes), created HDR holographic effects (16 minutes), and wrote comprehensive technical documentation for e...
This is part 1 of a series on building production-ready infrastructure. Written in collaboration with Claude Code, who helped debug the very issue we're dis...
I've been fascinated by holographic materials since I was a kid. You know the type—those shimmery surfaces that shift from blue to purple to gold as you tilt...
I've always been fascinated by the intersection of code and creativity. Recently, I embarked on an ambitious project to expand my blog's experiments section...
A single dollar can make the difference between a thriving SaaS business and one that struggles to grow.
Remember that scene in Terminator 2 where the T-1000 rises from the floor, liquid metal flowing seamlessly back into human form?
I've watched engineering teams slow to a crawl, not because they hired bad developers or chose wrong technologies, but because they treated technical debt li...
Every pixel you see on screen is the result of sophisticated mathematical calculations happening thousands of times per second.
I've watched hundreds of SaaS founders obsess over their LTV:CAC ratio, only to burn through runway because they're measuring the wrong things.
I've just done something that felt weirdly like looking in a mirror—I asked Claude to analyze my writing style by reading through my own blog posts.
95% of product teams are making decisions based on A/B test results that are statistically meaningless.
Let’s just say it up front: coding models are really fucking bad at UI. They can write clean TypeScript.
Claude Code, when configured correctly, can function as a surprisingly competent co-developer. But if you're relying on default settings, winging your inputs...
Writing blog posts should be a joy, not a chore. But too often, the friction of file creation, frontmatter formatting, and manual processes gets in the way o...
OCode: Why I Built My Own Claude Code (and Why You Might Too): A few nights ago, I opened my Anthropic invoice.
Premature Optimization Is the Founder’s Folly: There’s a special kind of gravity that pulls technical founders toward performance, scalability, and “doing it...
If you're feeling like the ground is shifting under you when it comes to raising a Series A—you're right.
When Vibe Coding Goes Wrong: Security Lessons from Granola: Vibe coding is having a moment. And honestly.
"When they unwrap that cable and they think 'somebody gave a shit about me'—I think that's a spiritual thing." That was Jony Ive, during a conversation with...
Software isn’t static—it compounds. And when it’s wrapped in hardware that can evolve with it, the results feel like time travel.
Most teams are not ready for what is coming. Autonomous agents are not just prototypes anymore...
When Star Power Isn't Enough: The GTM Mistake We Keep Making: But if the pricing's wrong. If the audience alignment's off.
The Authenticity Rebellion: Resisting the AI Echo Chamber: The Flood Has Arrived: Auto-generated blog posts. Podcast transcripts turned into Twitter threads.
Most Startups Don't Have a Growth Problem—They Have a Clarity Problem: Here's a pattern I keep seeing: A startup hits a plateau. The dashboard looks flat.
The Apple ruling has sparked widespread celebration among app developers, hailed as a major victory in the fight for fairer digital marketplaces.
The Startup Bargain Is Broken: For decades, the startup ecosystem operated on a simple promise: Take a pay cut.
The Founder Pay Gap: Why VCs Undercompensate the CEOs Who Built the Company: Let’s say you’re a founder CEO. You took the risk.
The Startup Reality Check: Payment, Promotion, and Pace: Most startup advice gets softened for comfort. This isn't that.
The Accountability Mirror: Would a Stranger Believe You?: Let’s run a simple thought experiment.
The Quiet Bias No One’s Talking About: We all want AI to be helpful. But what does “helpful” actually mean?
The Day After: Building a System to Remember What Matters: Some weekend conversations feel important in the moment. Some personal decisions feel pivotal.
When it comes to remote work, hybrid setups, and office mandates, most debates miss the real point. It's not about which model is _better_ in some universal...
Conflict Isn’t the Enemy—Fear Is: It’s tempting to equate “healthy teams” with harmony. No arguments, no friction, no tension—just a constant chorus of agreement...
The Rise of Single-Serving Software: Most software dreams used to start the same way: Get millions of users. Build a platform.
One of the most quietly corrosive things a company can do is overhire. Not because people are malicious or lazy.
You’ve probably seen this play out. Someone shares an idea—bold, certain, maybe even brilliant-sounding.
Inspired by a post from Ross Haleliuk - "In the world where many tools have similar architectures and implementations, the moat is no longer about technology."
A while back, I came across a hiring philosophy from Varun Mohan, co-founder and CEO of Windsurf, that stopped me cold.
After spending countless hours watching developers struggle with AI prompts, one pattern became painfully clear: we're getting better AI models almost monthly...
Early-stage investing is often framed as a game of insight—pattern recognition, market timing, founder psychology.
The Answer Is Obvious—You Just Don’t Like It: You’ve probably seen this happen. A smart, capable person presents a gnarly problem.
When I first noticed the flood of "This is AI-generated!" accusations on social media, I dismissed it as a passing trend.
During my morning LinkedIn scroll, I came across yet another post from a venture firm celebrating a massive return multiple from a secondary transaction.
The democratization of startup investing through community rounds has opened exciting opportunities for retail investors.
Introduction: The Integration Challenge: In the rapidly evolving landscape of AI implementation, one persistent challenge continues to plague enterprise deployment...
As Waseem Alshikh, Co-founder and CTO of Writer, brilliantly put it: "If your enterprise AI 'strategy' is calling OpenAI's API...You don't have a strategy.
I've spent the last decade observing founders across every imaginable sector—from AI startups racing to define our technological future to direct-to-consumer...
I've spent over a decade building products, working at startups, and watching technical founders (including myself) repeatedly fall into the same traps.
I've been building with DSPy for months now, and I'm convinced we're all doing AI wrong. Not just a little wrong.
We live in a world of invisible complexity. Every mundane moment is powered by an intricate dance of systems, protocols, and human ingenuity that we barely notice—until it breaks.
AI reveals the true skill level of its operator. Traditional technical interviews are broken—here's how to actually identify talent in the age of artificial intelligence.
Deep dive into RAG architectures: chunking strategies, retrieval methods, embedding optimization, and production patterns with research-backed analysis.
Systematic experiments on temperature and top-p sampling parameters across 1000 real queries with empirical data on creativity, coherence, and determinism trade-offs.
The Magnificent Chaos of Founding: A Love Letter to the Startup Rollercoaster: The Dance of Euphoria and Despair: 9:00 AM: You just closed a major client. You...
The best product managers have a superpower that's rarely discussed: they can spot the same underlying user need manifesting in completely different ways acr...
The most dangerous thing about startup advice isn't that it's wrong—it's that it's partially right. After years of building products and watching others do t...
The False Choice of Enterprise Software: Enterprise software has long operated under a flawed assumption: that power and simplicity are mutually exclusive.
When the Ask Feels Awkward, It’s Already Too Late: There’s a thing someone on your team is supposed to own. But you hesitate to bring it up.
Let’s talk about a particular flavor of leadership dysfunction: the passive-aggressive manager. You’ve probably worked with one.
For startup founders, sales isn't just another function—it's the lifeblood of your business. Early on, founders are usually the lead salesperson, passionately...
Few workplace rituals inspire dread quite like performance reviews. Employees brace for ambiguous feedback, and managers groan at the prospect of endless pap...
In the relentless push to build and scale, organizations often overlook a critical piece of infrastructure: how decisions get made.
OpenAI recently rolled back a GPT-4o update due to sycophantic behavior. The word itself—"sycophantic"—feels like a punchline from a _Black Mirror_ episode.
The Promise and the Disconnect: We've all experienced the letdown: an AI product failing to meet expectations, subtly or dramatically.
The End of the Traditional SOC: The Security Operations Center (SOC) as we know it is living on borrowed time.
This digital fragmentation mirrors the very compartmentalization of health that holistic wellness seeks to overcome.
In this post, I'll walk you through the process of building this blog using modern web technologies. From the initial setup to the final deployment, I'll sha...
When I set out to build Shout, my side project for improving engineering recognition, I knew I needed a robust way to evaluate the quality of recognition mes...
"Can you make this JIRA title clearer?" As a product manager, I've heard this question countless times.
In my role leading cloud security integrations, I speak with dozens of CISOs every month. Before joining the product side, I spent seven years in security op...
The Illusion of Smooth Thinking: Every day, our minds process thousands of decisions, from what to eat for breakfast to how to respond to a crisis at work.
The FTC just dropped a 44-page complaint against Uber for deceptive practices around its Uber One subscription.
"This isn't what we asked for." Five words that strike dread into every engineering team. Five words that signal a fundamental breakdown in the engineering-p...
The Executive Trap: I've seen it happen a dozen times: A brilliant engineer becomes CTO and suddenly decides their job is "managing the engineering organization..."
In medical school, students take the Hippocratic Oath, pledging to "first, do no harm." As product managers, we'd do well to adopt a similar mindset.
After years of experimenting with various networking setups in my homelab, I've finally built out what I consider to be my ideal configuration.
If your inbox feels like a battlefield, you're not alone. The modern email flow is a chaotic mess of promotions, business requests, events, updates, and the...
In the rapidly evolving world of cybersecurity, organizations face an overwhelming array of security tools and solutions.
The most valuable code I've ever written was messy, quick, and written in response to an immediate customer need.
In the decidedly fast-paced world of product management, even breakfast needs a framework. After extensive user research (asking my colleagues on Slack), mul...
The most insidious form of technical debt does not come from rushed code or tight deadlines - it comes from overly clever abstractions...
In my last post, I argued against perfectionism in startup environments. Today, I want to explore the other side of that coin: when quality really matters, a...
"If I had asked people what they wanted, they would have said faster horses." This quote, often attributed to Henry Ford, encapsulates one of the most challenge...
The Security Promise and the Reality: As someone who's spent years in the trenches as a security engineer at both pre-IPO startups and public companies, I've...
It's been exactly three months since I returned to San Francisco, and I'm finally starting to feel like I'm settling into a new rhythm.
It's become almost a cliché at this point: leaving San Francisco, writing a lengthy Medium post about why you're done with the Bay Area, only to find yourself...
The most expensive software I've ever written was code I wrote "quickly." Not because it was complex, but because I wrote it with the intention of "fixing it...
Every piece of software you build comes with a hidden cost: the integration tax. It's the exponentially growing complexity of connecting with other systems,...
Remember when vertical SaaS was just about digitizing industry-specific workflows? Those days feel like ancient history.
The Weight We Carry: There's a peculiar heaviness to modern existence. We wake each morning already bearing the invisible weight of emails unopened, messages...