Jonathan Haas
March 24, 2026 · 4 min read

Building Kestrel: A Context-Aware AI Desktop Assistant in One Session

How I built a full LittleBird clone with screen context reading, meeting recording, arena mode, and MCP tool support — from scratch to packaged .app in a single coding session.

#ai #electron #macos #developer-experience #building-in-public

Filed under Agents and evals. The AI work I keep returning to: orchestration, feedback loops, measurable behavior, and where autonomy breaks down.

I wanted a personal AI assistant that knew what I was working on. Not one that required me to paste context into a chat window. One that could read my screen, understand what app I had open, and give contextually relevant answers.

LittleBird does this. It costs $20/month and sends everything to their servers. I wanted the same thing, local-first, with my own API keys.

So I built Kestrel.

What It Does

Kestrel is an Electron app that runs a native Swift helper binary. The Swift binary uses macOS Accessibility APIs to read the UI hierarchy of whatever app is in the foreground — window titles, text content, browser URLs. This context is injected into every AI conversation as a system prompt.

When you ask "what am I working on?", it already knows. It can see your terminal output, your browser tabs, your editor content. No copy-pasting required.
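The injection step is simple in principle: turn the captured accessibility snapshot into a system prompt prepended to every conversation. A minimal sketch in TypeScript — the shape of `ScreenContext` and the function name are illustrative, not Kestrel's actual code:

```typescript
// Illustrative shape of a context snapshot from the Swift helper.
interface ScreenContext {
  app: string;          // frontmost application name
  windowTitle: string;  // title of the focused window
  url?: string;         // present when the frontmost app is a browser
  visibleText: string;  // text extracted from the Accessibility tree
}

// Build the system prompt that gets prepended to every AI conversation.
function buildSystemPrompt(ctx: ScreenContext): string {
  const lines = [
    `The user is currently in ${ctx.app} ("${ctx.windowTitle}").`,
  ];
  if (ctx.url) lines.push(`The active browser tab is ${ctx.url}.`);
  lines.push("Visible on screen:", ctx.visibleText);
  return lines.join("\n");
}
```

The model never sees a "paste your context here" step — it just receives the snapshot as part of the prompt.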

Beyond context-aware chat, it does:

  • Meeting recording — CoreAudio process taps capture system audio, AVAudioEngine captures your mic, both resampled to 16kHz WAV and sent to Whisper for transcription. AI-generated summaries and action items afterward.
  • Auto-detect meetings — Polls CoreAudio's kAudioProcessPropertyIsRunningInput to detect when Zoom, Meet, or Teams grab the microphone. Starts recording automatically, stops with a 30-second grace period.
  • Arena mode — Send the same prompt to 2-4 models simultaneously and compare responses side by side.
  • MCP integration — Claude Desktop-compatible config format. Connect any MCP server, tools get injected into the AI's system prompt and executed via a tool-calling loop.
  • Journal — AI-generated daily entries from context snapshots saved throughout the day.
  • Quick access overlay — Cmd+Shift+Space slides in a panel from the right edge.
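The MCP tool-calling loop mentioned above follows a common pattern: call the model, execute any tools it requests, feed the results back, and repeat until the model answers in plain text. A minimal sketch with invented types (the real MCP client API and message shapes differ):

```typescript
// Hypothetical shapes for illustration — not the actual MCP wire types.
interface ToolCall { name: string; args: Record<string, unknown> }
interface ModelTurn { text: string; toolCalls: ToolCall[] }

type Model = (messages: string[]) => ModelTurn;
type ToolRunner = (call: ToolCall) => string;

// Keep calling the model until it stops requesting tools (with a step cap).
function runToolLoop(model: Model, runTool: ToolRunner, prompt: string, maxSteps = 5): string {
  const messages = [prompt];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(messages);
    if (turn.toolCalls.length === 0) return turn.text; // final answer
    for (const call of turn.toolCalls) {
      messages.push(`tool ${call.name} returned: ${runTool(call)}`);
    }
  }
  return "tool loop exceeded max steps";
}
```

The step cap matters: a model that keeps requesting tools would otherwise loop forever.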

Architecture

The interesting technical decisions:

Split agent architecture. The AI pipeline has two agents: an Executor that handles tool calls, context fetching, and API requests, and a Presenter that handles streaming to the UI, database persistence, and user-facing formatting. They have separate concerns and separate wide events for observability. The Executor never touches the renderer. The Presenter never calls an API.
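The boundary can be sketched in a few lines of TypeScript. These interfaces are illustrative stand-ins, not Kestrel's actual classes — the point is that each side's dependencies make the other side's job impossible:

```typescript
interface ExecutorResult { reply: string; toolCallCount: number }

// Executor: owns API requests and tool calls. It never touches the renderer.
class Executor {
  constructor(private callApi: (prompt: string) => string) {}
  run(prompt: string): ExecutorResult {
    return { reply: this.callApi(prompt), toolCallCount: 0 };
  }
}

// Presenter: owns persistence and user-facing formatting. It never calls an API.
class Presenter {
  public log: string[] = [];
  present(result: ExecutorResult): string {
    this.log.push(result.reply);         // stand-in for database persistence
    return `Assistant: ${result.reply}`; // user-facing formatting
  }
}
```

Because the Executor only receives a `callApi` function and the Presenter only receives results, neither can reach across the boundary, and each side can emit its own wide events independently.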

Native Swift CLI over stdin/stdout. ContextKit is a Swift Package Manager project that communicates with Electron via JSON-RPC 2.0 over NDJSON. This is the same protocol as MCP and LSP. The binary runs with dispatchMain() keeping the main RunLoop alive for AudioQueue callbacks, while the JSON-RPC reader runs on a background thread.
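NDJSON framing is the simplest part of the protocol: one JSON object per line, with the receiver buffering partial lines until a newline arrives. A sketch of the Electron side (field names follow the JSON-RPC 2.0 spec; the decode helper is illustrative):

```typescript
interface RpcRequest { jsonrpc: "2.0"; id: number; method: string; params?: unknown }

// One JSON object per line — the NDJSON framing rule.
function encode(req: RpcRequest): string {
  return JSON.stringify(req) + "\n";
}

// Split a stdout chunk into complete messages, carrying any partial
// trailing line forward as the new buffer.
function decodeChunk(buffer: string, chunk: string): { messages: RpcRequest[]; rest: string } {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop() ?? ""; // incomplete tail, if any
  const messages = lines.filter(l => l.length > 0).map(l => JSON.parse(l) as RpcRequest);
  return { messages, rest };
}
```

Because stdout chunks can split a message anywhere, the `rest` buffer is what makes this robust — the same reason MCP and LSP clients all carry a read buffer.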

CoreAudio process taps. For meeting recording, the app creates a CATapDescription and an aggregate audio device via AudioHardwareCreateAggregateDevice. This captures all system audio without requiring Screen Recording permission — only the purple audio indicator dot appears. Microphone capture uses AudioQueue on a dedicated thread with its own CFRunLoop.

CFRunLoopPerformBlock for AX calls. Accessibility API calls hang on background threads in packaged macOS apps. The fix: dispatch them to the main CFRunLoop via CFRunLoopPerformBlock, which works with RunLoop.main.run() unlike DispatchQueue.main.async (which requires dispatchMain() or NSApplication).

Wide events. Every operation emits a structured event — chat messages, tool calls, meeting detections, context captures. These go into a SQLite table and an in-memory ring buffer with real-time analytics (event counts, error rates, avg durations). There's an Event Log viewer in Settings that shows live events.
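The ring-buffer half of this is a small amount of code. A minimal sketch — event shape, capacity, and rollup fields are illustrative, not the actual schema:

```typescript
interface WideEvent { name: string; durationMs: number; error?: boolean }

// In-memory ring buffer with simple real-time rollups.
class EventRing {
  private events: WideEvent[] = [];
  constructor(private capacity = 1000) {}

  emit(e: WideEvent): void {
    this.events.push(e);
    if (this.events.length > this.capacity) this.events.shift(); // drop oldest
  }

  // Count, error rate, and average duration for one event name.
  stats(name: string) {
    const hits = this.events.filter(e => e.name === name);
    const n = hits.length || 1; // avoid divide-by-zero for unseen names
    const errors = hits.filter(e => e.error).length;
    const avg = hits.reduce((s, e) => s + e.durationMs, 0) / n;
    return { count: hits.length, errorRate: errors / n, avgDurationMs: avg };
  }
}
```

SQLite gets every event for history; the ring buffer exists so the Event Log viewer can show live analytics without querying the database on every render.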

The Hardest Bugs

Preload .js vs .mjs. electron-vite compiles preload scripts to .mjs but the window configs referenced .js. This meant window.api was undefined in every renderer, and every IPC call silently failed. Chat didn't work. Context didn't work. Nothing worked. The fix was three characters across three files.
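The lesson in miniature: derive the preload path from what the bundler actually emits instead of hard-coding the extension. A hypothetical helper (not Kestrel's actual fix, which was editing the three window configs):

```typescript
import { join } from "path";

// electron-vite emits ESM preload scripts with an .mjs extension,
// so map the TypeScript source name to the compiled artifact.
function preloadPath(outDir: string, sourceName: string): string {
  return join(outDir, sourceName.replace(/\.ts$/, ".mjs"));
}
```

Pointing `webPreferences.preload` at a file that doesn't exist fails silently — Electron just gives you a renderer with no bridge, which is why the symptom was `window.api === undefined` rather than an error.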

AVAudioConverter.reset(). The audio converter produced output on the first call, then returned empty buffers forever after. The inputBlock returns .endOfStream, which leaves the converter in a terminal state. One converter.reset() call before each conversion fixed everything. Meeting recordings went from 7KB (empty) to 158KB (5 seconds of real audio).

AX calls from background threads. Works in dev mode, hangs in the packaged app. Three approaches tried: DispatchQueue.main.async (deadlocks — GCD main queue not processed by CFRunLoop), direct background calls (hang in packaged binary), CFRunLoopPerformBlock (works). This took four iterations to get right.

Stack

| Layer    | Choice                                    |
| -------- | ----------------------------------------- |
| Runtime  | Electron 34, React 19, TypeScript         |
| Build    | electron-vite 5, Vite 6                   |
| UI       | shadcn/ui, Tailwind CSS v4                |
| State    | MobX                                      |
| Database | SQLite (better-sqlite3 + Drizzle ORM)     |
| AI       | OpenRouter (all models), OpenAI Whisper   |
| Context  | Native Swift CLI via JSON-RPC             |
| Audio    | CoreAudio taps, AudioQueue, AVAudioConverter |
| Tools    | Model Context Protocol (MCP)              |

What I'd Do Differently

Start with the packaged app from day one. Half the bugs were dev-mode-vs-production differences that only surfaced late. The preload extension, the AX thread safety, the native module bundling — all of these would have been caught earlier.

Also: code signing. Every time you copy a new unsigned app to /Applications, macOS resets the Accessibility permission. During development I must have re-granted it fifty times. A proper Developer ID certificate would fix this permanently.

Try It

Kestrel is open source at github.com/haasonsaas/kestrel. It's a personal tool — no auth, no backend, no subscriptions. Bring your own OpenRouter API key.

git clone https://github.com/haasonsaas/kestrel
cd kestrel
npm install
npm run ContextKit:build
npm run dev

A kestrel is a falcon that hovers in place, scanning the ground below. That's what this app does — hovers over your work, watching what you're doing, ready to help when you ask.


More in Agents and evals

Previous on this shelf: The Evaluation Infrastructure We Need: Why AI Testing is Fundamentally Broken

Next on this shelf: AI Code Review Is Reasoning, Not Pattern Matching


This connects to

The Rise of Single-Serving Software

The product philosophy underneath small personal tools.

Scaling the Me Component: How I Built an AI That Thinks Like Me

Why useful assistants need local taste and memory.

OCode: Why I Built My Own Claude Code (and Why You Might Too)

Another example of building the tool around your own workflow.
