Jonathan Haas

Building Cerebro: Giving AI Agents an Organizational Brain

March 14, 2026 · 6 min read

Why AI agents need a world model before they can act safely, and how Cerebro provides pre-action enforcement, entity intelligence, and consequence...

#cerebro #security #open-source #agents

The hard problem with AI agents isn't making them do things. It's making them not do things.

Every agent framework lets you wire up tools and turn an LLM loose. Create a Jira ticket. Send a Slack message. Delete a customer record. The tools work fine. The problem is the agent has zero understanding of what it's about to break.

"Delete customer ACME Corp" is a valid tool call. It's also a catastrophic business decision if ACME is a $2M ARR account in Q4 with an active renewal negotiation. No tool schema captures that. No system prompt captures that. The context lives in your CRM, your finance system, your support tickets, and the heads of people who've been at the company long enough to know.

Cerebro exists to be that context layer. It's an open-source operations data platform that ingests data from cloud providers, SaaS tools, and business systems, builds an entity intelligence graph, and exposes pre-action enforcement gates so agents can ask "should I do this?" before they do it.

Pre-action enforcement: the core idea

The most important function in Cerebro is conceptually simple:

cerebro.check(entity_id, action_type) → allow | deny | require_approval

When an agent wants to act on an entity -- delete a customer, modify infrastructure, escalate a ticket -- it asks Cerebro first. Cerebro returns a decision with evidence: matched policies, propagation impact (affected revenue, SLA risks, attack paths), and remediation steps if the action is blocked.

This is different from approval gates that just ask "approve or deny?" with no context. Cerebro tells the human decision-maker why they should care. "This customer is responsible for $2M ARR, has an open support escalation, and their infrastructure admin hasn't rotated credentials in 90 days" is useful context when someone asks to modify their account.
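What a client-side call to that gate might look like, sketched in Go. The `Decision`, `CheckResult`, and `check` names here are illustrative, not Cerebro's actual API; the point is that the return value carries evidence, not just a verdict:

```go
package main

import "fmt"

// Decision is the outcome of a pre-action check.
type Decision string

const (
	Allow           Decision = "allow"
	Deny            Decision = "deny"
	RequireApproval Decision = "require_approval"
)

// CheckResult carries the decision plus the evidence an approver sees.
type CheckResult struct {
	Decision        Decision
	MatchedPolicies []string // policy IDs that fired
	AffectedRevenue float64  // downstream ARR at risk
	Remediation     string   // how to unblock, if blocked
}

// check is a stand-in for a call to Cerebro's enforcement gate. A real
// implementation would query the entity graph and policy engine.
func check(entityID, actionType string) CheckResult {
	if actionType == "delete" {
		return CheckResult{
			Decision:        RequireApproval,
			MatchedPolicies: []string{"customer-deletion-guard"},
			AffectedRevenue: 2_000_000,
			Remediation:     "Obtain sign-off from the account owner",
		}
	}
	return CheckResult{Decision: Allow}
}

func main() {
	res := check("customer:acme-corp", "delete")
	if res.Decision != Allow {
		fmt.Printf("blocked: %v (policies: %v, ARR at risk: $%.0f)\n",
			res.Decision, res.MatchedPolicies, res.AffectedRevenue)
	}
}
```

An agent framework only needs to insert this one call between "decide" and "act" for the gate to work.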

The enforcement isn't just for business entities. The same pattern works for cloud security: an agent trying to modify an S3 bucket gets checked against the security policy engine before the change happens.

The entity intelligence graph

Pre-action enforcement requires a world model. Cerebro builds one.

The graph has 79 node kinds and 48 edge kinds -- AWS resources, GCP projects, Kubernetes clusters, but also people, teams, customers, deals, and support tickets. Provider-specific builders ingest data from 60+ AWS services, GCP, Azure, Kubernetes, Salesforce, HubSpot, Stripe, Zendesk, and others.

type Graph struct {
    nodes    map[string]*Node
    outEdges map[string][]*Edge
    inEdges  map[string][]*Edge
    indexByKind      map[NodeKind][]*Node
    indexByRisk      map[RiskLevel][]*Node
    crossAccountEdge []*Edge
    crownJewels      []*Node
}

The graph is built asynchronously and continuously updated. When Salesforce data refreshes, the customer nodes update. When AWS sync runs, the infrastructure nodes update. The graph reflects reality with a staleness indicator so consumers know how fresh the data is.

Three analyses make the graph useful for agents:

Attack path analysis. Multi-hop traversal finds breach chains. An internet-facing load balancer → overprivileged IAM role → S3 bucket with customer data. Each hop is a real edge with resource metadata. This matters for agents because it quantifies blast radius -- "if you change this, here's what's exposed."

Toxic combination detection. Cross-domain patterns invisible to single-purpose tools. Churn risk + elevated infrastructure access. Admin permissions + 90 days of inactivity + no MFA. These require connecting business signals with infrastructure signals, which requires the graph.

Privilege escalation paths. Identity resolution across AWS IAM, GCP service accounts, Okta, and GitHub, then tracing permission assumption chains. The user alice@company.com in Okta maps to an IAM role that can assume another role with S3 admin access. That chain matters when an agent is deciding whether to grant additional permissions.

The policy engine

Policies are Cedar-style JSON that lives in your repo:

{
  "id": "aws-s3-bucket-versioning-enabled",
  "effect": "forbid",
  "resource": "aws::s3::bucket",
  "conditions": ["versioning.status != 'Enabled'"],
  "severity": "medium",
  "remediation": "Enable versioning on the S3 bucket",
  "risk_categories": ["UNPROTECTED_DATA"],
  "mitre_attack": [{"tactic": "Collection", "technique": "T1530"}]
}

Why Cedar-style over a proprietary format? Because policies should be reviewable in PRs. When someone changes a security policy, that change should go through the same code review process as a code change. Clicking buttons in a vendor UI doesn't give you that.

The evaluator runs in a worker pool with content-hash caching -- SHA256 of asset properties combined with a fingerprint of active policy conditions. Unchanged assets skip re-evaluation. This makes continuous scanning practical: a full pass over 50,000 resources takes seconds after the first run.
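The caching idea can be sketched in a few lines. This is an illustrative reduction, not the actual evaluator: the key is a SHA-256 over the asset's properties combined with a fingerprint of the active policies, so either changing the asset or changing the policies invalidates the cache:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash fingerprints an asset's properties together with the
// active policy set, so unchanged assets skip re-evaluation.
func contentHash(assetProps, policyFingerprint string) string {
	h := sha256.Sum256([]byte(assetProps + "|" + policyFingerprint))
	return hex.EncodeToString(h[:])
}

// evalCache maps content hash -> cached findings.
var evalCache = map[string][]string{}

func evaluate(assetProps, policyFP string) []string {
	key := contentHash(assetProps, policyFP)
	if cached, ok := evalCache[key]; ok {
		return cached // unchanged asset + unchanged policies: skip work
	}
	findings := []string{"versioning-disabled"} // stand-in for the real evaluator
	evalCache[key] = findings
	return findings
}

func main() {
	evaluate(`{"versioning":"Suspended"}`, "policy-set-v1")
	evaluate(`{"versioning":"Suspended"}`, "policy-set-v1") // cache hit
	fmt.Println(len(evalCache)) // 1
}
```

On a rescan, the worker pool only pays evaluation cost for assets whose hash changed, which is why a second pass over 50,000 resources is fast.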

Pre-built compliance mappings for SOC 2, CIS, PCI DSS, HIPAA, and NIST 800-53 are included. Each policy carries remediation instructions and MITRE ATT&CK mappings so findings are actionable, not just informational.

AI investigation agents

When the graph or policy engine surfaces something interesting, Cerebro has built-in investigation agents (Claude and GPT) that can dig deeper.

The Code-to-Cloud agent is the most useful: point it at a repository and it extracts resource references (ARNs, GCP resource IDs), then inspects each one via live cloud APIs. It returns structured findings showing how a hardcoded S3 bucket name in your application code maps to a live misconfiguration in your AWS account.

Every tool has a RequiresApproval flag. Read-only operations execute automatically. Anything that modifies state requires human sign-off. Cerebro investigates, it doesn't remediate without permission.
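The flag pattern is simple enough to sketch. Names here are hypothetical, but the shape matches the description: read-only tools run immediately, mutating tools return early until a human approves:

```go
package main

import (
	"errors"
	"fmt"
)

// Tool pairs an action with a RequiresApproval gate.
type Tool struct {
	Name             string
	RequiresApproval bool
	Run              func() (string, error)
}

// execute runs a tool only if it is read-only or already approved.
func execute(t Tool, approved bool) (string, error) {
	if t.RequiresApproval && !approved {
		return "", errors.New(t.Name + ": pending human sign-off")
	}
	return t.Run()
}

func main() {
	describe := Tool{Name: "describe-bucket",
		Run: func() (string, error) { return "ok", nil }}
	remediate := Tool{Name: "enable-versioning", RequiresApproval: true,
		Run: func() (string, error) { return "versioning enabled", nil }}

	out, _ := execute(describe, false) // read-only: runs automatically
	fmt.Println(out)
	if _, err := execute(remediate, false); err != nil {
		fmt.Println(err) // blocked until a human approves
	}
}
```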

Agents have a memory system with relevance scoring and TTL-based expiration. They remember context within a session and prune stale entries automatically -- important when an investigation spans multiple cloud accounts and dozens of resources.

Why this architecture matters for the agent ecosystem

The bet behind Cerebro is that every agent framework will eventually need organizational context. LangChain, CrewAI, OpenAI Agents SDK, Anthropic's tool use -- they all let agents call tools. None of them know what the tools will break.

Cerebro is designed to sit between the agent and the tool. Any framework can call cerebro.check() before acting. The calibration flywheel -- more agents recording outcomes leads to better policy predictions -- becomes the product.

The same engine that catches a misconfigured S3 bucket also catches a premature customer deletion. The graph doesn't care whether the entity is infrastructure or business data. It cares about relationships, policies, and consequences.

Why Snowflake

The most controversial decision, and the one that matters most for the agent use case.

Snowflake means your organizational data is queryable with standard SQL. Agents can run ad-hoc queries against the full entity inventory. "Show me all IAM users with admin access who haven't logged in for 90 days" doesn't require a special API endpoint -- it's a SQL query the agent can compose and execute.

VARIANT columns handle the schema diversity problem. Cloud resources, CRM records, and support tickets have wildly different shapes. Snowflake's semi-structured data type handles all of them without migrations.

For local development, Cerebro falls back to SQLite with the same code paths.

Getting started

export CEREBRO_DB_PATH=.cerebro/cerebro.db
cerebro bootstrap
cerebro serve

cerebro sync --aws
cerebro query "SELECT * FROM findings WHERE severity = 'high'"

Start with one cloud provider and one compliance framework. The pre-action enforcement layer works once you have data in the graph -- connect your CRM and cloud provider, and agents can start asking "should I?" before they act.

Check out the repo. 326K lines of Go, open source, actively developed.
