security researchAI agentsClaude Code

KAIROS: The Autonomous Background Agent Hidden in the Claude Code Source Leak

Ship Safe TeamApril 1, 2026

The Claude Code source leak on March 31 2026 exposed a lot of code. Most of the coverage focused on the leaked TypeScript itself — the tools, the MCP layer, the multi-agent infrastructure. Less attention went to a mode buried deeper in the source: an autonomous background agent system referred to internally as KAIROS.

What KAIROS is

KAIROS is a proactive execution mode. Instead of waiting for you to send a message, it runs a heartbeat — a recurring loop that fires every few seconds and asks the agent a question: "Is there anything worth doing right now?"

The loop polls for context signals: open files, recent git activity, failing tests, dependency changes, open issues. If the model decides something is worth acting on, it can take action autonomously — without a human prompt.

This is not a theoretical design. The source contains the implementation. Several forks of the leaked code, including openclaude and claw-code, have begun exploring it.

Why this is a different threat model

Every existing AI agent security framework — OWASP LLM Top 10, OWASP Agentic AI Top 10, Snyk ToxicSkills — assumes a human-in-the-loop trigger. A human sends a message. The agent processes it. The human sees the response.

KAIROS breaks that assumption. In proactive mode:

There is no trigger to inspect. The agent decides on its own to act.
There is no output to review before action. Actions can be taken before you see them.
The attack surface is the workspace itself. Any file the agent reads during a heartbeat scan is potential input for prompt injection — a malicious string in a README, a TODO comment, an open GitHub issue.

The OWASP Agentic AI Top 10 calls this ASI-05 (Uncontrolled Autonomous Action). KAIROS is a concrete implementation of exactly that risk.

The prompt injection attack surface

In reactive mode, a prompt injection attack requires the user to somehow cause the agent to read a malicious file — you need a social engineering step.

In proactive mode, the agent periodically scans the workspace looking for things to do. It will find your files. If any of them contain injected instructions, those instructions are processed without anyone sending a message.

Attack vectors that become practical with KAIROS:

Malicious dependency README

Install a package whose README contains injected instructions. During the next heartbeat scan, if the agent looks at recently installed packages, the instructions execute.

Open GitHub issue body

Create or comment on an issue in the repo with injected text. KAIROS-style loops that check for open issues will process it.

Injected git commit message

A commit message with injected instructions gets processed if the heartbeat loop checks recent git activity.

ToxicSkills escalation

A malicious skill that would be caught by ship-safe scan-skill in a normal session may be harder to detect if loaded during a background heartbeat where no human is watching the output.

What to check if you run openclaude or claw-code

Neither openclaude nor claw-code have shipped proactive mode as a user-facing feature — they are implementing and exploring it from the leaked source. But the architecture is there, and it may appear in updates.

Signs that an AI agent tool is running in proactive/background mode:

A flag like --proactive, --kairos, --background, --autonomous
A config key like proactive: true or background_mode: enabled
A running process that is not attached to a terminal session

If you see these, the threat model has changed from "agent does what I ask" to "agent decides what to do."

How ship-safe helps

Agent config scanning (ship-safe audit .) checks for permission modes and hook configs that would amplify the risk of autonomous execution:

permissionMode: danger-full-access or dangerouslySkipPermissions: true in .claw.json — every autonomous action runs without confirmation
preToolUse / postToolUse hooks that could be triggered silently during background execution

Skill scanning (ship-safe scan-skill) checks for ToxicSkills patterns that are specifically dangerous in autonomous mode — output suppression, silent exfiltration, instructions not to report actions.

MCP server scanning (ship-safe scan-mcp) checks tool definitions for prompt injection and credential harvesting patterns before you connect a server that a background agent might call.

# Before connecting any MCP server that a background agent will use
npx ship-safe scan-mcp https://your-mcp-server/

# Before installing skills
npx ship-safe scan-skill https://your-skill-url

# Full config audit
npx ship-safe audit .

The broader picture

The KAIROS disclosure matters beyond Claude Code specifically. It confirms that the frontier of AI agent development is moving toward ambient, always-on agents that monitor and act on your environment continuously.

That is genuinely useful. It is also a fundamentally different security posture than what current frameworks assume. The defenses that matter most:

1. Principle of least privilege on tools. An autonomous agent with bash access and no tool allowlist is a persistent remote execution primitive. Scope it.

2. Clean workspace hygiene. Assume that anything in your workspace — README files, commit messages, issue bodies, config files — is potential agent input.

3. Explicit allowlists over default-allow. If the agent can decide to run, what it can run matters more than ever.

4. Scan MCP servers and skills before connecting. In proactive mode, the agent may use them without prompting you.