Claude Code Series · Part 4/6 · 18 min read

Speculative Execution, KAIROS & the Hidden Features of Claude Code

The capabilities Anthropic doesn't announce: an overlay filesystem working while you read, an always-on assistant with cron and push notifications, and a multi-agent mode where "parallelism is your superpower." First-hand source code reverse engineering.

Ever had a Claude Code response that was suspiciously fast? It wasn't your imagination. Claude Code was working before you asked. This article reveals features not in any public changelog — direct source code reverse engineering, not speculation or rumors.

What We Cover

🔮 Speculative Execution — overlay filesystem working before you ask

🌙 KAIROS Mode — always-on assistant with sleep/wake, cron & push notifications

🎛️ Coordinator Mode — multi-agent orchestration with parallel workers

🧪 18 Beta Headers — the API's future hidden in feature flags

🔐 Ant-Only Features — what Anthropic uses internally that you can't

🔮

Speculative Execution

The overlay filesystem trick

Gate: tengu_chomp_inflection (GrowthBook). The best-hidden feature in Claude Code: it predicts what you'll ask next and starts working on it before you type.

In coding, patterns are predictable: "run the tests," "fix the error," "now commit." Claude Code exploits this predictability the way modern CPUs use branch prediction: it starts on the likely next step and throws the work away if the guess was wrong.

Full Pipeline

User sends a prompt → Claude responds

Normal flow up to this point

While you read the response...

A forked agent generates "prompt suggestions" in background

The most likely prompt is predicted

System selects the highest-confidence prediction

Speculative Execution with OVERLAY FILESYSTEM

Reads → real FS (safe) · Writes → overlay (sandboxed)

User types their actual prompt

System compares with the prediction

Match → Instant commit · No match → Silent discard

If correct, overlay writes apply to real FS. If wrong, nothing happened.

Overlay Filesystem — The Critical Detail

Speculative execution uses a virtual filesystem overlay:

Reads  → go to real filesystem (safe, no side effects)
Writes → go to overlay (sandboxed, isolated)

If COMMIT  → overlay writes apply to real FS
If DISCARD → overlay is dropped — nothing happened, zero side effects
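The commit/discard behavior above can be sketched with an in-memory model. This is an illustration under stated assumptions, not Claude Code's actual implementation: the `OverlayFs` class, its method names, and the use of a `Map` as the "real" filesystem are all hypothetical. One further assumption is that reads check the overlay first, so a speculative chain can see its own earlier writes before falling through to the real files.

```typescript
// Hypothetical in-memory model of an overlay filesystem (illustrative only).
type Files = Map<string, string>;

class OverlayFs {
  // Pending speculative writes; `null` marks a file deleted inside the overlay.
  private overlay = new Map<string, string | null>();

  constructor(private readonly base: Files) {}

  // Reads see the overlay first (assumption), then fall through to the real files.
  read(path: string): string | undefined {
    if (this.overlay.has(path)) {
      const pending = this.overlay.get(path);
      return pending === null ? undefined : pending;
    }
    return this.base.get(path);
  }

  // Writes and deletes never touch the real files directly.
  write(path: string, content: string): void {
    this.overlay.set(path, content);
  }

  remove(path: string): void {
    this.overlay.set(path, null);
  }

  // COMMIT: apply every pending change to the real files, then clear the overlay.
  commit(): void {
    this.overlay.forEach((content, path) => {
      if (content === null) {
        this.base.delete(path);
      } else {
        this.base.set(path, content);
      }
    });
    this.overlay.clear();
  }

  // DISCARD: drop the overlay; the real files were never modified.
  discard(): void {
    this.overlay.clear();
  }
}
```

Because the real files are only mutated inside `commit()`, a wrong prediction costs compute but leaves no trace on disk.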

System Limits

MAX_SPECULATION_TURNS    = 20   // Max turns in speculative chain
MAX_SPECULATION_MESSAGES = 100  // Max messages in speculation

Source: src/services/PromptSuggestion/speculation.ts
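To make the two limits concrete, here is a minimal sketch of a speculative chain that stops at whichever cap it hits first, plus a match check run when the user's real prompt arrives. Only the two constants come from the source; `runSpeculation`, `matchesPrediction`, the `Turn` shape, and the whitespace- and case-insensitive comparison are assumptions made for illustration. How the real system compares prompts is not shown in this article.

```typescript
const MAX_SPECULATION_TURNS = 20; // from the source
const MAX_SPECULATION_MESSAGES = 100; // from the source

// Hypothetical shape of one speculative turn.
interface Turn {
  messages: string[];
  done: boolean;
}

interface SpeculationResult {
  turns: number;
  messages: string[];
  completed: boolean;
}

// Runs a speculative chain until it finishes or reaches either limit.
function runSpeculation(step: (turn: number) => Turn): SpeculationResult {
  const messages: string[] = [];
  for (let turn = 0; turn < MAX_SPECULATION_TURNS; turn++) {
    const result = step(turn);
    // Stop before exceeding the message cap.
    if (messages.length + result.messages.length > MAX_SPECULATION_MESSAGES) {
      return { turns: turn, messages, completed: false };
    }
    messages.push(...result.messages);
    if (result.done) {
      return { turns: turn + 1, messages, completed: true };
    }
  }
  return { turns: MAX_SPECULATION_TURNS, messages, completed: false };
}

// Assumed matcher: equal after trimming, lowercasing, and collapsing whitespace.
function matchesPrediction(predicted: string, actual: string): boolean {
  const normalize = (s: string): string =>
    s.trim().toLowerCase().replace(/\s+/g, " ");
  return normalize(predicted) === normalize(actual);
}
```

A match would lead to the commit path and a mismatch to the discard path; an incomplete chain (`completed: false`) is a case the caller has to decide how to handle.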

🌙

KAIROS Mode

The assistant that doesn't sleep (or does it?)

Gate: feature('KAIROS'). The most ambitious feature: an always-on, persistent coding assistant that lives across sessions. A virtual colleague that subscribes to your PRs, schedules tasks with cron, and consolidates memories while you sleep.

Full Capabilities

Feature	Description
`SleepTool`	Agent sleeps and programmatically schedules its own wake-up
`<tick>` XML tags	Periodic tick-based self-activation
Terminal Awareness	Knows if you're looking at the terminal or AFK
`PushNotificationTool`	Sends push notifications to mobile/desktop
`SendUserFileTool`	Shares files directly to the user
`SubscribePRTool`	Subscribes to GitHub PR events via webhooks
Cron Jobs	`CronCreate/Delete/List` — recurring scheduled tasks
Daily Logs	Append-only memory at `memory/logs/YYYY/MM/YYYY-MM-DD.md`
Nightly Dream	Memory consolidation during sleep periods (`/dream`)

AFK Detection — TRANSCRIPT_CLASSIFIER

Behind the TRANSCRIPT_CLASSIFIER gate, the system classifies whether you're actively at the terminal or have stepped away. When it detects that you're AFK, the agent works autonomously; when you return, it summarizes what it did.

Beta header: afk-mode-2026-01-31

Memory in KAIROS: Append-Only Daily Logs

Instead of directly editing MEMORY.md, KAIROS uses append-only daily log files:

memory/logs/2026/07/2026-07-04.md   ← each interaction appends here

// Every night, /dream consolidates the day's logs into topic files
// More robust for always-on operation — no risk of corrupting the index
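A small sketch of the append-only pattern, following the example path above. The `dailyLogPath` helper and `DailyLog` class are hypothetical names, the log is kept in memory rather than on disk, and using UTC for the date boundary is an assumption; the source does not say which timezone defines a "day".

```typescript
const pad2 = (n: number): string => (n < 10 ? "0" : "") + String(n);

// Builds the per-day log path, e.g. memory/logs/2026/07/2026-07-04.md.
// Assumption: the day boundary is taken in UTC.
function dailyLogPath(date: Date): string {
  const yyyy = String(date.getUTCFullYear());
  const mm = pad2(date.getUTCMonth() + 1);
  const dd = pad2(date.getUTCDate());
  return `memory/logs/${yyyy}/${mm}/${yyyy}-${mm}-${dd}.md`;
}

// Hypothetical in-memory stand-in for the log files: entries can be
// appended and read, but never edited or removed.
class DailyLog {
  private files = new Map<string, string[]>();

  append(date: Date, entry: string): string {
    const path = dailyLogPath(date);
    const lines = this.files.get(path) ?? [];
    lines.push(entry);
    this.files.set(path, lines);
    return path;
  }

  read(path: string): readonly string[] {
    return this.files.get(path) ?? [];
  }
}
```

Since each interaction only appends to the current day's file, a crash mid-write can at worst truncate the newest entry; earlier entries and the consolidated topic files are not rewritten.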

🎛️

Coordinator Mode

Multi-agent with parallelism as superpower

"Parallelism is your superpower"

A coordinator with a minimal toolset (Agent, TaskStop, SendMessage, SyntheticOutput) orchestrates N workers, each with the full toolset. The coordinator doesn't execute code: it plans, delegates, and verifies. Workers do the heavy lifting, in parallel and with isolated context.

Architecture

      ┌───────────────────────────────┐
      │       COORDINATOR             │
      │  Tools: Agent, TaskStop,      │
      │  SendMessage, SyntheticOutput │
      └──────────┬────────────────────┘
                 │
       ┌─────────┼─────────┐
       │         │         │
       ▼         ▼         ▼
   ┌──────┐  ┌──────┐  ┌──────┐
   │Worker│  │Worker│  │Worker│
   │  #1  │  │  #2  │  │  #3  │
   │(full │  │(full │  │(full │
   │tools)│  │tools)│  │tools)│
   └──────┘  └──────┘  └──────┘

4-Phase Workflow

1. Research

Understand the problem. Multiple workers investigate different aspects in parallel.

2. Synthesis

Plan the approach. Coordinator synthesizes findings and defines strategy.

3. Implementation

Parallel execution. Workers implement with full toolset, each in isolated context.

4. Verification

Confirm correctness. Tests, lint, review — all verified before declaring complete.
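The four phases above map naturally onto a fan-out/fan-in shape. The sketch below is an assumption about structure only: `coordinate`, `fanOut`, and `WorkerFn` are hypothetical names, and workers are modeled as plain async functions rather than real agents with isolated context.

```typescript
// A worker is modeled as an async function from a task description to a result.
type WorkerFn<T> = (task: string) => Promise<T>;

// Starts every task at once and waits for all of them to finish.
async function fanOut<T>(tasks: string[], worker: WorkerFn<T>): Promise<T[]> {
  return Promise.all(tasks.map((task) => worker(task)));
}

interface CoordinationResult {
  results: string[];
  verified: boolean;
}

async function coordinate(
  questions: string[],
  research: WorkerFn<string>,
  plan: (findings: string[]) => string[],
  implement: WorkerFn<string>,
  verify: (results: string[]) => boolean
): Promise<CoordinationResult> {
  // 1. Research: workers investigate different aspects in parallel.
  const findings = await fanOut(questions, research);
  // 2. Synthesis: the coordinator turns findings into concrete subtasks.
  const subtasks = plan(findings);
  // 3. Implementation: workers execute the subtasks in parallel.
  const results = await fanOut(subtasks, implement);
  // 4. Verification: nothing is declared complete until checks pass.
  return { results, verified: verify(results) };
}
```

Note that the coordinator itself only plans and checks; all task execution happens inside the worker functions it is handed.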

Worker Status Reports

Workers report every 30 seconds via AgentSummary:

"Describe your most recent action in 3-5 words using present tense (-ing)"

Example: "Migrating database schema tables"
Format: XML <task-notification> tags
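As an illustration of the reporting rule above, the sketch below validates a summary and wraps it in a `<task-notification>` element. Only the tag name, the 30-second interval, and the "3-5 words, -ing form" rule come from the source. The `worker` attribute, the function names, and the check that the first word ends in "-ing" are assumptions; the real payload layout is not shown in this article.

```typescript
const REPORT_INTERVAL_MS = 30 * 1000; // workers report every 30 seconds

function words(text: string): string[] {
  const trimmed = text.trim();
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

// Assumed reading of the rule: 3-5 words, first word in -ing form.
function isValidSummary(summary: string): boolean {
  const parts = words(summary);
  return parts.length >= 3 && parts.length <= 5 && /ing$/i.test(parts[0]);
}

function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical layout: the summary as element text, the worker id as an attribute.
function formatTaskNotification(workerId: string, summary: string): string {
  return `<task-notification worker="${escapeXml(workerId)}">${escapeXml(summary)}</task-notification>`;
}
```

For the example above, `isValidSummary("Migrating database schema tables")` holds: four words, starting with an -ing verb.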

/batch — The Most Powerful Skill

Launches 5-30 isolated worktree agents in parallel. Each agent works in its own git worktree (separate checkout), all execute simultaneously, and results are merged back to the main branch.

Use case: "Migrate all 50 API endpoints to the new schema" → 20 agents, each handling 2-3 endpoints.
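The arithmetic in that use case (50 endpoints, 20 agents, 2-3 endpoints each) is an even partition. A minimal sketch, assuming a hypothetical `splitAcrossAgents` helper and treating the 5-30 agent range as a hard bound:

```typescript
const MIN_AGENTS = 5;
const MAX_AGENTS = 30;

// Splits items into `agentCount` contiguous chunks whose sizes differ by at most one.
function splitAcrossAgents<T>(items: T[], agentCount: number): T[][] {
  if (
    !Number.isInteger(agentCount) ||
    agentCount < MIN_AGENTS ||
    agentCount > MAX_AGENTS
  ) {
    throw new RangeError(`agentCount must be an integer in ${MIN_AGENTS}-${MAX_AGENTS}`);
  }
  const baseSize = Math.floor(items.length / agentCount);
  const remainder = items.length % agentCount;
  const chunks: T[][] = [];
  let start = 0;
  for (let i = 0; i < agentCount; i++) {
    // The first `remainder` agents take one extra item.
    const size = baseSize + (i < remainder ? 1 : 0);
    chunks.push(items.slice(start, start + size));
    start += size;
  }
  return chunks;
}
```

With 50 items and 20 agents, `baseSize` is 2 and `remainder` is 10, so ten agents get 3 endpoints and ten get 2. Each chunk would then be handed to an agent in its own worktree.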

🧪

The 18 Beta Headers

The API's future hidden in feature flags

Each beta header unlocks an experimental API capability. Once enabled in a session, it persists until /clear or /compact — this is intentional latching behavior to prevent mid-session inconsistencies.

Beta Header	Date	Description	Gate
`claude-code-20250219`	Feb 2025	Base Claude Code beta	Always
`interleaved-thinking`	May 2025	Thinking interleaved with tool use	Always
`context-1m`	Aug 2025	1M token context window (5x default)	Feature flag
`context-management`	Jun 2025	API-native context management	Feature flag
`structured-outputs`	Dec 2025	Structured JSON outputs on tools	Feature flag
`web-search`	Mar 2025	Integrated web search	Always
`advanced-tool-use`	Nov 2025	Advanced tool use (1P)	Feature flag
`effort`	Nov 2025	Thinking budget control	Feature flag
`task-budgets`	Mar 2026	Task-level token budgets	Feature flag
`prompt-caching-scope`	Jan 2026	Prompt caching scope control	Feature flag
`fast-mode`	Feb 2026	Fast response mode	Feature flag
`token-efficient-tools`	Mar 2026	Compact tool definitions	Feature flag
`afk-mode`	Jan 2026	Transcript classifier for KAIROS	TRANSCRIPT_CLASSIFIER
`cli-internal`	Feb 2026	Anthropic internal beta	Ant-only
`advisor-tool`	Mar 2026	Advisor tool	Feature flag
`redact-thinking`	Feb 2026	Redact thinking blocks from output	Feature flag
`summarize-connector`	Mar 2026	Connector text summarization	CONNECTOR_TEXT
`tool-search-tool`	Oct 2025	Third-party tool search	Feature flag
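The latching behavior described before the table can be sketched as a set that only grows within a session. `BetaHeaderLatch` and its methods are hypothetical names, and the header strings are the short forms used in the table rather than full dated identifiers.

```typescript
// Once a beta header has been sent in a session, it keeps being sent
// until the session is reset, even if the feature is no longer requested.
class BetaHeaderLatch {
  private latched = new Set<string>();

  // Returns the headers to send on this request: everything requested now
  // plus everything latched by earlier requests, in a stable order.
  resolve(requested: string[]): string[] {
    requested.forEach((header) => this.latched.add(header));
    return Array.from(this.latched).sort();
  }

  // Called on /clear or /compact.
  reset(): void {
    this.latched.clear();
  }
}
```

Keeping the header set stable means every request in a session is sent with a consistent set of capabilities, which is the mid-session consistency the latch is meant to protect.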

The 3 Most Revealing

context-1m: a 1M-token window, 5x the default context. In Claude Code it sits behind a feature flag rather than being on for everyone. Broad availability would change how much of a codebase fits into a single session.

effort — granular thinking budget control. Lets you tell the model to think more or less. The era of "cost per complexity" is here.

token-efficient-tools — compact tool definitions. Reduces system prompt overhead, leaving more room for useful context.

🛠️

Hidden Skills Catalog

The commands not listed in /help

Skill	Description	Gate
`/simplify`	3 parallel review agents (Code Reuse, Quality, Efficiency)	Always
`/batch`	5-30 worktree agents in parallel	Always
`/skillify`	Capture current session as reusable SKILL.md	Always
`/debug`	Project issues diagnostics	Always
`/dream`	Nightly memory consolidation (KAIROS)	KAIROS
`/verify`	Automated verification (tests + lint)	Ant-only
`/remember`	Memory review → promotion to CLAUDE.md	Ant-only
`/stuck`	Frozen session diagnostics + Slack posting	Ant-only
`/loop`	Agent trigger loops	AGENT_TRIGGERS

/simplify — Triple Parallel Review

Launches 3 simultaneous review agents, each with a different perspective:

Code Reuse Reviewer

Duplicate patterns, shared utilities

Code Quality Reviewer

Naming, structure, readability

Efficiency Reviewer

Performance, algorithmic complexity

🔐

Ant-Only: The Internal Build

What Anthropic uses that you can't (yet)

When USER_TYPE === 'ant', Claude Code behaves fundamentally differently. These are the capabilities exclusive to Anthropic's internal build:

Different Prompt Instructions

"Never Refuse"

Never say you can't do something — show the error instead

≤25 words

Maximum 25 words between tool calls — maximum efficiency

Minimize Comments

Don't add comments unless necessary

Nested Agents

Agents that create sub-agents — disabled for external users

Exclusive Ant Tools

Tool	Description
`ConfigTool`	Direct configuration management
`TungstenTool`	Internal testing framework
`SuggestBackgroundPRTool`	Automated PR creation
`REPLTool`	All tools wrapped in a REPL VM

What Does This Tell Us?

Anthropic trusts Claude Code enough to run it internally with fewer guardrails and more autonomy. The restrictions we experience as users aren't technical limitations; they're product decisions that could change as public trust grows.

🔧

GrowthBook Runtime Flags

The switches behind the curtain

All runtime flags use the tengu_ prefix (internal codename). These control the most interesting features:

Flag	Purpose
`tengu_chomp_inflection`	Speculative execution + prompt suggestions
`tengu_session_memory`	Session memory ON/OFF
`tengu_cobalt_raccoon`	Aggressive reactive compaction
`tengu_hive_evidence`	Verification agent pattern
`tengu_onyx_plover`	Auto-dream configuration
`tengu_memdir_loaded`	Memory directory analytics
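The article mentions three kinds of gate: GrowthBook runtime flags (the `tengu_` table above), `feature('KAIROS')`-style gates, and the `USER_TYPE === 'ant'` check. A minimal sketch of evaluating them uniformly follows. The `Gate` union, `GateContext`, and `isEnabled` are hypothetical, and flag values are passed in as a plain object rather than fetched through the GrowthBook SDK.

```typescript
// Hypothetical evaluation context; in practice runtime flag values
// would come from the flag service.
interface GateContext {
  runtimeFlags: Record<string, boolean>;
  features: string[];
  userType: string;
}

type Gate =
  | { kind: "always" }
  | { kind: "runtime"; flag: string }
  | { kind: "feature"; feature: string }
  | { kind: "ant-only" };

function isEnabled(gate: Gate, ctx: GateContext): boolean {
  switch (gate.kind) {
    case "always":
      return true;
    case "runtime":
      // A missing flag counts as off.
      return ctx.runtimeFlags[gate.flag] === true;
    case "feature":
      return ctx.features.indexOf(gate.feature) !== -1;
    case "ant-only":
      return ctx.userType === "ant";
  }
}
```

Treating an unknown flag as off is the conservative default: a feature stays hidden unless something explicitly turns it on.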

💡

What's Coming

The future the code reveals

Speculative execution will become the standard paradigm for all AI coding tools. If you can predict the next 2-3 steps with high confidence (and in coding, you can), perceived latency disappears.

KAIROS points to the future of always-on assistants. Not a chatbot waiting for your questions — a colleague working in the background, subscribing to your repos, scheduling reviews, and consolidating learnings while you sleep.

Coordinator Mode shows that multi-agent coding is already here. It's not a demo or a paper: it runs in production, at least internally. Orchestration with parallel workers that communicate over IPC is the pattern everyone will copy.

The Remaining Question

How many of these features will reach the public build — and how many will stay as Anthropic's internal advantage? The source code suggests most are behind feature flags, not technical limitations. The question isn't if they'll arrive, but when.