Claude Code Series · Part 4/6 · 18 min read

Speculative Execution, KAIROS & the Hidden Features of Claude Code

The capabilities Anthropic doesn't announce: an overlay filesystem working while you read, an always-on assistant with cron and push notifications, and a multi-agent mode where "parallelism is your superpower." First-hand source code reverse engineering.

Ever had a Claude Code response that was suspiciously fast? It wasn't your imagination. Claude Code was working before you asked. This article reveals features not in any public changelog — direct source code reverse engineering, not speculation or rumors.

What We Cover

🔮 Speculative Execution — overlay filesystem working before you ask

🌙 KAIROS Mode — always-on assistant with sleep/wake, cron & push notifications

🎛️ Coordinator Mode — multi-agent orchestration with parallel workers

🧪 18 Beta Headers — the API's future hidden in feature flags

🔐 Ant-Only Features — what Anthropic uses internally that you can't

🔮

Speculative Execution

The overlay filesystem trick

Gate: tengu_chomp_inflection (GrowthBook). The best-hidden feature in Claude Code: it can predict what you'll ask next and start working on it before you type.

In coding, the next request is often predictable: "run the tests," "fix the error," "now commit." Claude Code exploits that predictability much as a modern CPU uses branch prediction: it starts down the likely path early and throws the work away if the guess turns out wrong.

Full Pipeline

1

User sends a prompt → Claude responds

Normal flow up to this point

2

While you read the response...

A forked agent generates "prompt suggestions" in background

3

The most likely prompt is predicted

System selects the highest-confidence prediction

4

Speculative Execution with OVERLAY FILESYSTEM

Reads → real FS (safe) · Writes → overlay (sandboxed)

5

User types their actual prompt

System compares with the prediction

6

Match → Instant commit · No match → Silent discard

If correct, overlay writes apply to real FS. If wrong, nothing happened.

Overlay Filesystem — The Critical Detail

Speculative execution uses a virtual filesystem overlay:

Reads  → go to real filesystem (safe, no side effects)
Writes → go to overlay (sandboxed, isolated)

If COMMIT  → overlay writes apply to real FS
If DISCARD → overlay is dropped — nothing happened, zero side effects
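
To make the commit/discard rule concrete, here is a minimal sketch of an overlay. It is not the code from Claude Code: the class and method names are hypothetical, the "real" filesystem is modeled as an in-memory Map so the example runs on its own, and it assumes a speculative chain can read back its own pending writes.

```typescript
// Hypothetical sketch. The "real FS" is an in-memory Map for self-containment.
type Files = Map<string, string>;

class OverlayFs {
  private readonly real: Files;
  private readonly writes: Files = new Map<string, string>();

  constructor(real: Files) {
    this.real = real;
  }

  // Assumption: reads check pending overlay writes first, then fall
  // through to the real FS. Reading never has side effects.
  read(path: string): string | undefined {
    return this.writes.has(path) ? this.writes.get(path) : this.real.get(path);
  }

  // Writes never touch the real FS; they land in the overlay only.
  write(path: string, content: string): void {
    this.writes.set(path, content);
  }

  // COMMIT: replay the overlay onto the real FS, then clear it.
  commit(): void {
    this.writes.forEach((content, path) => {
      this.real.set(path, content);
    });
    this.writes.clear();
  }

  // DISCARD: drop the overlay; the real FS was never modified.
  discard(): void {
    this.writes.clear();
  }
}
```

Discarding is cheap because nothing has to be undone: the real files were never written in the first place.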

System Limits

MAX_SPECULATION_TURNS    = 20   // Max turns in speculative chain
MAX_SPECULATION_MESSAGES = 100  // Max messages in speculation

Source: src/services/PromptSuggestion/speculation.ts
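
The two limits and the match step can be sketched as follows. Only the two constant names and values come from the source file; everything else (the function names, the per-turn result shape, and the normalized string comparison used for matching) is an assumption made for illustration, since the article does not show how predictions are compared with the real prompt.

```typescript
const MAX_SPECULATION_TURNS = 20;
const MAX_SPECULATION_MESSAGES = 100;

// Hypothetical shape of one speculative turn's outcome.
interface TurnResult {
  messages: number; // messages produced by this turn
  done: boolean;    // true when the speculative task finished
}

interface SpeculationOutcome {
  turns: number;
  messages: number;
  completed: boolean;
}

// Runs a speculative chain until it finishes or hits either limit.
function runSpeculation(step: (turn: number) => TurnResult): SpeculationOutcome {
  let turns = 0;
  let messages = 0;
  while (turns < MAX_SPECULATION_TURNS) {
    const result = step(turns);
    turns += 1;
    messages += result.messages;
    if (result.done) {
      return { turns, messages, completed: true };
    }
    if (messages >= MAX_SPECULATION_MESSAGES) {
      break; // message budget exhausted
    }
  }
  return { turns, messages, completed: false };
}

// Assumed comparison: equality after trimming, lowercasing and
// collapsing runs of whitespace.
function matchesPrediction(predicted: string, actual: string): boolean {
  const normalize = (s: string): string =>
    s.trim().toLowerCase().replace(/\s+/g, " ");
  return normalize(predicted) === normalize(actual);
}
```

Capping the chain keeps a wrong guess cheap: at worst, a bounded amount of background work is thrown away.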

🌙

KAIROS Mode

The assistant that doesn't sleep (or does it?)

Gate: feature('KAIROS'). The most ambitious feature: an always-on, persistent coding assistant that lives across sessions. A virtual colleague that subscribes to your PRs, schedules tasks with cron, and consolidates memories while you sleep.

Full Capabilities

Feature               Description
SleepTool             Agent sleeps and programmatically schedules its own wake-up
<tick> XML tags       Periodic tick-based self-activation
Terminal Awareness    Knows if you're looking at the terminal or AFK
PushNotificationTool  Sends push notifications to mobile/desktop
SendUserFileTool      Shares files directly to the user
SubscribePRTool       Subscribes to GitHub PR events via webhooks
Cron Jobs             CronCreate/Delete/List: recurring scheduled tasks
Daily Logs            Append-only memory at memory/logs/YYYY/MM/YYYY-MM-DD.md
Nightly Dream         Memory consolidation during sleep periods (/dream)

AFK Detection — TRANSCRIPT_CLASSIFIER

Behind TRANSCRIPT_CLASSIFIER, the system classifies whether you're actively using the terminal or have stepped away. When AFK is detected, the agent works autonomously; when you return, it summarizes what it did.

Beta header: afk-mode-2026-01-31

Memory in KAIROS: Append-Only Daily Logs

Instead of directly editing MEMORY.md, KAIROS uses append-only daily log files:

memory/logs/2026/07/2026-07-04.md   ← each interaction appends here

// Every night, /dream consolidates the day's logs into topic files
// More robust for always-on operation — no risk of corrupting the index
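
As a rough illustration of the append-only idea, the sketch below derives the log path for a date and only ever appends to it. The path layout follows the example path above; the helper names, the in-memory store, and the choice of UTC for the date are assumptions, not details from the source.

```typescript
// Zero-pads a month or day number to two digits.
function twoDigits(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds memory/logs/YYYY/MM/YYYY-MM-DD.md for the given date.
// Assumption: the day boundary is taken in UTC.
function dailyLogPath(date: Date): string {
  const year = date.getUTCFullYear();
  const month = twoDigits(date.getUTCMonth() + 1);
  const day = twoDigits(date.getUTCDate());
  return "memory/logs/" + year + "/" + month + "/" + year + "-" + month + "-" + day + ".md";
}

// Hypothetical in-memory stand-in for the log directory.
// Entries are only ever pushed; nothing is edited or removed.
class DailyLog {
  private readonly files = new Map<string, string[]>();

  append(date: Date, entry: string): string {
    const path = dailyLogPath(date);
    const lines = this.files.get(path) ?? [];
    lines.push(entry);
    this.files.set(path, lines);
    return path;
  }

  read(path: string): readonly string[] {
    return this.files.get(path) ?? [];
  }
}
```

Because each day gets its own file and writes only append, a crash mid-write can at worst damage the tail of one day's log rather than a shared index.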

🎛️

Coordinator Mode

Multi-agent with parallelism as superpower

"Parallelism is your superpower"

A coordinator with four tools orchestrates N workers, each with the full toolset. The coordinator doesn't execute code: it plans, delegates, and verifies. Workers do the heavy lifting, in parallel and with isolated context.

Architecture

      ┌──────────────────────────────┐
      │       COORDINATOR             │
      │  Tools: Agent, TaskStop,      │
      │  SendMessage, SyntheticOutput │
      └──────────┬───────────────────┘
                 │
       ┌─────────┼──────────┐
       │         │          │
       ▼         ▼          ▼
   ┌──────┐  ┌──────┐  ┌──────┐
   │Worker│  │Worker│  │Worker│
   │  #1  │  │  #2  │  │  #3  │
   │(full │  │(full │  │(full │
   │tools)│  │tools)│  │tools)│
   └──────┘  └──────┘  └──────┘

4-Phase Workflow

1. Research

Understand the problem. Multiple workers investigate different aspects in parallel.

2. Synthesis

Plan the approach. Coordinator synthesizes findings and defines strategy.

3. Implementation

Parallel execution. Workers implement with full toolset, each in isolated context.

4. Verification

Confirm correctness. Tests, lint, review — all verified before declaring complete.
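
The four phases map naturally onto two parallel fan-outs with a sequential step on either side. The sketch below is illustrative only: `WorkerFn`, `Plan`, and `coordinate` are hypothetical names, and a worker is modeled as a plain async function rather than a real agent.

```typescript
// Hypothetical: a worker is any async function from a task to a report.
type WorkerFn = (task: string) => Promise<string>;

interface Plan {
  steps: string[];
}

async function coordinate(
  questions: string[],
  worker: WorkerFn,
  synthesize: (findings: string[]) => Plan,
  verify: (results: string[]) => Promise<boolean>,
): Promise<{ results: string[]; verified: boolean }> {
  // 1. Research: one worker per question, all in flight at once.
  const findings = await Promise.all(
    questions.map((q) => worker("research: " + q)),
  );

  // 2. Synthesis: the coordinator itself turns findings into a plan.
  const plan = synthesize(findings);

  // 3. Implementation: one worker per plan step, again in parallel.
  const results = await Promise.all(
    plan.steps.map((s) => worker("implement: " + s)),
  );

  // 4. Verification: nothing is reported complete until checks pass.
  const verified = await verify(results);
  return { results, verified };
}
```

Note that the coordinator never does the work itself here; it only decides what to fan out and whether the combined result passes.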

Worker Status Reports

Workers report every 30 seconds via AgentSummary:

"Describe your most recent action in 3-5 words using present tense (-ing)"

Example: "Migrating database schema tables"
Format: XML <task-notification> tags
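
A small check for the summary rule might look like the sketch below. The function names are hypothetical, treating the first word as the -ing verb is a simplification, and the wrapper's inner layout (a bare text body) is a guess: the article names the tag but does not show the payload.

```typescript
// Checks the "3-5 words, present tense (-ing)" rule.
// Simplification: only the first word is tested for the -ing ending.
function isValidSummary(summary: string): boolean {
  const words = summary.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length < 3 || words.length > 5) {
    return false;
  }
  const first = words[0] ?? "";
  return /ing$/i.test(first);
}

// Wraps a valid summary in the tag named above.
// Assumption: the body is plain escaped text.
function toNotification(summary: string): string {
  if (!isValidSummary(summary)) {
    throw new Error("summary must be 3-5 words starting with an -ing verb");
  }
  const escaped = summary
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return "<task-notification>" + escaped + "</task-notification>";
}
```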

/batch — The Most Powerful Skill

Launches 5-30 isolated worktree agents in parallel. Each agent works in its own git worktree (separate checkout), all execute simultaneously, and results are merged back to the main branch.

Use case: "Migrate all 50 API endpoints to the new schema" → 20 agents, each handling 2-3 endpoints.
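
The arithmetic in that use case is a plain even split: 50 endpoints over 20 agents gives ten agents three endpoints and ten agents two. A hypothetical `partition` helper (not from the source) shows the idea:

```typescript
// Splits items into `agents` contiguous chunks whose sizes differ by at most one.
function partition<T>(items: T[], agents: number): T[][] {
  if (!Number.isInteger(agents) || agents < 1) {
    throw new Error("agents must be a positive integer");
  }
  const base = Math.floor(items.length / agents);
  const extra = items.length % agents; // the first `extra` agents take one more item
  const chunks: T[][] = [];
  let start = 0;
  for (let i = 0; i < agents; i++) {
    const size = base + (i < extra ? 1 : 0);
    chunks.push(items.slice(start, start + size));
    start += size;
  }
  return chunks;
}
```

Each chunk would then be handed to one agent in its own worktree, so no two agents edit the same checkout.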

🧪

The 18 Beta Headers

The API's future hidden in feature flags

Each beta header unlocks an experimental API capability. Once enabled in a session, it persists until /clear or /compact — this is intentional latching behavior to prevent mid-session inconsistencies.
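
The latching rule is easy to picture as a set that can grow but never shrink until the session is reset. The class below is a hypothetical sketch of that behavior, not the actual implementation; the header strings passed to it are whatever the caller uses.

```typescript
// Hypothetical sketch of per-session header latching.
class BetaLatch {
  private readonly active = new Set<string>();

  // Turning a header on is idempotent.
  enable(header: string): void {
    this.active.add(header);
  }

  // There is deliberately no disable(): once on, a header stays on,
  // so every request in the session sees the same set.
  headers(): string[] {
    const out: string[] = [];
    this.active.forEach((h) => {
      out.push(h);
    });
    return out.sort();
  }

  // Called when the session is reset (/clear or /compact in the article).
  reset(): void {
    this.active.clear();
  }
}
```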

Beta Header            Date      Description                           Gate
claude-code-20250219   Feb 2025  Base Claude Code beta                 Always
interleaved-thinking   May 2025  Thinking interleaved with tool use    Always
context-1m             Aug 2025  1M token context window (5x default)  Feature flag
context-management     Jun 2025  API-native context management         Feature flag
structured-outputs     Dec 2025  Structured JSON outputs on tools      Feature flag
web-search             Mar 2025  Integrated web search                 Always
advanced-tool-use      Nov 2025  Advanced tool use (1P)                Feature flag
effort                 Nov 2025  Thinking budget control               Feature flag
task-budgets           Mar 2026  Task-level token budgets              Feature flag
prompt-caching-scope   Jan 2026  Prompt caching scope control          Feature flag
fast-mode              Feb 2026  Fast response mode                    Feature flag
token-efficient-tools  Mar 2026  Compact tool definitions              Feature flag
afk-mode               Jan 2026  Transcript classifier for KAIROS      TRANSCRIPT_CLASSIFIER
cli-internal           Feb 2026  Anthropic internal beta               Ant-only
advisor-tool           Mar 2026  Advisor tool                          Feature flag
redact-thinking        Feb 2026  Redact thinking blocks from output    Feature flag
summarize-connector    Mar 2026  Connector text summarization          CONNECTOR_TEXT
tool-search-tool       Oct 2025  Third-party tool search               Feature flag

The 3 Most Revealing

context-1m: a 1M-token window, 5x the normal context. Anthropic already has it working internally. When it reaches the public, much larger codebases will fit in a single session, which changes how we work with LLMs.

effort: granular thinking budget control. Lets you tell the model to think more or less. The era of "cost per complexity" is here.

token-efficient-tools: compact tool definitions. Reduces system prompt overhead, leaving more room for useful context.

🛠️

Hidden Skills Catalog

The commands not listed in /help

Skill      Description                                                 Gate
/simplify  3 parallel review agents (Code Reuse, Quality, Efficiency)  Always
/batch     5-30 worktree agents in parallel                            Always
/skillify  Capture current session as reusable SKILL.md                Always
/debug     Project issues diagnostics                                  Always
/dream     Nightly memory consolidation (KAIROS)                       KAIROS
/verify    Automated verification (tests + lint)                       Ant-only
/remember  Memory review → promotion to CLAUDE.md                      Ant-only
/stuck     Frozen session diagnostics + Slack posting                  Ant-only
/loop      Agent trigger loops                                         AGENT_TRIGGERS

/simplify — Triple Parallel Review

Launches 3 simultaneous review agents, each with a different perspective:

Code Reuse Reviewer

Duplicate patterns, shared utilities

Code Quality Reviewer

Naming, structure, readability

Efficiency Reviewer

Performance, algorithmic complexity

🔐

Ant-Only: The Internal Build

What Anthropic uses that you can't (yet)

When USER_TYPE === 'ant', Claude Code behaves fundamentally differently. These are the capabilities exclusive to Anthropic's internal build:

Different Prompt Instructions

"Never Refuse"

Never say you can't do something — show the error instead

≤25 words

Maximum 25 words between tool calls — maximum efficiency

Minimize Comments

Don't add comments unless necessary

Nested Agents

Agents that create sub-agents — disabled for external users

Exclusive Ant Tools

Tool                     Description
ConfigTool               Direct configuration management
TungstenTool             Internal testing framework
SuggestBackgroundPRTool  Automated PR creation
REPLTool                 All tools wrapped in a REPL VM

What Does This Tell Us?

Anthropic trusts Claude Code enough to run it internally with fewer guardrails and more autonomy. The restrictions we experience as users aren't technical limitations: they're product decisions that could change as public trust grows.

🔧

GrowthBook Runtime Flags

The switches behind the curtain

All runtime flags use the tengu_ prefix (internal codename). These control the most interesting features:

Flag                    Purpose
tengu_chomp_inflection  Speculative execution + prompt suggestions
tengu_session_memory    Session memory ON/OFF
tengu_cobalt_raccoon    Aggressive reactive compaction
tengu_hive_evidence     Verification agent pattern
tengu_onyx_plover       Auto-dream configuration
tengu_memdir_loaded     Memory directory analytics

💡

What's Coming

The future the code reveals

Speculative execution looks set to become a standard technique for AI coding tools. If you can predict the next 2-3 steps with high confidence (and in coding, you often can), most of the perceived latency disappears.

KAIROS points to the future of always-on assistants. Not a chatbot waiting for your questions — a colleague working in the background, subscribing to your repos, scheduling reviews, and consolidating learnings while you sleep.

Coordinator Mode shows that multi-agent coding is already here. It's not a demo or a paper: it's running in production (at least internally). Orchestration with parallel workers and IPC is the pattern everyone will copy.

The Remaining Question

How many of these features will reach the public build — and how many will stay as Anthropic's internal advantage? The source code suggests most are behind feature flags, not technical limitations. The question isn't if they'll arrive, but when.
