Speculative Execution, KAIROS & the Hidden Features of Claude Code
The capabilities Anthropic doesn't announce: an overlay filesystem working while you read, an always-on assistant with cron and push notifications, and a multi-agent mode where "parallelism is your superpower." First-hand source code reverse engineering.
Ever had a Claude Code response that was suspiciously fast? It wasn't your imagination. Claude Code was working before you asked. This article reveals features not in any public changelog — direct source code reverse engineering, not speculation or rumors.
What We Cover
🔮 Speculative Execution — overlay filesystem working before you ask
🌙 KAIROS Mode — always-on assistant with sleep/wake, cron & push notifications
🎛️ Coordinator Mode — multi-agent orchestration with parallel workers
🧪 18 Beta Headers — the API's future hidden in feature flags
🔐 Ant-Only Features — what Anthropic uses internally that you can't
Speculative Execution
The overlay filesystem trick
The best-hidden feature in Claude Code. Gate: tengu_chomp_inflection (GrowthBook). Claude Code can predict what you'll ask next and start working on it before you type.
In coding, patterns are predictable: "run the tests," "fix the error," "now commit." Claude Code leverages this predictability just like modern CPUs leverage branch prediction.
Full Pipeline
1. User sends a prompt → Claude responds
Normal flow up to this point.
2. While you read the response...
A forked agent generates "prompt suggestions" in the background.
3. The most likely prompt is predicted
The system selects the highest-confidence prediction.
4. Speculative execution with overlay filesystem
Reads → real FS (safe) · Writes → overlay (sandboxed)
5. User types their actual prompt
The system compares it with the prediction.
6. Match → instant commit · No match → silent discard
If the prediction was correct, overlay writes apply to the real FS. If it was wrong, nothing happened.
Overlay Filesystem — The Critical Detail
Speculative execution uses a virtual filesystem overlay:
Reads → go to real filesystem (safe, no side effects)
Writes → go to overlay (sandboxed, isolated)
If COMMIT → overlay writes apply to real FS
If DISCARD → overlay is dropped: nothing happened, zero side effects
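The read/write/commit/discard rules above can be sketched in a few lines. This is a minimal in-memory illustration, not the code from Claude Code: the class name `OverlayFs`, the `FileMap` type, and the choice to let reads see pending overlay writes are all assumptions made for the example.

```typescript
// Hypothetical in-memory stand-in for a filesystem: path -> file content.
type FileMap = Map<string, string>;

class OverlayFs {
  private base: FileMap;
  // Pending speculative writes; the base is untouched until commit().
  private overlay: FileMap = new Map<string, string>();

  constructor(base: FileMap) {
    this.base = base;
  }

  // Reads prefer the overlay (so speculation sees its own writes),
  // then fall through to the real filesystem.
  read(path: string): string | undefined {
    return this.overlay.has(path) ? this.overlay.get(path) : this.base.get(path);
  }

  // Writes always land in the overlay.
  write(path: string, content: string): void {
    this.overlay.set(path, content);
  }

  // COMMIT: replay overlay writes onto the base, then clear the overlay.
  commit(): void {
    this.overlay.forEach((content, path) => this.base.set(path, content));
    this.overlay.clear();
  }

  // DISCARD: drop the overlay; the base was never modified.
  discard(): void {
    this.overlay.clear();
  }
}

// Usage: a wrong prediction leaves the real files exactly as they were.
const realFs: FileMap = new Map([["src/app.ts", "v1"]]);
const spec = new OverlayFs(realFs);
spec.write("src/app.ts", "v2");
spec.discard();
```

The design choice worth noting is that discard costs nothing: because writes never reach the base, there is no undo log to replay.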
System Limits
MAX_SPECULATION_TURNS = 20 // Max turns in speculative chain
MAX_SPECULATION_MESSAGES = 100 // Max messages in speculation
Source: src/services/PromptSuggestion/speculation.ts
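To make the limits and the commit/discard decision concrete, here is a small sketch. Only the two constants come from the source; the `SpeculationState` shape, the strict `<` comparison, and the matching rule (case- and whitespace-insensitive equality) are assumptions for illustration, and the real comparison may well be different.

```typescript
// Limits quoted in the article; everything else here is hypothetical.
const MAX_SPECULATION_TURNS = 20;
const MAX_SPECULATION_MESSAGES = 100;

interface SpeculationState {
  predictedPrompt: string;
  turns: number;
  messages: number;
}

// True while the speculative chain is still allowed to continue.
function withinLimits(s: SpeculationState): boolean {
  return s.turns < MAX_SPECULATION_TURNS && s.messages < MAX_SPECULATION_MESSAGES;
}

// Assumed matching rule: ignore case and collapse whitespace.
function normalizePrompt(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

type Outcome = "commit" | "discard";

// Match -> commit the overlay; no match -> discard it.
function resolveSpeculation(s: SpeculationState, actualPrompt: string): Outcome {
  return normalizePrompt(s.predictedPrompt) === normalizePrompt(actualPrompt)
    ? "commit"
    : "discard";
}
```

With this rule, a prediction of "run the tests" would commit for a typed prompt of "Run the tests" and be discarded for "now commit".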
KAIROS Mode
The assistant that doesn't sleep (or does it?)
Gate: feature('KAIROS').
The most ambitious feature: an always-on, persistent coding assistant that lives across sessions. A virtual colleague that subscribes to your PRs, schedules tasks with cron, and consolidates memories while you sleep.
Full Capabilities
| Feature | Description |
|---|---|
| SleepTool | Agent sleeps and programmatically schedules its own wake-up |
| <tick> XML tags | Periodic tick-based self-activation |
| Terminal Awareness | Knows if you're looking at the terminal or AFK |
| PushNotificationTool | Sends push notifications to mobile/desktop |
| SendUserFileTool | Shares files directly to the user |
| SubscribePRTool | Subscribes to GitHub PR events via webhooks |
| Cron Jobs | CronCreate/Delete/List — recurring scheduled tasks |
| Daily Logs | Append-only memory at memory/logs/YYYY/MM/YYYY-MM-DD.md |
| Nightly Dream | Memory consolidation during sleep periods (/dream) |
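As an illustration of the cron row, the sketch below models the create/delete/list trio as an in-memory registry. The article only names the operations (CronCreate/Delete/List); the `CronJob` shape, the five-field validation, and the `CronRegistry` class are hypothetical.

```typescript
// Hypothetical job record: a cron schedule plus the prompt to run on each firing.
interface CronJob {
  id: number;
  schedule: string;
  prompt: string;
}

class CronRegistry {
  private jobs = new Map<number, CronJob>();
  private nextId = 1;

  // Assumption: only classic five-field cron expressions are accepted.
  create(schedule: string, prompt: string): CronJob {
    const trimmed = schedule.trim();
    if (trimmed.split(/\s+/).length !== 5) {
      throw new Error("expected 5 cron fields, got: " + schedule);
    }
    const job: CronJob = { id: this.nextId++, schedule: trimmed, prompt };
    this.jobs.set(job.id, job);
    return job;
  }

  // Returns true if a job with that id existed and was removed.
  delete(id: number): boolean {
    return this.jobs.delete(id);
  }

  list(): CronJob[] {
    const out: CronJob[] = [];
    this.jobs.forEach((job) => out.push(job));
    return out;
  }
}
```

A weekday-morning review would then be registered as `create("0 9 * * 1-5", "review open PRs")`.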
AFK Detection — TRANSCRIPT_CLASSIFIER
Behind TRANSCRIPT_CLASSIFIER, the system classifies whether you're actively using the terminal or have stepped away. When AFK is detected, the agent works autonomously; when you return, it summarizes what it did.
Beta header: afk-mode-2026-01-31
Memory in KAIROS: Append-Only Daily Logs
Instead of directly editing MEMORY.md, KAIROS uses append-only daily log files:
memory/logs/2026/07/2026-07-04.md ← each interaction appends here
// Every night, /dream consolidates the day's logs into topic files
// More robust for always-on operation: no risk of corrupting the index
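A sketch of how such a path could be derived and how an append-only entry differs from an in-place edit. The path layout follows the example path above; the use of UTC dates, the entry format, and both function names are assumptions.

```typescript
// Zero-pad a month or day to two digits.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Builds memory/logs/YYYY/MM/YYYY-MM-DD.md for the given date.
// Assumption: the day boundary is taken in UTC.
function dailyLogPath(date: Date): string {
  const y = date.getUTCFullYear();
  const m = pad2(date.getUTCMonth() + 1);
  const d = pad2(date.getUTCDate());
  return "memory/logs/" + y + "/" + m + "/" + y + "-" + m + "-" + d + ".md";
}

// Append-only: the new log is always the old log plus one line,
// so earlier entries are never rewritten.
function appendEntry(log: string, timestamp: string, text: string): string {
  return log + "- [" + timestamp + "] " + text + "\n";
}
```

Because every write is a pure append, a crash mid-write can at worst truncate the newest entry; it cannot damage older ones.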
Coordinator Mode
Multi-agent with parallelism as superpower
"Parallelism is your superpower"
A coordinator with four tools (Agent, TaskStop, SendMessage, SyntheticOutput) orchestrates N workers, each with the full toolset. The coordinator doesn't execute code: it plans, delegates, and verifies. Workers do the heavy lifting in parallel, each in an isolated context.
Architecture
┌──────────────────────────────┐
│         COORDINATOR          │
│ Tools: Agent, TaskStop,      │
│ SendMessage, SyntheticOutput │
└──────────┬───────────────────┘
           │
 ┌─────────┼──────────┐
 │         │          │
 ▼         ▼          ▼
┌──────┐ ┌──────┐ ┌──────┐
│Worker│ │Worker│ │Worker│
│  #1  │ │  #2  │ │  #3  │
│(full │ │(full │ │(full │
│tools)│ │tools)│ │tools)│
└──────┘ └──────┘ └──────┘
4-Phase Workflow
1. Research
Understand the problem. Multiple workers investigate different aspects in parallel.
2. Synthesis
Plan the approach. Coordinator synthesizes findings and defines strategy.
3. Implementation
Parallel execution. Workers implement with full toolset, each in isolated context.
4. Verification
Confirm correctness. Tests, lint, review — all verified before declaring complete.
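The four phases map naturally onto two fan-out steps with a planning step between them. The sketch below assumes a worker is just an async function from a task description to a report; `Worker`, `RunResult`, and `coordinate` are hypothetical names, and the real coordinator talks to workers through its tools rather than direct function calls.

```typescript
// Hypothetical worker: receives a task description, returns a text report.
type Worker = (task: string) => Promise<string>;

interface RunResult {
  findings: string[];
  plan: string[];
  changes: string[];
  verified: boolean;
}

async function coordinate(
  questions: string[],
  worker: Worker,
  planFrom: (findings: string[]) => string[],
  verify: (changes: string[]) => Promise<boolean>
): Promise<RunResult> {
  // 1. Research: one worker per question, all running concurrently.
  const findings = await Promise.all(questions.map((q) => worker("research: " + q)));
  // 2. Synthesis: done by the coordinator itself; no code is executed here.
  const plan = planFrom(findings);
  // 3. Implementation: fan out again, one worker per plan step.
  const changes = await Promise.all(plan.map((step) => worker("implement: " + step)));
  // 4. Verification: nothing is reported complete until this passes.
  const verified = await verify(changes);
  return { findings, plan, changes, verified };
}
```

`Promise.all` keeps results in input order, so the coordinator can match each report to the task that produced it even though workers finish at different times.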
Worker Status Reports
Workers report every 30 seconds via AgentSummary:
"Describe your most recent action in 3-5 words using present tense (-ing)"
Example: "Migrating database schema tables"
Format: XML <task-notification> tags
/batch — The Most Powerful Skill
Launches 5-30 isolated worktree agents in parallel. Each agent works in its own git worktree (separate checkout), all execute simultaneously, and results are merged back to the main branch.
Use case: "Migrate all 50 API endpoints to the new schema" → 20 agents, each handling 2-3 endpoints.
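The arithmetic in that use case (50 endpoints across 20 agents, 2-3 each) is a balanced partition. Here is a sketch of the split, plus the kind of command that gives each agent its own checkout. `partition` and `worktreeCommand` are hypothetical helpers and the branch/path naming is invented; `git worktree add -b <branch> <path>` itself is standard git.

```typescript
// Split items into `agents` contiguous groups whose sizes differ by at most one.
function partition<T>(items: T[], agents: number): T[][] {
  if (!Number.isInteger(agents) || agents < 1) {
    throw new Error("agents must be a positive integer");
  }
  const groups: T[][] = [];
  const base = Math.floor(items.length / agents);
  const extra = items.length % agents; // the first `extra` groups get one more item
  let start = 0;
  for (let i = 0; i < agents; i++) {
    const size = base + (i < extra ? 1 : 0);
    groups.push(items.slice(start, start + size));
    start += size;
  }
  return groups;
}

// Hypothetical naming scheme for one agent's isolated checkout.
function worktreeCommand(agentIndex: number): string {
  return "git worktree add -b batch/agent-" + agentIndex + " ../batch-agent-" + agentIndex;
}

// Usage: 50 endpoints over 20 agents.
const endpoints: string[] = [];
for (let i = 1; i <= 50; i++) endpoints.push("/api/endpoint-" + i);
const assignments = partition(endpoints, 20);
// assignments.length === 20; the first 10 groups hold 3 endpoints, the rest hold 2
```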
The 18 Beta Headers
The API's future hidden in feature flags
Each beta header unlocks an experimental API capability. Once enabled in a session, it persists until /clear or /compact. This latching is intentional: it prevents mid-session inconsistencies.
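Latching means a header that was switched on stays on even if its gate later evaluates to false, until an explicit reset. A minimal sketch, assuming the enabled set is joined into the comma-separated `anthropic-beta` request header; the `BetaLatch` class and its method names are hypothetical.

```typescript
class BetaLatch {
  private enabled = new Set<string>();

  // Record the latest gate evaluation. A later `false` never un-sets a header.
  observe(header: string, gateOn: boolean): void {
    if (gateOn) this.enabled.add(header);
  }

  // Assumed reset points: /clear and /compact.
  reset(): void {
    this.enabled.clear();
  }

  // Comma-separated value for the `anthropic-beta` header, sorted for stability.
  headerValue(): string {
    const names: string[] = [];
    this.enabled.forEach((h) => names.push(h));
    return names.sort().join(",");
  }
}
```

The point of the latch is that every request in a session is sent with the same set of headers, even if a remote flag flips halfway through.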
| Beta Header | Date | Description | Gate |
|---|---|---|---|
| claude-code-20250219 | Feb 2025 | Base Claude Code beta | Always |
| interleaved-thinking | May 2025 | Thinking interleaved with tool use | Always |
| context-1m | Aug 2025 | 1M token context window (5x default) | Feature flag |
| context-management | Jun 2025 | API-native context management | Feature flag |
| structured-outputs | Dec 2025 | Structured JSON outputs on tools | Feature flag |
| web-search | Mar 2025 | Integrated web search | Always |
| advanced-tool-use | Nov 2025 | Advanced tool use (1P) | Feature flag |
| effort | Nov 2025 | Thinking budget control | Feature flag |
| task-budgets | Mar 2026 | Task-level token budgets | Feature flag |
| prompt-caching-scope | Jan 2026 | Prompt caching scope control | Feature flag |
| fast-mode | Feb 2026 | Fast response mode | Feature flag |
| token-efficient-tools | Mar 2026 | Compact tool definitions | Feature flag |
| afk-mode | Jan 2026 | Transcript classifier for KAIROS | TRANSCRIPT_CLASSIFIER |
| cli-internal | Feb 2026 | Anthropic internal beta | Ant-only |
| advisor-tool | Mar 2026 | Advisor tool | Feature flag |
| redact-thinking | Feb 2026 | Redact thinking blocks from output | Feature flag |
| summarize-connector | Mar 2026 | Connector text summarization | CONNECTOR_TEXT |
| tool-search-tool | Oct 2025 | Third-party tool search | Feature flag |
The 3 Most Revealing
context-1m — 1M token window. 5x the normal context. Anthropic already has it working internally. When it reaches the public, it will fundamentally change how we interact with LLMs.
effort — granular thinking budget control. Lets you tell the model to think more or less. The era of "cost per complexity" is here.
token-efficient-tools — compact tool definitions. Reduces system prompt overhead, leaving more room for useful context.
Hidden Skills Catalog
The commands not listed in /help
| Skill | Description | Gate |
|---|---|---|
| /simplify | 3 parallel review agents (Code Reuse, Quality, Efficiency) | Always |
| /batch | 5-30 worktree agents in parallel | Always |
| /skillify | Capture current session as reusable SKILL.md | Always |
| /debug | Project issues diagnostics | Always |
| /dream | Nightly memory consolidation (KAIROS) | KAIROS |
| /verify | Automated verification (tests + lint) | Ant-only |
| /remember | Memory review → promotion to CLAUDE.md | Ant-only |
| /stuck | Frozen session diagnostics + Slack posting | Ant-only |
| /loop | Agent trigger loops | AGENT_TRIGGERS |
/simplify — Triple Parallel Review
Launches 3 simultaneous review agents, each with a different perspective:
Code Reuse Reviewer
Duplicate patterns, shared utilities
Code Quality Reviewer
Naming, structure, readability
Efficiency Reviewer
Performance, algorithmic complexity
Ant-Only: The Internal Build
What Anthropic uses that you can't (yet)
When USER_TYPE === 'ant', Claude Code behaves fundamentally differently. These are the capabilities exclusive to Anthropic's internal build:
Different Prompt Instructions
"Never Refuse"
Never say you can't do something — show the error instead
≤25 words
Maximum 25 words between tool calls — maximum efficiency
Minimize Comments
Don't add comments unless necessary
Nested Agents
Agents that create sub-agents — disabled for external users
Exclusive Ant Tools
| Tool | Description |
|---|---|
| ConfigTool | Direct configuration management |
| TungstenTool | Internal testing framework |
| SuggestBackgroundPRTool | Automated PR creation |
| REPLTool | All tools wrapped in a REPL VM |
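The effect of a USER_TYPE check on what a user can see can be sketched as a simple filter. The item names and gates are copied from the tables in this article; the `GatedItem` type, the `visibleFor` function, and the rule that only the exact value "ant" unlocks gated items are assumptions.

```typescript
type Gate = "always" | "ant-only";

interface GatedItem {
  name: string;
  gate: Gate;
}

// Names and gates as listed in this article's skill and tool tables.
const ITEMS: GatedItem[] = [
  { name: "/simplify", gate: "always" },
  { name: "/batch", gate: "always" },
  { name: "/verify", gate: "ant-only" },
  { name: "/stuck", gate: "ant-only" },
  { name: "ConfigTool", gate: "ant-only" },
  { name: "REPLTool", gate: "ant-only" },
];

// Assumed rule: "ant-only" items appear only when userType is exactly "ant".
function visibleFor(userType: string, items: GatedItem[]): string[] {
  return items
    .filter((item) => item.gate === "always" || userType === "ant")
    .map((item) => item.name);
}
```

Under this rule an external user sees only /simplify and /batch from the list, while an internal user sees all six entries.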
What Does This Tell Us?
Anthropic trusts Claude Code enough to run it internally with fewer guardrails and more autonomy. The restrictions we experience as users aren't technical limitations; they're product decisions that could change as public trust grows.
GrowthBook Runtime Flags
The switches behind the curtain
All runtime flags use the tengu_ prefix (internal codename). These control the most interesting features:
| Flag | Purpose |
|---|---|
| tengu_chomp_inflection | Speculative execution + prompt suggestions |
| tengu_session_memory | Session memory ON/OFF |
| tengu_cobalt_raccoon | Aggressive reactive compaction |
| tengu_hive_evidence | Verification agent pattern |
| tengu_onyx_plover | Auto-dream configuration |
| tengu_memdir_loaded | Memory directory analytics |
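A runtime flag check of this kind usually reduces to a lookup with a safe default. The sketch below uses a plain object as a stand-in for the remote flag payload; it is not the GrowthBook SDK, and `FlagValues`, `isFlagOn`, and the default-to-off behavior are assumptions.

```typescript
// Stand-in for a fetched flag payload; this is not the GrowthBook SDK.
type FlagValues = { [name: string]: boolean | undefined };

function isFlagOn(flags: FlagValues, name: string): boolean {
  if (name.indexOf("tengu_") !== 0) {
    throw new Error("runtime flags are expected to start with tengu_: " + name);
  }
  // Assumed default: a missing flag counts as off.
  return flags[name] === true;
}

// Flag name taken from the table above.
function speculationEnabled(flags: FlagValues): boolean {
  return isFlagOn(flags, "tengu_chomp_inflection");
}
```

Defaulting to off means a failed or delayed flag fetch leaves the experimental behavior disabled rather than half-enabled.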
What's Coming
The future the code reveals
Speculative execution will become the standard paradigm for all AI coding tools. If you can predict the next 2-3 steps with high confidence (and in coding, you can), perceived latency disappears.
KAIROS points to the future of always-on assistants. Not a chatbot waiting for your questions — a colleague working in the background, subscribing to your repos, scheduling reviews, and consolidating learnings while you sleep.
Coordinator Mode proves that the era of multi-agent coding is already here. It's not a demo or a paper; it runs in production (at least internally). Orchestrating parallel workers over IPC is the pattern everyone will copy.
The Remaining Question
How many of these features will reach the public build — and how many will stay as Anthropic's internal advantage? The source code suggests most are behind feature flags, not technical limitations. The question isn't if they'll arrive, but when.