Opus 4.6 is live: SWE-Bench 80.9%, Agent Teams, 1M context

Claude Sonnet 5

Codenamed "Fennec," and reportedly a full generation ahead of Gemini's "Snow Bunny," the model is said to be optimized for agentic workflows, advanced coding, and next-generation reasoning efficiency.

  • Model ID: claude-sonnet-5@20260203
  • Context window: 1,000,000 tokens via API
  • SWE-Bench Verified: >80.9%, outscoring current coding models
  • Cost efficiency: 50% cheaper than Opus 4.5

Agentic "Dev Team" Mode

Leaks indicate a new capability to spawn specialized sub-agents (backend, QA, researcher) that work in parallel from the terminal. Agents run autonomously in the background: you give a brief, and they build the full feature like human teammates.

  • Autonomous feature building from briefs
  • Specialized sub-agents working in parallel
  • Terminal-based collaboration interface

TPU Acceleration

Allegedly trained and optimized on Google TPUs, enabling higher throughput and lower latency. Retains the 1M token context window but runs significantly faster than previous generations.

  • Input pricing: $3.00 / MTok
  • Output pricing: $15.00 / MTok
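At per-million-token rates, per-request cost is simple arithmetic. A minimal sketch using the rumored prices above (leaked figures, not confirmed; the function and its defaults are illustrative):

```python
MTOK = 1_000_000  # prices are quoted per million tokens

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 3.00, output_price: float = 15.00) -> float:
    """Estimate a single request's cost in USD at per-MTok rates."""
    return (input_tokens / MTOK) * input_price + (output_tokens / MTOK) * output_price

# Filling most of the rumored 1M-token window with a short reply:
print(round(request_cost(800_000, 4_000), 2))  # 2.46
```

At these rates, a near-full context window costs a few dollars per call, which is why the 50%-cheaper-than-Opus claim matters for agentic workloads that re-read large codebases.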

Benchmark Comparison

Based on leaked internal evaluation data

| Metric                 | Sonnet 5 (Rumored) | Opus 4.5 | GPT-5 Codex |
|------------------------|--------------------|----------|-------------|
| SWE-Bench Verified     | >80.9%             | 77.2%    | 74.9%       |
| OSWorld (Computer Use) | >85%               | 61.4%    | --          |
| Context Window         | 1M tokens          | 1M (API) | 128k - 1M   |
| Inference Speed        | High (TPU-native)  | Moderate | High        |

* All benchmarks are based on unverified leaks and subject to change upon official release.

Leak Timeline

Feb 2026 (Projected)

Imminent Release

Current speculation points to a February 3rd launch window, aimed at capturing the Q1 narrative before competitor releases.

Jan 2026

Vertex AI Confirmation

Model identifier claude-sonnet-5@20260203 spotted in Google Vertex AI error logs. A 404 error on the specific Sonnet 5 ID suggests the model already exists in Google's infrastructure, awaiting activation.
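The `family@YYYYMMDD` pattern is the standard Vertex AI identifier format for Anthropic models, and the date suffix is what drives the February 3rd speculation. A minimal sketch decoding the leaked ID (the ID itself comes from the logs; the helper function is illustrative):

```python
from datetime import datetime

def parse_vertex_model_id(model_id: str):
    """Split a Vertex-style ID ('family@YYYYMMDD') into its model family
    and snapshot date."""
    family, _, snapshot = model_id.partition("@")
    return family, datetime.strptime(snapshot, "%Y%m%d").date()

family, snapshot = parse_vertex_model_id("claude-sonnet-5@20260203")
print(family, snapshot)  # claude-sonnet-5 2026-02-03
```

The decoded snapshot date lines up exactly with the rumored February 3rd launch window.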

Nov 2025

Initial Rumors

Early reports from forecasting platforms and industry insiders suggested a Q1-Q2 2026 window. Some early claims of a 10M context window have since been tempered to a more realistic 1M high-speed context.

Released

Claude Opus 4.6

Anthropic's latest flagship model. 1M token context (beta), Agent Teams, self-correction on long tasks, and 2.5x faster inference already in testing. The community verdict: a generational leap.

Official & Community Benchmarks

Compiled from Anthropic's announcement and third-party reproductions

| Benchmark                   | Opus 4.6 | GPT-5.2  | Notes                        |
|-----------------------------|----------|----------|------------------------------|
| SWE-Bench Verified          | 80.9%    | --       | Industry leading             |
| Terminal-Bench 2.0          | 65.4%    | --       | Highest agentic coding score |
| BigLaw Bench                | 90.2%    | --       | Legal reasoning              |
| OSWorld                     | 66.3%    | --       | Computer use                 |
| GDPval-AA                   | +144 Elo | baseline | Knowledge work value         |
| Long Context Retrieval (1M) | 76%      | --       | Sonnet 4.5 was 18.5%         |

Source: @sairahulxGTM benchmark compilation — the most-shared independent benchmark post on X.

Benchmark Visuals

[Figures] • Official Anthropic benchmark comparison • Long context & reasoning performance • Terminal-Bench 2.0: Opus 4.6 vs GPT-5.3 Codex • 427x kernel optimization speedup experiment • Overall performance comparison (eesel.ai) • GDPval-AA: +144 Elo over GPT-5.2 (LinkedIn)

Key Capabilities

  • 1M token context (beta): 76% needle-in-haystack retrieval vs Sonnet 4.5's 18.5%
  • Agent Teams (parallel sub-agents): self-correcting on long tasks, plans more carefully
  • 2.5x faster inference: already in internal testing, now on Claude Code & API