Window: April 21–28, 2026 (first run — no prior newsletter, using last 7 days).
Claude Code — quality issues and new releases (April 23–28)
Anthropic published a postmortem on six weeks of quality regressions and shipped four versions (v2.1.118–2.1.121) with significant hook, MCP, and skill improvements.
Three Anthropic product changes silently degraded Claude Code for six weeks
TL;DR: Three separate changes compounded between March 4 and April 20 to make Claude Code noticeably worse: (1) On March 4, Anthropic dropped the default reasoning effort from high to medium to cut latency — the wrong tradeoff. (2) On March 26, a caching update shipped with a bug that wiped session thinking every turn instead of just once, making the model seem forgetful and draining usage quotas faster than expected. (3) On April 16, a system-prompt instruction capping responses at 25 words between tool calls was added; combined with other prompt changes it hurt coding quality and was reverted April 20. All three issues are fixed as of v2.1.116. Anthropic reset usage limits for all affected subscribers and now defaults Opus 4.7 to xhigh reasoning effort and all other models to high.
What to do: Upgrade to v2.1.116 or later if you're still on an older pin; if your sessions felt forgetful or unusually brief between early March and late April, this was the cause — not the model getting worse.
Why trust it: First-party Anthropic engineering postmortem with specific dates, version numbers, and descriptions of each change; no independent measurement of impact, but the disclosing team shipped the changes.
Skeptic check: Anthropic does not quantify what fraction of users were affected or by how much, so severity is self-reported.
PostToolUse hooks can now rewrite any tool's output, not just MCP tools
Source: Claude Code changelog, v2.1.121 (April 28, 2026)
TL;DR: The hookSpecificOutput.updatedToolOutput field in PostToolUse hooks previously only replaced output from MCP tools. As of v2.1.121, it works for all tools — Bash, Write, Read, Glob, and other built-ins — so a hook can intercept and replace whatever a tool returned before Claude sees it.
What to do: Use PostToolUse hooks to inject guardrails, sanitize output, or append context after any built-in tool call, not just MCP calls.
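A minimal sketch of a sanitizing hook helper, assuming the hook replaces output by printing JSON with the hookSpecificOutput.updatedToolOutput field named in the changelog; the redaction pattern and the stdin field name mentioned in the closing comment are illustrative assumptions, not confirmed schema.

```python
import json
import re

# Hypothetical pattern; tune for the secrets your tools might leak.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")

def sanitize(tool_output: str) -> str:
    """Mask anything that looks like an API key."""
    return SECRET_RE.sub("[REDACTED]", tool_output)

def build_hook_output(tool_output: str) -> str:
    """Build the JSON a PostToolUse hook prints to replace a tool's output.

    Per the v2.1.121 changelog, this replacement now applies to built-in
    tools (Bash, Write, Read, Glob), not just MCP tools.
    """
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "updatedToolOutput": sanitize(tool_output),
        }
    })

# In a real hook script you would read the event from stdin and print
# build_hook_output(...) for the tool's result; the exact name of the
# input field carrying that result is an assumption here.
```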
Why trust it: Official Anthropic changelog entry.
Skeptic check: —
Hooks can now call MCP tools directly without a shell intermediary
Source: Claude Code changelog, v2.1.118 (April 23, 2026)
TL;DR: Hook configs now support type: "mcp_tool", which calls a named tool on an already-connected MCP server directly — no shell subprocess, no extra process spawn. Previously, triggering an MCP tool from a hook required a shell script that itself called the MCP server.
What to do: Replace PostToolUse shell-script workarounds that proxy into MCP servers with type: "mcp_tool" entries for simpler, faster hook configs.
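A hedged sketch of what such an entry might look like in settings.json: the type: "mcp_tool" value is from the changelog, while the surrounding matcher shape, the "tool" key name, and the mcp__linter__check_file tool name are assumptions for illustration.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          { "type": "mcp_tool", "tool": "mcp__linter__check_file" }
        ]
      }
    ]
  }
}
```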
Why trust it: Official Anthropic changelog entry.
Skeptic check: —
alwaysLoad MCP config option bypasses tool-search deferral per server
Source: Claude Code changelog, v2.1.121 (April 28, 2026)
TL;DR: Claude Code's tool-search system defers loading MCP tool definitions until Claude asks for them, reducing upfront context use. The new alwaysLoad: true per-server config option opts a specific MCP server out of this deferral, so all its tools are immediately in context without a discovery step.
What to do: Add alwaysLoad: true to MCP servers whose tools your agent needs on every turn; leave others deferred to save context tokens.
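In per-server MCP config (e.g. a .mcp.json file), this might look like the sketch below; only the alwaysLoad key comes from the changelog, and the server names and commands are invented for illustration. Here "issue-tracker" loads its tool definitions eagerly while "docs-search" stays deferred behind tool search.

```json
{
  "mcpServers": {
    "issue-tracker": {
      "command": "npx",
      "args": ["-y", "issue-tracker-mcp"],
      "alwaysLoad": true
    },
    "docs-search": {
      "command": "npx",
      "args": ["-y", "docs-search-mcp"]
    }
  }
}
```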
Why trust it: Official Anthropic changelog entry.
Skeptic check: The ~85% context-token reduction claimed for tool-search deferral is Anthropic's own figure, not independently verified.
claude ultrareview runs a multi-agent code review from CI without an interactive session
TL;DR: The new claude ultrareview [target] subcommand packages the repo, ships it to a cloud sandbox, runs a fleet of reviewer agents in parallel (each from a different angle: application logic, edge cases, security, performance), then exits with code 0 on success or 1 on failure — no interactive session required. Without arguments it reviews the diff from the current branch against the default branch. Each run typically costs $5–20 depending on repo size; Pro and Max subscribers get three free runs expiring May 5, 2026.
What to do: Wire claude ultrareview into a CI step for pre-merge checks and use your three free runs before May 5 to calibrate cost-per-review for your codebase.
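One way to wire this in, sketched as a GitHub Actions job fragment: the subcommand, its default-diff behavior, and the exit-code contract are from the changelog; the action versions, install command, and secret name are assumptions to adapt to your setup.

```yaml
# .github/workflows/review.yml (fragment)
review:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install Claude Code
      run: npm install -g @anthropic-ai/claude-code
    - name: Multi-agent review of this branch's diff
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      # With no target argument, ultrareview reviews the current branch's
      # diff against the default branch; a nonzero exit fails the job.
      run: claude ultrareview
```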
Why trust it: Official Anthropic changelog and docs.
Skeptic check: Review quality is not benchmarked against human code review or other automated tools in any published comparison.
/config settings now persist to settings.json across restarts
Source: Claude Code changelog, v2.1.119 (April 23, 2026)
TL;DR: Theme, editor mode, verbose, and other /config settings previously reset each session. As of v2.1.119, they write to ~/.claude/settings.json and survive restarts.
What to do: Run /config once to set your preferred defaults; they'll stick.
Why trust it: Official Anthropic changelog entry.
Skeptic check: —
Skills can now branch on the current reasoning effort level via ${CLAUDE_EFFORT}
Source: Claude Code changelog, v2.1.120 (April 28, 2026)
TL;DR: Skill frontmatter can now reference ${CLAUDE_EFFORT}, which expands to the current effort setting (low / medium / high / xhigh), so a skill's instructions can adapt — for example, skipping expensive searches when effort is low.
What to do: Update skills that run expensive analysis steps to gate those steps on ${CLAUDE_EFFORT} being above a threshold.
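A sketch of a skill file gated on effort: the ${CLAUDE_EFFORT} variable and its low/medium/high/xhigh values are from the changelog, while the skill name, frontmatter shape, and audit steps are invented for illustration.

```markdown
---
name: deep-audit
description: Audit a module for correctness and style issues.
---

Current effort level: ${CLAUDE_EFFORT}

1. Read the target module and list its public functions.
2. If the effort level above is high or xhigh, also search the whole
   repo for callers of each public function and check the call sites.
3. Otherwise, skip the repo-wide search and review only the module.
```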
Why trust it: Official Anthropic changelog entry.
Skeptic check: —
PostToolUse hooks now receive tool execution time as duration_ms
Source: Claude Code changelog, v2.1.119 (April 23, 2026)
TL;DR: PostToolUse and PostToolUseFailure hook inputs now include a duration_ms field with the actual wall-clock tool execution time, excluding permission prompts and pre-tool hook overhead.
What to do: Log or alert on duration_ms in PostToolUse to identify which tools are bottlenecking your agent sessions.
Why trust it: Official Anthropic changelog entry.
Skeptic check: —
--from-pr now accepts GitLab, Bitbucket, and GitHub Enterprise URLs
Source: Claude Code changelog, v2.1.119 (April 23, 2026)
TL;DR: --from-pr previously only worked with github.com URLs. As of v2.1.119, it accepts GitLab merge-request, Bitbucket pull-request, and GitHub Enterprise URLs.
What to do: Teams on GitLab, Bitbucket, or GitHub Enterprise can now use --from-pr to load pull-request context directly into Claude Code.
Why trust it: Official Anthropic changelog entry.
Skeptic check: —
Model quality
Opus 4.7 improves coding and vision quality but drops sharply on one long-context retrieval benchmark; Anthropic disputes the benchmark's real-world validity.
Opus 4.7 scores 46 points lower than Opus 4.6 on MRCR long-context retrieval (disputed by Anthropic)
Source: Vellum AI benchmarks analysis (third-party); Anthropic response widely reported
TL;DR: On MRCR v2 at 1M tokens, Opus 4.7 scores 32.2% versus 78.3% for Opus 4.6 — a 46-point drop. At 256k tokens with an 8-needle retrieval task the drop is 91.9% → 59.2%. Anthropic's position: MRCR stacks distractors in unrealistic ways and the model improves on their preferred GraphWalks benchmark. Separately, Opus 4.7 ships a tokenizer change that increases effective API cost. Coding and vision quality go up; long-document retrieval accuracy goes down on at least one independent measurement.
What to do: If your agent finds specific facts inside very long documents, test Opus 4.7 against your actual data before upgrading from 4.6; for coding-heavy or vision tasks, the upgrade is straightforward at the same list price.
Why trust it: Numbers are independently reported across multiple third-party benchmark analyses, and Anthropic acknowledges the MRCR drop while disputing its relevance; no published independent replication of GraphWalks results.
Skeptic check: Anthropic is the party most motivated to contest MRCR's validity here; wait for independent long-context benchmark results on GraphWalks before concluding the regression does or does not matter in practice.
Meta
Candidate sources
Vellum AI blog — surfaced while searching for Opus 4.7 benchmark analysis. Vellum appears to run their own independent multi-model benchmark suite and publish the results on release day for new models. Track record: unknown beyond this run, needs observation.
Prompt friction
Five seed-list sources — anthropic.com, simonwillison.net, latent.space, cursor.com, news.ycombinator.com — return 403 on direct fetch, so all coverage from them relies on web-search snippets. Items that don't surface prominently in search results but would pass the inclusion bar are probably being missed. Consider adding RSS/Atom feeds (where available) as fallback access paths for these sources.
Sources used today: Anthropic engineering blog, Claude Code GitHub releases (anthropics/claude-code), Claude Code official changelog (code.claude.com), Vellum AI, Latent Space (latent.space), Simon Willison (simonwillison.net — via search snippets), releasebot.io, and various search aggregators for discovery.
Skipped: 3 funding/partnership posts; 4 leaderboard or benchmark-win announcements without methodology content; 2 opinion takes without evidence. Additionally outside the April 21–28 window: Cursor 3.0 (Apr 2), Anthropic OpenClaw ban (Apr 4), Cursor Bugbot self-improvement (Apr 8), Cursor CLI improvements (Apr 14), OpenAI Codex "for almost everything" (Apr 16), Claude Design launch (Apr 17), AGENTS.md efficiency research (Jan 2026), Nathan Lambert "use multiple models" (Jan 2026), Latent Space "Extreme Harness Engineering" (Apr 7), Latent Space "Externalization in LLM Agents" arxiv paper (submitted Apr 9).
Coverage gaps: Direct fetch blocked (403) for anthropic.com, simonwillison.net, latent.space, cursor.com, news.ycombinator.com, and arxiv.org paper pages. RSS/Atom feeds were not attempted on this run — a likely improvement for future runs.
latent.space/p/harness-eng — 403 — wanted to confirm token volume figures and specific workflow recommendations from the OpenAI Frontier team piece on extreme harness engineering.
cursor.com/changelog/3-0 — 403 — wanted to verify exact Cursor 3.0 features for parallel-agent and worktree support (article pre-dates the window but was surfaced in discovery).