List of recent good blog posts, papers, and repos:
Anthropic
- www.anthropic.com/engineering/advanced-tool-use
- www.anthropic.com/engineering/building-c-compiler
- www.anthropic.com/engineering/building-effective-agents
- www.anthropic.com/engineering/claude-code-auto-mode
- www.anthropic.com/engineering/claude-code-sandboxing
- www.anthropic.com/engineering/claude-think-tool
- www.anthropic.com/engineering/code-execution-with-mcp
- www.anthropic.com/engineering/demystifying-evals-for-ai-agents
- www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- www.anthropic.com/engineering/harness-design-long-running-apps
- www.anthropic.com/engineering/managed-agents
- www.anthropic.com/engineering/multi-agent-research-system
- www.anthropic.com/engineering/scaling-managed-agents
- www.anthropic.com/engineering/writing-tools-for-agents
- www.anthropic.com/news/mapping-mind-language-model
- www.anthropic.com/research/project-vend-1
- transformer-circuits.pub/2023/monosemantic-features
- transformer-circuits.pub/2024/crosscoders
- transformer-circuits.pub/2025/attribution-graphs/methods.html
Cursor
- cursor.com/blog/composer-2-technical-report
- cursor.com/blog/continually-improving-agent-harness
- cursor.com/blog/cursorbench
- cursor.com/blog/long-running-agents
- cursor.com/blog/multi-agent-kernels
- cursor.com/blog/scaling-agents
- cursor.com/blog/self-driving-codebases
- cursor.com/blog/third-era
Cognition / Devin
- cognition.ai/blog/agent-trace
- cognition.ai/blog/devin-annual-performance-review-2025
- cognition.ai/blog/dont-build-multi-agents
- cognition.ai/blog/multi-agents-working
- cognition.ai/blog/what-we-learned-building-cloud-agents
- devin.ai/agents101
OpenAI
- developers.openai.com/codex/app-server
- openai.com/index/harness-engineering
- openai.com/index/unlocking-the-codex-harness
- github.com/openai/symphony/blob/main/SPEC.md
HumanLayer
- www.humanlayer.dev/blog/advanced-context-engineering
- www.humanlayer.dev/blog/brief-history-of-ralph
- www.humanlayer.dev/blog/context-efficient-backpressure
- www.humanlayer.dev/blog/long-context-isnt-the-answer
- www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents
Amp / Ghuntley
- ampcode.com/news/handoff
- ampcode.com/notes/200k-tokens-is-plenty
- ampcode.com/notes/agents-for-the-agent
- ghuntley.com/cursed
- ghuntley.com/loop
- ghuntley.com/pressure
- ghuntley.com/ralph
arXiv papers
- arxiv.org/abs/2407.13692
- arxiv.org/abs/2409.13082
- arxiv.org/abs/2410.06209
- arxiv.org/abs/2412.06176
- arxiv.org/abs/2505.03989
- arxiv.org/abs/2505.20896
- arxiv.org/abs/2509.22908
- arxiv.org/abs/2510.01346
- arxiv.org/abs/2510.02917
- arxiv.org/abs/2510.03178
- arxiv.org/abs/2510.25015
Sandboxes / Infra
- blog.cloudflare.com/project-think
- e2b.dev/blog/firecracker-vs-qemu
- e2b.dev/blog/how-manus-uses-e2b-to-provide-agents-with-virtual-computers
- firecracker-microvm.github.io
- northflank.com/blog/best-sandboxes-for-coding-agents
- www.together.ai/blog/code-sandbox-code-interpreter
- github.com/zerobootdev/zeroboot
Evals / METR / Hamel
- hamel.dev/blog/posts/evals-faq
- hamel.dev/blog/posts/evals-skills
- hamel.dev/blog/posts/llm-judge
- metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks
- metr.org/blog/2026-1-29-time-horizon-1-1
- www.tobyord.com/writing/half-life
Math / Formal
- deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level
- deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms
- terrytao.wordpress.com/2025/11/05/mathematical-exploration-and-discovery-at-scale
- theorem.dev/blog/catching-bugs-with-fractional-proofs
- theorem.dev/blog/lf-lean
- www.math.inc/gauss
- www.nature.com/articles/s41586-023-06924-6
Aider
Misc commentators
- addyosmani.com/blog/long-running-agents
- arize.com/blog/swarm-management-of-agent-harnesses
- karpathy.bearblog.dev/year-in-review-2025
- leehanchung.github.io/blogs/2026/04/24/hidden-technical-debt-agent-runtime
- martinfowler.com/articles/harness-engineering.html
- mattrickard.com/levels-of-autonomy-in-ai-agents
- mattrickard.com/the-spec-layer
- simonw.substack.com/p/designing-agentic-loops
- simonwillison.net/2025/Nov/4/code-execution-with-mcp
- www.benedict.dev/optimization-arena-learnings
- www.geoffreylitt.com/2025/10/24/code-like-a-surgeon
- www.latent.space/p/daytona
- www.latent.space/p/harness-eng
- www.latent.space/p/notion
- www.latent.space/p/s3