Developer docs for AI coding agents

Research — This is a living document. It may be incomplete, outdated, or full of half-formed thinking.

The core problem: context engineering

Most developers now use AI tools daily, and a growing share of committed code is AI-assisted. The bottleneck has shifted. It's no longer about whether AI can write code. It's about whether AI has the right context to write the correct code. When agents hallucinate APIs, use deprecated patterns, or ignore project conventions, the root cause is almost always the same: a documentation problem.

LangChain defines "context engineering" as providing the right information and tools in the right format so the LLM can accomplish a task. When agents fail, it's usually not a model capability issue. The right context wasn't passed in.

LLMs are stateless. Every session starts from zero. They don't know your conventions, your tech stack, your forbidden patterns, or your dependency versions unless you tell them. This is fundamentally different from onboarding a human developer, who builds tacit knowledge over time from code review, pairing, and reading the codebase. Agents need explicit, structured context every time.

The five layers of agent documentation

I'm framing these as a stack, from most basic to most sophisticated. Teams typically adopt bottom-up.

Layer 1: Inline context files (CLAUDE.md / AGENTS.md)

Project-root markdown files loaded into every agent session automatically. CLAUDE.md for Claude Code, AGENTS.md for OpenCode/Cursor/Codex. Same pattern, different filenames.

Best practices that have emerged:

  • Keep them ruthlessly short. Every line competes for attention in the context window.
  • Only include universally applicable instructions (build commands, key conventions, forbidden patterns).
  • Treat them like code: iterate based on code review feedback.
  • Don't use them as a linter. Use actual linters/formatters via hooks instead.

A counterintuitive finding: a Princeton/industry study found that context files can actually reduce task success rates while increasing inference cost by 20%+. The takeaway isn't "don't use them." It's that bloated or overly prescriptive files actively hurt performance. Minimal requirements only.

The tiered documentation architecture pattern: CLAUDE.md (universal) → Skills (domain-specific, loaded on demand) → Agent guides in docs/ (deep reference, read only when needed).
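To make "ruthlessly short" concrete, here is a sketch of what a minimal root context file might contain. The commands, paths, and conventions below are invented for a hypothetical TypeScript project; the point is the shape: a handful of commands, a few conventions, a few hard prohibitions, nothing more.

```markdown
# CLAUDE.md

## Commands
- Build: `pnpm build`
- Test: `pnpm test` (run before declaring any task done)
- Formatting and linting run via hooks; do not hand-format

## Conventions
- TypeScript strict mode; avoid `any`
- New API routes follow the pattern in `src/api/users.ts`

## Forbidden
- Never edit generated files under `src/generated/`
- Never add a new dependency without asking first
```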

Layer 2: The llms.txt standard

A plain Markdown file at a website's root (like robots.txt or sitemap.xml) that provides a structured index of key content for LLMs. Each page can offer an .md variant, stripping away HTML/JS/ads so agents get clean text.
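The format itself is plain Markdown: an H1 with the project name, a blockquote summary, and H2 sections listing links with one-line descriptions. A sketch for a hypothetical project (the name and URLs are invented):

```markdown
# ExampleDB

> ExampleDB is an embedded document database for Node.js with a typed query API.

## Docs
- [Getting started](https://exampledb.dev/docs/start.md): install, first queries
- [Query API](https://exampledb.dev/docs/queries.md): filters, projections, indexes

## Optional
- [Changelog](https://exampledb.dev/changelog.md): release notes
```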

LangChain benchmarks showed agents using llms.txt significantly outperformed both context-stuffing and vector search (RAG) approaches. Agents can reason about which documentation to fetch based on descriptive context, rather than relying on semantic similarity matching.

Adoption is real. Expo, Pinecone, Supabase, Svelte, LangChain, and ZenML all publish llms.txt files.

The llms-full.txt variant is a single concatenated file of all docs. AI agents visit it at 2x the rate of llms.txt. But it can be enormous (800K+ tokens), too big even for million-token context windows.

Limitation: it's a publishing standard. It helps when documentation already exists on the web. It doesn't solve the problem of local/private/internal documentation.

Layer 3: MCP documentation servers

Model Context Protocol (MCP) has become the de facto standard for connecting agents to external tools and data. For documentation specifically, the most important practical development is the crop of documentation-focused MCP servers:

  • Context7 (Upstash): the #1 MCP server of 2026. Fetches up-to-date, version-specific docs from 33,000+ libraries directly into the agent's context. Works as both MCP server and CLI.
  • GitMCP: free, zero-config alternative. Point it at any public GitHub repo URL and it reads llms.txt, README, and docs. No API key needed.
  • Docfork: 9,000+ library coverage out of the box.
  • DeepWiki (Cognition/Devin team): goes beyond docs to generate architecture diagrams and interactive Q&A for understanding how libraries work internally.

The key architectural insight from Anthropic's engineering blog: as the number of connected tools grows, loading everything upfront is wasteful. Better to let agents navigate and load on-demand, like browsing a filesystem, not downloading the internet.

The coexistence pattern: MCP supports multiple servers simultaneously. Teams commonly run Context7 or GitMCP for public library docs alongside a local server for proprietary code.
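As a rough sketch, a project-scoped MCP configuration implementing that coexistence pattern could look like the following (shown in Claude Code's .mcp.json format; the Context7 package name should be checked against its README, and the internal-docs entry is a placeholder for whatever local server a team runs):

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    },
    "internal-docs": {
      "command": "node",
      "args": ["./tools/internal-docs-mcp/server.js"]
    }
  }
}
```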

Layer 4: Agent skills (progressive disclosure)

Skills (.claude/skills/<skill-name>/SKILL.md) represent the most sophisticated approach to agent documentation. The architecture is progressive disclosure:

  • YAML frontmatter is loaded at startup (lightweight, just a name and description).
  • The SKILL.md body is loaded on demand when the task is relevant.
  • Bundled scripts are executed without their source ever entering the context.
  • Reference files are read only when the specific task needs them.

Why this matters: there's no context penalty for bundled content that isn't used. A skill can include API docs, large datasets, or extensive examples, but the agent only reads what each task requires.

The ecosystem has exploded: 1,200+ community skills, official skills from Anthropic, DuckDB, Google, GSAP, ZenML, and others. The universal SKILL.md format works across Claude Code, Cursor, Gemini CLI, Codex, and more.

The practical pattern for documentation: create skills that point to locally stored reference docs. The skill itself is a thin routing layer ("when doing X, read docs/agent-guides/x.md"). The deep content lives in files the agent navigates to on demand.
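A thin routing skill in that style might look like the sketch below. The frontmatter fields (name, description) are the ones loaded at startup; the skill name, guide paths, and rules are invented for illustration.

```markdown
---
name: payments-integration
description: Use when adding or modifying payment flows, webhooks, or billing code.
---

# Payments integration

Before writing any payment-related code:

1. Read `docs/agent-guides/payments.md` for provider setup and retry rules.
2. Follow the worked example in `docs/agent-guides/examples/create-charge.md`.
3. Never log raw card data or webhook payloads.
```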

Layer 5: Source code as documentation

The least-served layer, but arguably the most valuable for edge cases. Current tools focus on processed documentation (markdown, API references). But when an agent needs to understand how a library actually handles an edge case, the docs often don't say. The source code does.

Existing approaches:

  • Gitingest: converts any GitHub repo into a single text file for LLM consumption.
  • DeepWiki: generates architecture-level understanding from source.
  • ReadMe.LLM: a proposed framework for creating LLM-specific documentation from library source code.

The gap: no widely-adopted tool yet makes your actual installed dependency source code (node_modules, site-packages, etc.) locally queryable by an agent. This is an open opportunity.
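To make the gap concrete, here is a deliberately minimal Python sketch of what such a tool might do for Python dependencies: resolve an installed package and grep its source for a symbol. It is an illustration, not an existing project; a real tool would add indexing, ranking, and an MCP or CLI interface on top.

```python
"""Sketch: make installed dependency source greppable for an agent.

Illustration of the gap described above, not an existing tool.
"""
import importlib.util
import pathlib
import re
import sys


def find_in_package(package: str, pattern: str, max_hits: int = 20) -> list[str]:
    """Return source lines in an installed package that match a regex."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return [f"package {package!r} not found"]
    root = pathlib.Path(spec.origin).parent  # package directory in site-packages
    regex = re.compile(pattern)
    hits: list[str] = []
    for path in root.rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if regex.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits


if __name__ == "__main__":
    # Example: python grep_dep.py requests "class Session"
    print("\n".join(find_in_package(sys.argv[1], sys.argv[2])))
```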

Why "show, don't tell" matters more for agents

This is the insight that cuts across every layer of the stack: agents learn patterns from examples far more reliably than from abstract instructions. And yet, most documentation strategies focus on rules and descriptions while underinvesting in concrete, working code examples.

Why examples matter disproportionately

LLMs are in-context learners. They don't "understand" rules the way humans do. They pattern-match. A rule like "use camelCase for variable names" is less effective than three examples of correctly-named variables in context. This is the few-shot prompting principle applied to documentation.
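A small hypothetical illustration of the difference, as it might appear in a context file (the variable names are invented):

```markdown
<!-- Less effective: an abstract rule -->
Use camelCase for variable names.

<!-- More effective: examples the model can pattern-match -->
Good variable names: `retryCount`, `maxUploadSizeBytes`, `isSessionExpired`
```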

The ReadMe.LLM research (2025) tested this rigorously. The framework includes three components: rules, library descriptions, and code snippets with function signatures and usage examples. When all three were provided, code generation accuracy improved to near-perfect (100% in one case study across all tested models). The code snippets were the most impactful component, not the prose descriptions.

Context engineering best practice now emphasises curating "diverse, canonical few-shot examples that portray expected behaviour instead of listing every possible scenario."

Martin Fowler's context engineering guide highlights that your existing codebase is itself documentation. Agents read your code to learn patterns. The question is whether your code is readable enough to serve as implicit examples ("AI-friendly codebase design").

How examples are sourced and delivered

Existing codebase as examples (implicit). The most underrated source. When agents search your codebase before writing new code, they're doing few-shot learning from your existing patterns. Codebase-Memory (2026) builds persistent Tree-Sitter-based knowledge graphs via MCP, parsing 66 languages, letting agents discover how your codebase is actually structured rather than relying on documentation that may be out of sync. Implication: well-structured, consistent existing code is a form of documentation. Messy codebases with inconsistent patterns actively confuse agents.

Examples bundled in skills (explicit). The Agent Skills specification supports a references/ directory for additional documentation and an assets/ directory for templates and resources. Both are natural homes for canonical code examples. The SKILL.md body itself can include "code patterns, gotchas, etc." in the markdown instructions. Skills that include worked examples of inputs and outputs perform better than those with only procedural steps. The pattern: include 2-3 canonical examples of the exact output format or code pattern you expect, directly in the SKILL.md. A skill for writing API endpoints that includes one complete, correctly-structured endpoint example will produce more consistent output than a skill that lists 15 rules about endpoint structure.
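As a sketch, the "## Examples" section of such a skill might pair a concrete request with the expected shape of the result. The routes and helper names below are invented for illustration:

```markdown
## Examples

### Add a GET endpoint

Request: "Add an endpoint that returns a user's invoices."

Expected result:
- Route file `src/api/invoices.ts` exporting a handler named `getInvoices`
- Query parameters validated with the shared `validateQuery` helper
- Errors returned via `apiError(code, message)`, never thrown raw
```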

ReadMe.LLM: LLM-optimised library documentation with embedded examples. The ReadMe.LLM framework proposes that library developers maintain LLM-specific documentation alongside their traditional docs. Key components: rules (how to use the library), a library description, and code snippets with clear function signatures paired with illustrative examples. This is different from llms.txt (which indexes existing documentation) and from traditional README.md (which targets human readers). ReadMe.LLM is specifically structured for how LLMs process information. Even well-documented, popular libraries benefit from LLM-specific documentation. Lesser-known libraries see dramatic improvements, going from frequent hallucination to near-perfect accuracy. If you maintain an internal library, creating a ReadMe.LLM file with 5-10 canonical usage examples may be the single highest-ROI documentation investment you can make for agent-assisted development.
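A sketch of what that might look like for a hypothetical internal library (the library, its methods, and the exact layout are invented; the structure mirrors the rules, description, and snippet components described above):

```markdown
# billing-client (ReadMe.LLM-style, written for agents)

Rules:
- Always create clients via `BillingClient.from_env()`; never pass API keys directly.
- All amounts are integer minor units (cents), never floats.

Description: thin wrapper around the internal billing API with retries and
idempotency keys.

Canonical usage:

    from billing_client import BillingClient

    client = BillingClient.from_env()
    invoice = client.create_invoice(customer_id="cus_123", amount_cents=4200)
    client.finalize(invoice.id, idempotency_key="inv-2026-0042")
```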

Context7 and MCP doc servers: examples from upstream. Context7 explicitly fetches "up-to-date code examples and documentation" from upstream sources: not just API descriptions, but working code showing how to use libraries. This is why it outperforms approaches that only provide API signatures or documentation prose. The examples ground the agent in concrete, working patterns.

Your codebase's own history (emerging). Code Researcher (2026) demonstrates multi-step reasoning over commit history: an agent that looks at how similar changes were made in the past to inform new changes. DuckDB's agent skills include a read-memories skill that searches past Claude Code session logs to recover context from previous conversations, including examples of what worked. SWE-ContextBench, a new benchmark, explicitly evaluates whether agents can reuse previous experience across related problems, measuring the value of accumulated examples rather than solving each task from scratch.

Practical takeaways on examples

  • For every rule in your CLAUDE.md, ask: "Could I replace this with an example?" If yes, the example will be more effective.
  • When creating skills, include a "## Examples" section with 2-3 worked examples of inputs and expected outputs. This is the highest-leverage content in any skill.
  • If you maintain internal libraries, create a ReadMe.LLM or equivalent with function signatures, usage examples, and common patterns specifically structured for LLM consumption.
  • Invest in codebase consistency. If your existing code follows clear patterns, the agent learns from it automatically. If patterns are inconsistent, examples in documentation have to work harder to compensate.
  • Consider extracting "golden examples" from your best PRs and code reviews into a reference directory that skills can point to.

What's working in practice

Patterns that experienced teams have converged on:

  • Start with a minimal CLAUDE.md/AGENTS.md. Build commands, test commands, key conventions, forbidden patterns. Nothing else. Iterate based on code review feedback.
  • Install Context7 or GitMCP for public library docs. This solves the hallucinated-API problem with minimal effort. One MCP server config line.
  • Use skills for domain-specific workflows. Don't put everything in the root context file. Skills load on demand and don't bloat every session.
  • Adopt llms.txt if you publish documentation. Your users' AI agents are already visiting your docs. Give them a clean, structured entrypoint.
  • Use hooks and linters for deterministic behaviour. Don't ask the LLM to enforce formatting or style. That's what tooling is for. Reserve agent instructions for judgement calls.
  • Treat documentation as code. Every time a reviewer catches a mistake the agent made, add a rule or guardrail. Every time the agent ignores a rule, the file is probably too long. Prune it.
  • Favour examples over rules. Where possible, replace prescriptive rules with 2-3 canonical code examples showing the expected pattern. LLMs pattern-match from examples more reliably than they follow abstract instructions.

The CLI-first shift

AI coding agents increasingly live in the terminal (Claude Code, Gemini CLI, Codex CLI, Aider, OpenCode). Every major developer tool company (GitHub, Stripe, Supabase, Vercel, PostHog, Resend) has shipped or updated a CLI in 2025-2026.

The rationale: when you're working with an agent at midnight, you don't want to context-switch to a browser and three dashboards. You want to stay in flow. CLIs output text. Agents consume text. Natural fit.

The MCP vs CLI question: CLI tools return text output via shell commands; MCP uses a structured schema injected into the model's context window. Both work. Many tools support both modes (Context7 offers CLI + Skills mode and MCP mode).

What's still missing

  • Private/internal library documentation: cloud-based doc servers like Context7 don't cover your internal packages, forked dependencies, or proprietary SDKs. You need a local solution.
  • Dependency source code access: the gap between "read the docs" and "read the code" remains largely unfilled for agent workflows.
  • Cross-project documentation hubs: developers working on multiple projects don't have a good way to maintain a single, agent-accessible documentation repository that spans projects.
  • Automatic context file maintenance: CLAUDE.md and AGENTS.md files drift and bloat over time. Tooling to audit, prune, and suggest improvements is nascent (skill-optimizer exists but is early).
  • Standardisation: the proliferation of context file formats (CLAUDE.md, AGENTS.md, COPILOT-INSTRUCTIONS.md, .cursorrules, .windsurfrules, llms.txt, SKILL.md) creates fragmentation. The universal SKILL.md format is the closest thing to a standard, but we're not there yet.
  • Automated example extraction: no mainstream tool yet watches your codebase, identifies the strongest patterns, and automatically generates or updates the canonical examples your agent documentation references. Today this is a manual process. Tools like Codebase-Memory and Code Researcher hint at what's possible, but the "automatically keep my examples up to date" workflow doesn't exist yet.

Where this is heading

The documentation problem for AI coding agents is really a context engineering problem. The tools and standards emerging in 2026 (llms.txt, MCP documentation servers, agent skills, LLM-oriented library docs) represent a genuine shift from documentation-for-humans to documentation-for-agents-and-humans. And the most powerful lever in any of these systems turns out to be the simplest: concrete, working examples. Rules tell agents what to do; examples show them how. The teams getting the most out of AI coding agents aren't the ones with the biggest context files. They're the ones with the most disciplined approach to putting the right information (and the right examples) in front of the agent at the right time, and nothing more.

Sources and further reading

  • Anthropic: "Code execution with MCP" (anthropic.com/engineering)
  • Anthropic: "Best practices for Claude Code" (code.claude.com/docs)
  • Anthropic: "Agent Skills overview" (platform.claude.com/docs)
  • Anthropic: "2026 Agentic Coding Trends Report"
  • HumanLayer: "Writing a good CLAUDE.md"
  • Matthew Groff: "Implementing CLAUDE.md and Agent Skills In Your Repository"
  • Stack Overflow Blog: "Building shared coding guidelines for AI (and people too)"
  • LangChain: "Context engineering in agents"
  • Mintlify: "What is llms.txt?"
  • Document360: "5 Ways to Give AI Coding Agents Access to Documentation"
  • Princeton et al.: "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?" (arxiv)
  • Addy Osmani: "My LLM coding workflow going into 2026"
  • Martin Fowler / Birgitta Böckeler: "Context Engineering for Coding Agents" (martinfowler.com)
  • ReadMe.LLM: "A Framework to Help LLMs Understand Your Library" (arxiv, 2025)
  • Codebase-Memory: "Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP" (arxiv, 2026)
  • SWE-ContextBench: "A Benchmark for Context Learning in Coding" (arxiv, 2026)
  • Code Researcher: "Deep Research Agent for Large Systems Code and Commit History" (arxiv, 2026)
  • Agent Skills Specification (agentskills.io)
  • Packmind: "Context Engineering AI Coding" — on automated context retrieval from codebases
  • Firecrawl: "Agent Skills Explained: How SKILL.md Files Work and Why They're Everywhere"