Your AI Is Confidently Quoting Stale Docs. The Fix Is 20 Years Old.

"Keep one source of truth and publish it everywhere" is a decades-old discipline. AI didn't make it obsolete; it made ignoring it expensive, because a machine can't tell it's reading the stale copy. It just repeats it, with a straight face, to everyone who asks.

Perspective · June 2026

Backed by 10 sources · Last researched Jun 2026 · 8 min read

TL;DR

  • The pattern (write your docs once in a single source and let a pipeline publish them everywhere) is not new. Technical writers have done "author once, publish to many" since the 1990s; the developer version ("docs as code") is a decade old. If that were the whole story, there'd be nothing to write.
  • What changed is the destinations. For twenty years "everywhere" meant a handful of stable human formats: a web page, a PDF, a help site. Now the same knowledge also has to feed an exploding set of machine surfaces (llms.txt, MCP servers, RAG vector stores), each ingesting it in its own format, and each one more copy you have to keep in sync.
  • And the consumer got worse at catching drift. A human who reads two contradictory versions notices and uses judgment. An LLM doesn't; it confidently states whichever stale copy it retrieved. So AI didn't invent the need for a single source of truth. It raised the price of not having one.

A Small, Familiar Disaster

Here's a scene every operator will recognise by 2026. Someone asks an AI assistant a question about your product: your refund window, your pricing, your API's rate limit. It answers instantly, fluently, with total confidence. And it's wrong, because somewhere in the pile of places your documentation lives, it found an old copy. Not a malicious one. Just one nobody remembered to update when the real answer changed.

The uncomfortable part isn't that the copy was stale. Stale copies have always existed. The uncomfortable part is who read it. A human flipping between two versions of a policy notices the contradiction and asks which is right. An AI doesn't. It picks one and states it as fact; an AI "hallucination" is defined as exactly that, output "that contains false or misleading information presented as fact"[8]. The model isn't lying. It genuinely can't tell it's holding the outdated page. It will "embed plausible-sounding" falsehoods inside an otherwise reasonable answer[8], which is exactly what makes them so hard to catch.

The fix for this is not a better model. It's a discipline that's older than most of the people deploying the AI.

What I Actually Do on This Site

Let me ground this in something concrete, because I run it every day. This site has a single content catalog: one file that is the source of truth for every article, page, and playbook. I add one entry to it, and a build pipeline propagates that entry to four places at once: the human-readable web page, the XML sitemap, the RSS feed, and llms.txt (the plain-text file AI tools read). The same catalog also feeds the site's MCP server, so an AI agent can list and fetch my writing directly. One source, five destinations, and a build check that fails the deploy if any of them drift out of sync.
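To make the shape of that pipeline concrete, here is a minimal Python sketch. It is not the code that runs this site; the `Entry` fields, the `example.com` domain, and every function name are assumptions invented for illustration, and the HTML page and MCP outputs are left out to keep it short.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Entry:
    """One catalog record. The field names are hypothetical."""
    slug: str
    title: str
    summary: str


BASE_URL = "https://example.com"  # placeholder domain, not a real site

# The single source of truth: every surface below is derived from this list.
CATALOG = [
    Entry("single-source", "One Source, Many Surfaces", "Why drift got expensive."),
    Entry("docs-as-code", "Docs as Code", "Review docs like you review code."),
]


def url_for(entry: Entry) -> str:
    return f"{BASE_URL}/{entry.slug}"


def render_sitemap(catalog: list[Entry]) -> str:
    urls = "\n".join(f"  <url><loc>{escape(url_for(e))}</loc></url>" for e in catalog)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n</urlset>\n"
    )


def render_rss(catalog: list[Entry]) -> str:
    items = "\n".join(
        f"    <item><title>{escape(e.title)}</title>"
        f"<link>{escape(url_for(e))}</link>"
        f"<description>{escape(e.summary)}</description></item>"
        for e in catalog
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0">\n  <channel>\n'
        "    <title>Example site</title>\n"
        f"    <link>{BASE_URL}</link>\n"
        "    <description>All entries</description>\n"
        f"{items}\n  </channel>\n</rss>\n"
    )


def render_llms_txt(catalog: list[Entry]) -> str:
    # Loosely follows the llms.txt proposal: an H1 title, then Markdown link lists.
    lines = ["# Example site", "", "## Articles", ""]
    lines += [f"- [{e.title}]({url_for(e)}): {e.summary}" for e in catalog]
    return "\n".join(lines) + "\n"


def build(catalog: list[Entry]) -> dict[str, str]:
    """Return every generated surface, keyed by output filename."""
    return {
        "sitemap.xml": render_sitemap(catalog),
        "rss.xml": render_rss(catalog),
        "llms.txt": render_llms_txt(catalog),
    }


if __name__ == "__main__":
    for name, text in build(CATALOG).items():
        print(f"--- {name} ---\n{text}")
```

The design choice that matters is that no renderer holds content of its own. Adding an article means appending one `Entry`; nothing else is edited by hand.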

I didn't invent any of that. I assembled it out of boring, well-understood parts. And that's the point: the architecture was sitting there, fully formed, long before I pointed it at AI consumers. The interesting question isn't how I built it. It's why it suddenly matters so much more than it used to.

None of This Is New (And That's the Honest Part)

Let me kill the "visionary new pattern" framing before anyone reaches for it. Single-sourcing is genuinely old.

Technical writers have practised "single-source publishing" (author once, reuse across "different forms of media and more than one time") for decades[1]. The whole point was efficiency: editing "need only be carried out once, on only one document," and Wikipedia even calls that document, in plain words, "the single source of truth," because "corrections are only made one time"[1]. This isn't recent. Its roots trace back to around 1990, and by the mid-90s manufacturers were doing it at industrial scale: Ford could reportedly generate twelve model years of documentation from a single tagged source file[1]. The discipline also got a formal standard in DITA, an open XML architecture built around reusable "topics," created by IBM and made an OASIS standard back in 2005[2].

Software teams rebuilt the same idea in their own tooling and called it "docs as code": write documentation "with the same tools as code" (Git, Markdown, code review, automated tests)[3], then let a static site generator compile that single Markdown source into a published site[4]. A genuinely nice property falls out of doing it this way: human review is built in. You can literally "block merging of new features if they don't include documentation"[3]. The source of truth and the human approval gate are the same pipeline.

So if you want to be dismissive, you can be, fairly: "one source, many outputs, reviewed before it ships" is a solved, twenty-year-old idea. I'll concede that completely. The thesis of this piece isn't that the pattern is new. It's that the pattern's value just went up sharply, for two reasons that genuinely are new.

Reason One: The Destinations Exploded, and Went Machine

For most of single-sourcing's life, "publish everywhere" meant a short, stable list of human formats: a web page, a PDF, a printed manual, an in-app help panel. You could maintain those by hand if you had to. The list barely changed from one year to the next.

Then AI tools started consuming documentation, and each one wanted the same knowledge in a different shape, stored in a different place:

  • A Markdown file for inference time. The llms.txt proposal exists precisely because the human web isn't machine-ready: "context windows are too small to handle most websites," and turning messy HTML into clean LLM text is "difficult and imprecise"[5]. So you publish a separate, purpose-built representation, explicitly in addition to robots.txt and the sitemap, not instead of them[5].
  • A live protocol for agents. The Model Context Protocol ("a USB-C port for AI applications"[6]) lets an agent call your content as tools rather than scrape it. It's supported across Claude, ChatGPT, VS Code, and Cursor ("build once and integrate everywhere"[6]), which means it's yet another surface the same knowledge has to be exposed through.
  • An embedded vector store for retrieval. A RAG system "converts" your documents "into LLM embeddings" and stores them "in a vector database"[7]: a third representation, in a third kind of store, that has to be re-ingested whenever the source changes.

That's three new machine destinations, with different formats and different storage models, that didn't exist a few years ago, and the list is still growing. Every one of them is another copy that can fall behind. The old pattern was a convenience when "everywhere" was a handful of stable places. It becomes a necessity when "everywhere" is a moving target that multiplies with every AI tool your audience adopts. The value of single-sourcing scales with the number of destinations, and the number of destinations just stopped being constant.
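The re-ingestion requirement on the vector store is the easiest of these to let slide, so it's worth sketching. One common approach is to fingerprint each source document and compare it with what the index recorded at its last ingest. This is a minimal sketch under assumptions: the index here is just a dictionary of fingerprints, and `plan_reingest` is a hypothetical helper, not the API of any real vector database.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect that a source document changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reingest(source: dict[str, str], indexed: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current source against what the index last ingested.

    source:  doc id -> current text (the single source of truth)
    indexed: doc id -> fingerprint recorded at the last ingest
    """
    # New or changed documents must be re-embedded.
    upsert = sorted(
        doc_id for doc_id, text in source.items()
        if indexed.get(doc_id) != fingerprint(text)
    )
    # Documents that left the source must leave the index too.
    delete = sorted(doc_id for doc_id in indexed if doc_id not in source)
    return {"upsert": upsert, "delete": delete}


if __name__ == "__main__":
    old = {"refunds": "Refunds within 14 days.", "pricing": "Pro is $20/month."}
    indexed = {doc_id: fingerprint(text) for doc_id, text in old.items()}
    indexed["legacy-faq"] = fingerprint("Old FAQ.")  # in the index, gone from the source

    new = {"refunds": "Refunds within 30 days.", "pricing": "Pro is $20/month."}
    print(plan_reingest(new, indexed))
    # {'upsert': ['refunds'], 'delete': ['legacy-faq']}
```

The `delete` list matters as much as the `upsert` list: a page removed from the source but left in the index is exactly the kind of stale copy a model will happily retrieve.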

Reason Two: The Consumer Can't Tell It's Wrong

This is the half that actually changes the math, and it's the half the hype skips.

When the consumers of your documentation were humans, drift was an annoyance with a built-in safety net: the reader. A person who hits two conflicting answers stops, frowns, and figures out which is current. That judgment was doing quiet, unpaid quality control on your behalf for twenty years.

An AI consumer removes that safety net. RAG was supposed to help here by grounding the model in your real source data so it stops making things up, and it does reduce hallucination[7]. But it's only ever as good as what it retrieves. The same write-up that praises RAG is blunt about the failure mode: a system "may retrieve factually correct but misleading sources," and the genuinely dangerous case is when it "merges outdated and updated information in a misleading manner"[7]. Feed it two versions of the truth and it won't flag the conflict. It will blend them, or pick one, and present the result with the same "confident statements" it uses for everything else[8].

So the cost curve flips. With human readers, a stale copy was a small, self-correcting nuisance. With AI readers, a stale copy becomes a confident, automated, infinitely-repeatable wrong answer, delivered to every person who asks, with no frown to slow it down. Same drift. Wildly different blast radius. That is what AI changed: not the need for a single source of truth, but the price of not having one.
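The model won't flag two versions of the same document on its own, but the code around it can, provided each chunk carries metadata saying where it came from. The sketch below assumes every retrieved chunk has a `doc_id` and an integer `version`; both fields are hypothetical, and they only exist if the chunks were generated from one canonical source in the first place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage. `doc_id` and `version` are hypothetical metadata fields."""
    doc_id: str
    version: int
    text: str


def conflicting_docs(chunks: list[Chunk]) -> dict[str, list[int]]:
    """Map each doc id retrieved in more than one version to its sorted versions."""
    versions: dict[str, set[int]] = defaultdict(set)
    for chunk in chunks:
        versions[chunk.doc_id].add(chunk.version)
    return {doc_id: sorted(v) for doc_id, v in versions.items() if len(v) > 1}


def keep_latest(chunks: list[Chunk]) -> list[Chunk]:
    """Drop every chunk that is not from the newest retrieved version of its document."""
    newest: dict[str, int] = {}
    for chunk in chunks:
        newest[chunk.doc_id] = max(newest.get(chunk.doc_id, chunk.version), chunk.version)
    return [chunk for chunk in chunks if chunk.version == newest[chunk.doc_id]]


if __name__ == "__main__":
    retrieved = [
        Chunk("refunds", 1, "Refunds within 14 days."),
        Chunk("refunds", 2, "Refunds within 30 days."),
        Chunk("pricing", 3, "Pro is $20/month."),
    ]
    print(conflicting_docs(retrieved))  # {'refunds': [1, 2]}
    print([c.text for c in keep_latest(retrieved)])
    # ['Refunds within 30 days.', 'Pro is $20/month.']
```

This is a mitigation at retrieval time, not a substitute for removing the stale copy. It can only prefer the newest version it happened to retrieve, and it does nothing for copies that carry no version at all.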

The Strongest Argument Against Me

I want to give the skeptic their best shot, because it's a good one. The objection goes: you've dressed up a twenty-year-old idea in AI clothes. A headless CMS already made content "accessible via an API for display on any device, without a built-in front end"[9] years ago: one source, many surfaces, including non-web targets. And "single source of truth" is a named, well-worn architecture: "every data element is mastered (or edited) in only one place," "master data is never copied and instead only references to it are made," with updates "comprehensively distributed" to every downstream system[10]. The stale-copy problem I'm dramatising? That's just the classic, long-documented risk of denormalised, duplicated data[10]. So, the skeptic concludes, there's no new idea here.

And they're right about the architecture. They're right that none of the building blocks are novel. Where I think they're wrong is treating "the idea is old" as a synonym for "the advice is unimportant." Plenty of old ideas become urgent only when conditions change around them. Hand-washing was an old idea before germ theory explained why skipping it killed people. Single-sourcing was hygiene when the only victim of drift was a confused human who'd recover. It becomes load-bearing the moment your most prolific reader is a machine that can't recover, can't doubt, and answers thousands of people a day. The architecture didn't change. The consequences of neglecting it did.

What This Means If You Publish Anything

Two takeaways, one tactical and one strategic.

Tactical: pick one canonical source for each thing you know, and generate every other surface from it, including the machine ones. Don't hand-maintain your website, your llms.txt, your knowledge base, and your agent's context as four separate artifacts; that's four chances to drift, and the AI surfaces are the ones with no human to catch the error. Wire them to one source and add the cheapest possible guard: a check that fails your build when they diverge. You don't need DITA or a heavyweight CMS to do this. You need one source and the refusal to let copies of it exist.
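That guard can be very small. The sketch below regenerates nothing itself: it takes the text your generator would produce right now and compares it with what is actually sitting in the publish directory. The function names and file names are assumptions for illustration, not part of any particular build tool.

```python
import sys
import tempfile
from pathlib import Path


def find_drift(expected: dict[str, str], published_dir: Path) -> list[str]:
    """Return the names of published files that are missing or differ from the expected text."""
    drifted = []
    for name, text in sorted(expected.items()):
        path = published_dir / name
        if not path.is_file() or path.read_text(encoding="utf-8") != text:
            drifted.append(name)
    return drifted


def check(expected: dict[str, str], published_dir: Path) -> int:
    """Exit-code style result: 0 when everything is in sync, 1 when any surface drifted."""
    drifted = find_drift(expected, published_dir)
    for name in drifted:
        print(f"DRIFT: {name} does not match the canonical source", file=sys.stderr)
    return 1 if drifted else 0


if __name__ == "__main__":
    # Hypothetical generated outputs; in a real build these come from your generator.
    expected = {"llms.txt": "# Example site\n", "sitemap.xml": "<urlset></urlset>\n"}
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "llms.txt").write_text("# Example site\n", encoding="utf-8")
        (out / "sitemap.xml").write_text("<urlset>stale</urlset>\n", encoding="utf-8")
        status = check(expected, out)  # reports sitemap.xml and returns 1
    sys.exit(status)
```

Run as a CI step, a non-zero exit code is enough to block the deploy, which is the whole point: the stale copy gets caught by a machine before a machine gets to read it.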

Strategic: the discipline that matters in the AI era isn't writing more documentation. It's making sure there is exactly one version of each truth, and that every consumer, human or machine, drinks from it. For twenty years, the quality of your docs was policed for free by readers who noticed when something looked off. That free labour is going away, replaced by confident machines that notice nothing. The single source of truth used to be a tidiness preference. It's quietly becoming the thing that decides whether the AI representing you to the world quotes you, or misquotes you.

The idea is old. What changed is the price of ignoring it. A human who reads your stale doc shrugs and asks a colleague. A machine reads it and tells the world, confidently, at scale, with a straight face.

Sources & Further Reading

10 sources researched for this article. Last updated when the page was published.

The old pattern

  1. Single-source publishing, Wikipedia, 2026. Author-once / publish-to-many is the canonical definition; the source document is literally called "the single source of truth," corrections "made one time." Roots traced to ~1990 (Windows 3.0) and a first project in 1993; Ford could generate 12 model years from one tagged file.
  2. Darwin Information Typing Architecture (DITA), Wikipedia, 2005. Open XML standard for topic-oriented content, created by IBM, maintained by OASIS. Core DTD March 2001, OASIS TC April 2004, v1.0 approved June 2005. Reusable topics + conref transclusion = single-sourcing formalized 20+ years ago.

Docs as code

  3. Docs as Code, Write the Docs, 2026. "Writing documentation with the same tools as code": Git, Markdown/rST/Asciidoc, code reviews, automated tests. Human review is native: "you can block merging of new features if they don't include documentation."
  4. Static site generator, Wikipedia, 2026. The build engine that turns single-source plain-text (Markdown) + templates into static HTML; software documentation is a named use case (Docusaurus, Jekyll, Hugo).

The new machine destinations

  5. The /llms.txt file, Jeremy Howard, 2024-09-03. A proposed Markdown file at a site root to feed LLMs at inference time, because "context windows are too small to handle most websites" and converting HTML to LLM-friendly text is "difficult and imprecise." Designed to coexist with robots.txt / sitemap, not replace them.
  6. What is the Model Context Protocol (MCP)?, Model Context Protocol (Anthropic), 2025. Open standard, "a USB-C port for AI applications"; supported across Claude, ChatGPT, VS Code, Cursor: "build once and integrate everywhere."
  7. Retrieval-augmented generation, Wikipedia, 2026. Documents are "converted into LLM embeddings" and "stored in a vector database"; knowledge is updated by changing the source, not retraining. But it "may retrieve factually correct but misleading sources," and the worst case is to "merge outdated and updated information in a misleading manner."

Why staleness got expensive

  8. Hallucination (artificial intelligence), Wikipedia, 2026. A response "that contains false or misleading information presented as fact"; models "embed plausible-sounding random falsehoods" and make "confident statements that are not true." They do not signal which copy they read.

The honest counter-argument

  9. Headless content management system, Wikipedia, 2026. Content "accessible via an API for display on any device, without a built-in front end": the one-source-many-surfaces architecture already existed, including non-web targets like IoT.
  10. Single source of truth (SSOT), Wikipedia, 2026. Named architecture: "every data element is mastered (or edited) in only one place"; "master data is never copied and instead only references to it are made," updates "comprehensively distributed." The stale-copy failure mode is the classic risk of denormalization.