RAG (Retrieval-Augmented Generation) became the default answer to "how do I give my AI access to my data" in 2023. It is also overengineered for most use cases. Wire generates structured, validated, citation-backed content as static files. Your AI agent reads them directly. No vector database. No embedding pipeline. No retrieval latency.
Why RAG Fails Static Knowledge Bases
A RAG pipeline requires document ingestion, chunking strategy, embedding model, vector database, retrieval logic, re-ranking, and prompt assembly. Each component can fail silently. Chunks lose context. Embeddings drift. Retrieval returns irrelevant passages. The system occasionally produces confident wrong answers from poorly chunked source material.
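The chunking failure mode is easy to reproduce. A minimal sketch of naive fixed-size chunking (not tied to any particular RAG library), showing how a single fact gets split across a chunk boundary so that neither chunk is independently retrievable:

```python
def chunk(text: str, size: int) -> list[str]:
    """Naive fixed-size chunking: no overlap, no sentence awareness."""
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = "The API rate limit is 500 requests per minute for paid plans."
chunks = chunk(doc, 32)
# The key fact is now split: chunks[0] ends mid-word ("...500 reques"),
# and chunks[1] ("ts per minute for paid plans.") has lost its subject,
# so a query for "rate limit" can never retrieve the actual number.
```

Real pipelines mitigate this with overlap and semantic splitting, but the mitigation is itself another tunable component that can fail silently.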
The structural weakness is measurable. The arXiv:2412.15605 paper on Cache-Augmented Generation (CAG) found that on HotPotQA Small, sparse RAG top-1 retrieval scores a BERTScore of 0.0673. Top-5 retrieval jumps to 0.7549. Top-10 drops back to 0.7461. The variance is not a tuning problem. It is a retrieval architecture problem.
Denis Urayev argued that for most applications, a file-first agent reading source documents directly outperforms RAG. The agent iterates, combines information from multiple files, and synthesizes answers the way a human analyst would. The bottleneck was never retrieval. It was source quality.
Cache-Augmented Generation: The Architecture That Replaces RAG
CAG, named and benchmarked in arXiv:2412.15605 (December 2024) and validated by 2026 production data, preloads an entire document corpus into an LLM's extended context window and precomputes a key-value (KV) cache. Retrieval is eliminated entirely.
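The preload step is simple enough to sketch. A framework-agnostic illustration of the corpus-assembly half of CAG (the KV-cache precompute itself depends on your inference stack; typically you run the assembled prefix through the model once with caching enabled). The token estimate here is a crude chars-per-token heuristic, not a real tokenizer:

```python
def preload_corpus(docs: dict[str, str], budget_tokens: int = 128_000) -> str:
    """Concatenate an entire corpus into one context prefix, CAG-style.

    Raises if the corpus exceeds the model's context window, since CAG
    assumes the whole knowledge base fits in context at once.
    """
    parts = [f"## {name}\n{body}" for name, body in docs.items()]
    prefix = "\n\n".join(parts)
    est_tokens = len(prefix) // 4  # rough heuristic: ~4 chars per token
    if est_tokens > budget_tokens:
        raise ValueError(f"corpus ~{est_tokens} tokens exceeds {budget_tokens}")
    return prefix

# The returned prefix is what you pass through the model once to build the
# KV cache; every subsequent query appends only the question tokens.
```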
The benchmark numbers are unambiguous. On HotPotQA Large (64 documents, up to 85k tokens), CAG completed queries in 2.33 seconds versus 94.35 seconds for standard in-context loading, a 40x speedup. On SQuAD Large (7 documents, up to 50k tokens), CAG completed in 2.41 seconds versus 31.08 seconds, a 13x speedup. Experiments ran on eight Tesla V100 32GB GPUs using Llama 3.1 8B Instruct with a 128k-token context window.
Accuracy improves alongside latency. CAG scores a BERTScore of 0.7759 on HotPotQA versus 0.7516 for the best dense RAG configuration and 0.7549 for the best sparse RAG configuration. CAG outperformed all RAG baselines across all three dataset sizes tested.
The cost threshold is also concrete. According to ucstrategies.com, the cache build cost breaks even at 6 queries, saving 245 tokens per query versus RAG's repeated embedding lookups. Post-cache, CAG processes roughly 10x fewer tokens per query than RAG.
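The break-even is plain amortization arithmetic. A sketch using the figures above (6 queries at 245 tokens saved per query implies a one-time cache cost of roughly 1,470 tokens; the numbers are the analysis's, only the formula is added here):

```python
import math

def break_even_queries(cache_build_tokens: int, tokens_saved_per_query: int) -> int:
    """Number of queries after which the one-time cache build pays for itself."""
    return math.ceil(cache_build_tokens / tokens_saved_per_query)

# ~1,470-token build cost / 245 tokens saved per query = break-even at 6 queries
print(break_even_queries(1470, 245))  # → 6
```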
A working implementation is available at github.com/hhhuang/CAG.
When to Use CAG vs RAG
The 2026 production data establishes clear thresholds, as analyzed by ucstrategies.com:
- Under 1 million tokens, updates weekly or less: CAG wins on latency, accuracy, and cost. This covers product documentation, compliance rules, internal FAQs, and product catalogs.
- Over 1 million tokens or sub-hour freshness required: standard RAG remains viable.
- Multi-step reasoning or tool execution required: Agentic RAG only.
As ucstrategies.com frames it: "Standard RAG sits in an awkward middle ground, slower than CAG for stable data, less capable than Agentic RAG for dynamic workflows."
One practical warning from the same analysis: teams under 5 engineers should not attempt hybrid CAG plus RAG architectures. Routing logic, cache invalidation, and dual-pipeline debugging outweigh the theoretical benefits. Pick one architecture based on corpus size and update frequency.
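The thresholds above reduce to a small decision rule. A sketch encoding them directly (the function name and parameters are hypothetical; the cutoffs are the ones from the analysis):

```python
def pick_architecture(corpus_tokens: int,
                      update_interval_hours: float,
                      needs_multistep: bool = False) -> str:
    """Route between CAG, standard RAG, and Agentic RAG per the 2026 thresholds."""
    if needs_multistep:
        return "agentic-rag"   # multi-step reasoning or tool execution
    if corpus_tokens > 1_000_000 or update_interval_hours < 1:
        return "rag"           # too large or too fresh for a static cache
    return "cag"               # stable corpus under 1M tokens

print(pick_architecture(300_000, update_interval_hours=168))  # weekly docs → cag
```

Per the warning above, pick exactly one branch for your corpus rather than building routing logic between all three.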
Explore all Wire use cases to find examples of CAG-ready knowledge bases.
The RAG market is still projected to grow from $1.96 billion in 2025 to $40.34 billion by 2035 (ResearchAndMarkets.com, October 2025), but that growth is concentrated in agentic and dynamic-data applications, not static corpus retrieval.
How Wire Fits the CAG Architecture
Wire solves the source quality problem that makes file-first and CAG approaches viable. A Wire-managed site sits squarely within the CAG viability window: content updates on a schedule, not in real time, and the corpus stays well under 1 million tokens for most knowledge bases.
Every page Wire generates has validated structure: frontmatter with title, description, and created date; headings following H1/H2/H3 hierarchy; no random formatting. Wire's style guide enforces inline citations with source URLs, so every factual claim has a traceable origin. Internal links between pages create a navigable knowledge graph the AI can follow to find related context.
Wire also generates two files that replace retrieval infrastructure directly. llms.txt is a machine-readable index of every page with title, URL, and description. AI agents consume this as their document index. search_index.json is a full-text search index. An agent with file access can search this instead of querying a vector database.
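An agent-side lookup against these files needs no infrastructure at all. A sketch of querying search_index.json with plain keyword matching; the index schema here (a list of records with "title", "url", and "text" fields) is an assumption for illustration, not Wire's documented format, so adjust the field names to whatever your build actually emits:

```python
import json
from pathlib import Path

def search(index_path: str, query: str, k: int = 3) -> list[str]:
    """Rank pages by how many query terms appear in their text. No vector DB."""
    records = json.loads(Path(index_path).read_text())
    terms = query.lower().split()
    scored = sorted(
        records,
        key=lambda r: -sum(t in r["text"].lower() for t in terms),
    )
    return [r["url"] for r in scored[:k]]
```

In practice an agent with file access can call something like this directly, or simply grep the site/ directory; the point is that the index is a file, not a service.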
Wire's 91 build rules prevent broken links, thin content, duplicate information, and structural errors. The source material is clean before the AI ever reads it.
Standard RAG pipeline
Documents → Chunking → Embeddings → Vector DB → Retrieval → Re-ranking → LLM
Failure points at every stage. Retrieval variance degrades accuracy. 94.35 seconds per query on HotPotQA Large.
Wire + CAG
Markdown files → Wire build → llms.txt + search_index.json + HTML → Precompute KV cache → AI agent reads files
No retrieval pipeline. 2.33 seconds per query on HotPotQA Large. BERTScore 0.7759 vs 0.7516 for best RAG.
Setting Up Wire as an AI Knowledge Base
Point your AI agent at the site/ directory after build:
- `site/llms.txt` for the page index
- `site/search_index.json` for full-text search
- `site/feed.xml` for recent changes
- Individual HTML files for detailed content
If your agent needs additional structured data, use raw_export in wire.yml to make specific markdown files available at their original paths:
```yaml
extra:
  wire:
    raw_export:
      - llms.txt
```
Write a _styleguide.md that emphasizes factual density over narrative flow (every paragraph should contain retrievable facts), consistent terminology (an agent matching queries to content needs consistent naming), and section headers as semantic labels (the agent uses headings to navigate).
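Those three rules can go straight into the file. A minimal `_styleguide.md` sketch (the wording is illustrative, not Wire's shipped default):

```markdown
# Style Guide

- Every paragraph must contain at least one retrievable fact: a number,
  a name, a date, or a concrete behavior. No purely transitional paragraphs.
- Use one canonical term per concept ("KV cache", never "key-value store
  cache" or "kv-cache") so agent queries match the text.
- Headings are semantic labels: name the question the section answers,
  not a clever phrase.
```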
Wire's enrich command continuously improves content quality, the news command keeps information current, the crosslink command maintains the knowledge graph, and the build command validates everything. Teams using Wire for competitive intelligence can feed the same structured output to both their website and their AI agents. The output is a directory of clean, structured, interlinked files, rebuilt on every run, always in sync with source content.
Quick Start
Organize your knowledge base
Create markdown files in `docs/` with clear frontmatter: title, description, and created date on every page.
Build the static site
Run `python -m wire.build` to generate the static site, `llms.txt`, and `search_index.json` in one pass.
Connect your AI agent
Point your agent at `site/llms.txt` as its document index. Precompute the KV cache once. The 6-query break-even means even low-traffic internal tools recover the cost immediately.
Keep the knowledge base current
Run `python -m wire.chief enrich` to improve content quality iteratively. Schedule weekly builds via the bot to keep information fresh.
Limitations
Wire is a build-time pipeline. It does not serve real-time queries. If your use case requires sub-second retrieval from millions of documents, or freshness measured in minutes rather than days, standard RAG is the right tool. Wire fits when your knowledge base is thousands of pages (not millions), updates weekly (not per-second), and quality matters more than latency. That describes most product documentation sites, internal knowledge bases, compliance portals, and FAQ systems.
For teams building AI assistants on top of Wire-managed content, the practical path is clear: preload the corpus, compute the KV cache once, and serve queries directly. No vector database. No embedding pipeline. No retrieval variance. See all Wire use cases for more patterns.