On this page
- WebSearch is server-side. Obviously.
- Reverse-engineering the encrypted fields
- How search queries are constructed: not at all
- The date instruction: monthly, not daily
- Black-box observation: keyword-based, not semantic
- WebFetch: what the pipeline does to your content
- What gets indexed, what does not
- The Accept header prefers Markdown
- Five layers between your content and Claude
- The cache changes everything about freshness
- The domain blocklist is a kill switch
- Three User-Agents, three identities
- Claude Code can pay for content
- Same web, two different views
- claude.ai (chat)
- Claude Code (CLI)
- Hypotheses about Anthropic's search engine
- Or it is all an April Fools' joke
- Appendix: the full preapproved domain whitelist
There is a growing industry around "Generative Engine Optimization" – GEO – selling the idea that optimizing for LLM search requires fundamentally new thinking. I read the source code. It is keyword matching, an HTML-to-Markdown converter with zero configuration, and a small model that paraphrases your content with a 125-character quote limit. GEO is SEO with extra filters. Prove me wrong.
Same search engine, different content pipeline
Claude Code and claude.ai share the same search index but see different versions of your website. Each result returns a title, a URL, a page age, and roughly 500 words of encrypted snippet text. Claude Code discards everything except title and URL. When it needs content, it fetches your page separately, converts the HTML to Markdown via Turndown – losing JSON-LD, Schema.org, alt-text, and meta descriptions – then sends the Markdown to a smaller model (Haiku) that paraphrases it with a 125-character quote limit. 107 documentation domains skip this filter. Everyone else gets compressed. The model is instructed to include the current year and month in search queries, but receives no guidance on how to formulate them. The search API is still in beta.
On March 31 – yesterday – I published a black-box study (in German, small-scale exploratory research) reverse-engineering how Claude's web search selects and ranks content. 170 queries against a handful of controlled pages, run in the claude.ai web UI. Not a large-scale study – a directional probe to see what differs from traditional search. It took considerable effort.
On April 1, a GitHub repository appeared containing what claims to be the full source code of Claude Code – the CLI tool, not the chat UI. They leaked everything. These are different products. The chat runs entirely server-side. Claude Code runs locally and talks to the same API. The search engine behind both is likely the same (both use web_search_20250305), but the content fetching pipeline is different – Claude Code fetches locally, the chat fetches server-side.
One day. From black box to white box – of a related but not identical product.
The timing is either a coincidence, an April Fools' joke, or the universe telling me that patience is overrated. I read the source anyway. Here is what it reveals, where it aligns with my observations, and what remains unknown.
Search
Server-side engine, keyword matching, encrypted snippets, no query guidance.
Fetch
Turndown, Haiku paraphrase, 125-char quote limit, five content filters.
Compare
claude.ai sees encrypted snippets directly. Claude Code throws them away.
Conclude
GEO is SEO with extra filters. The search API is still in beta.
WebSearch is server-side. Obviously.
Claude Code does not contain a search engine. It sends your query to Anthropic's API, which runs the search server-side. This is not surprising – shipping a search index with a CLI tool would be absurd. But it matters, because it means the source code reveals the interface to the search engine without revealing the engine itself.
// github.com/codeaashu/claude-code/blob/main/
// src/tools/WebSearchTool/WebSearchTool.ts#L76
function makeToolSchema(input) {
return {
type: 'web_search_20250305',
name: 'web_search',
max_uses: 8,
}
}
The tool sends your query to Anthropic's API as a server_tool_use request. The actual search engine, the index, the ranking – all server-side. Claude Code is a wrapper. The search happens behind a wall you cannot inspect even with the source code in front of you.
The API returns more than Claude Code keeps. According to Anthropic's documentation, each search result contains title, url, page_age, encrypted_content, and encrypted_index. But Claude Code maps only two fields:
// WebSearchTool.ts#L124
const hits = block.content.map(
r => ({ title: r.title, url: r.url })
)
page_age – dropped. encrypted_content (the actual snippet text, encrypted so the model can cite it without exposing raw content in client code) – dropped. encrypted_index (the citation reference) – dropped.
This is the key difference between claude.ai and Claude Code. In my black-box study on claude.ai, I observed structured snippets: 1–6 fragments per result, ~500 words total, keyword-proximity-driven selection. Those snippets come from encrypted_content – the server-side search returns them, and the claude.ai model reads them directly.
Claude Code throws all of that away. It keeps only the titles and URLs. If it wants page content, it must call WebFetch separately – which triggers the Haiku pipeline with the 125-character quote limit, the Turndown conversion, the 100K truncation. The snippet structure I measured in my black-box study does not apply to Claude Code at all. Claude Code never sees those snippets.
The text blocks that the secondary search model writes (its commentary on the results) are what Claude Code actually works with. These are not the structured snippets from the index – they are a model's free-form summary, written after seeing the encrypted results server-side.
Reverse-engineering the encrypted fields
The code shows that encrypted_content and encrypted_index exist but are discarded. My black-box study on claude.ai – where the model does use them – lets me infer what they contain.
encrypted_content is likely the snippet fragments. My study measured their structure:
- 1–6 fragments per result, typically 5
- ~94 words per fragment (range 30–120), ~500–600 words total
- Fragment start: 70% at sentence boundaries, 10% mid-sentence, 4% at navigation/breadcrumbs
- Fragment end: 50% at a period, 30% hard cutoff at ~120 words
- Selection driven by keyword proximity – three keywords in one sentence was selected in 6 out of 6 queries
- An "anchor fragment" (the intro paragraph with highest keyword density) appeared in every query tested
- Fragments are query-dependent: same URL, different query, different fragments selected
This means the search index stores the full page text, not pre-computed snippets. Fragment selection happens server-side after ranking – my study found no correlation between fragment coverage breadth and ranking position.
encrypted_index is likely the citation pointer. My study tested citation notation (X-1, X-3, X-5 – document X, sentence Y). Changing the sentence index did not change what the frontend displayed. The index is an internal reference the model uses to point at specific positions within encrypted_content when formulating citations.
Is it actually encrypted? I called the API directly and inspected the raw response. encrypted_content is a binary blob, Base64-encoded, 4,000–6,300 characters per result. Not readable. Not JSON. Not plaintext. The encoding starts with a consistent byte prefix suggesting a structured binary format. page_age is a plain relative string like "6 days ago" or null – likely from the search index's crawl timestamp, not from the page's HTML meta tags.
The size confirms the black-box observations. 6,300 characters of Base64 decode to roughly 4,700 bytes. After encryption overhead, that leaves approximately 3,500–4,000 bytes of plaintext – roughly 500–650 words at typical word length. My black-box study measured a consistent ~500–600 word snippet budget. The encrypted blob is exactly the right size to contain the fragments I observed.
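The arithmetic checks out in a few lines – a back-of-envelope sketch, with the encryption overhead and bytes-per-word figures as stated assumptions, not measured values:

```typescript
// Back-of-envelope check on the blob size. Assumptions: standard Base64
// (4 chars encode 3 bytes), ~15% encryption overhead (IV, auth tag,
// padding -- a guess), ~7 bytes per English word including the space.

function decodedBytes(base64Chars: number): number {
  return Math.floor((base64Chars * 3) / 4)
}

function estimatedWords(
  base64Chars: number,
  overheadRatio = 0.15,
  bytesPerWord = 7,
): number {
  const plaintext = decodedBytes(base64Chars) * (1 - overheadRatio)
  return Math.round(plaintext / bytesPerWord)
}

decodedBytes(6300)    // 4725 raw bytes
estimatedWords(6300)  // 574 words -- inside the measured 500-650 range
```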
Claude Code sidesteps these fields entirely and fetches content through its own WebFetch pipeline instead.
Eight searches per call. Hardcoded. The web-search-2025-03-05 beta header in the codebase confirms this is still an experimental feature, not a stable API.
The instructions Claude receives for using WebSearch are also in the source:
CRITICAL REQUIREMENT - You MUST follow this:
After answering the user's question,
you MUST include a "Sources:" section
at the end of your response.
IMPORTANT - Use the correct year
in search queries:
The current month is [auto-injected].
You MUST use this year when searching
for recent information.
Source: src/tools/WebSearchTool/prompt.ts. The model is forced to cite sources and instructed to include the current date in queries. This explains the temporal awareness I observed in my black-box study – it is a prompt instruction, not an index feature.
How search queries are constructed: not at all
The most consequential finding for GEO may be what is absent. I searched the entire codebase for guidance on how Claude should formulate search queries. There is none.
The tool description the model receives is:
- Allows Claude to search the web and use
the results to inform responses
- Use this tool for accessing information
beyond Claude's knowledge cutoff
The input schema describes the query field as:
query: z.string().min(2)
.describe('The search query to use')
"The search query to use." Minimum 2 characters. No instructions on keyword extraction, language choice, query decomposition, phrase quoting, or synonym expansion. The model decides entirely on its own – based on training, not on prompt engineering – what query string to write.
That query string then travels verbatim through a secondary model call:
// WebSearchTool.ts#L257-258
const userMessage = createUserMessage({
content:
'Perform a web search for the query: '
+ query,
})
No reformulation. No expansion. The literal string the model composed goes to the server-side search engine, which – as my black-box study established – does keyword-based matching.
For companies optimizing for GEO: you cannot influence the query construction. What you can do is ensure your content contains the literal terms the model is likely to use. And there is one term the model is explicitly told to include.
The date instruction: monthly, not daily
The "[auto-injected]" date in the search prompt is not a placeholder. It is computed dynamically:
// constants/common.ts#L26-33
// Returns "Month YYYY" (e.g. "February 2026")
// in the user's local timezone.
// Changes monthly, not daily — used in tool
// prompts to minimize cache busting.
export function getLocalMonthYear() {
const date =
process.env.CLAUDE_CODE_OVERRIDE_DATE
? new Date(
process.env.CLAUDE_CODE_OVERRIDE_DATE)
: new Date()
return date.toLocaleString('en-US',
{ month: 'long', year: 'numeric' })
}
Two code comments tell the story:
// Changes monthly, not daily — used in
// tool prompts to minimize cache busting.
The value changes only when the calendar month rolls over – "April 2026" becomes "May 2026" on May 1. It is not frozen at session start; it is computed fresh on each call via new Date(). But because it only includes month and year (not the day), the string changes at most once a month. The code comment explains why: changing the tool prompt invalidates the server-side prompt cache for all users.
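The rollover behavior is easy to verify – a small sketch mirroring the logic above, with the date injected directly instead of via the environment variable:

```typescript
// Mirrors the "Month YYYY" logic, with the date injectable for testing
// (the real code reads CLAUDE_CODE_OVERRIDE_DATE or new Date()).
function monthYear(date: Date): string {
  return date.toLocaleString('en-US', { month: 'long', year: 'numeric' })
}

// Any two days in the same month yield the identical prompt string, so the
// server-side prompt cache stays warm until the calendar month rolls over.
monthYear(new Date(2026, 3, 1))   // "April 2026"
monthYear(new Date(2026, 3, 30))  // "April 2026" -- same string, cache intact
monthYear(new Date(2026, 4, 1))   // "May 2026"   -- cache busts here
```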
A separate function (getSessionStartDate) IS memoized and frozen at session start – but that one feeds the system prompt, not the search tool.
The prompt tells the model:
The current month is April 2026.
You MUST use this year when searching for
recent information, documentation, or
current events.
Example: If the user asks for "latest React
docs", search for "React documentation"
with the current year, NOT last year
The model is instructed to inject year and month into search queries. If the server-side search is keyword-based (my black-box observation, not proven), then content containing "2026" in its body text would match queries that include "2026."
But there is a catch. On May 1, the prompt says "May 2026" – and the model may search for "best CRM tools May 2026." Nothing relevant was published in May yet. The month-level granularity helps in the second half of a month but hurts in the first half. For GEO, the year matters more than the month.
CLAUDE_CODE_OVERRIDE_DATE is an internal Anthropic developer variable (the comment says "ant-only date override") – not a user-facing feature.
Black-box observation: keyword-based, not semantic
My study found that Claude's search appears to use exact lexical matching. Synonym queries returned zero results – "Neukäufer gewinnen" found nothing when the actual text said "Neukunden akquirieren." This is an empirical observation from a small sample, not a proven fact. The search engine is server-side and its algorithm is invisible.
What the source code does reveal: the web_search input schema has exactly three parameters.
// WebSearchTool.ts#L25-37
z.strictObject({
query: z.string().min(2),
allowed_domains: z.array(z.string()).optional(),
blocked_domains: z.array(z.string()).optional(),
})
strictObject with additionalProperties: false. No semantic: true flag. No embedding mode. No site: operator. No date range. No language filter. A plain text query and optional domain filters. My black-box study had already inferred additionalProperties: false on the schema – the code confirms it, character for character.
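The observable effect of that schema can be re-created without zod – a toy validator, illustration only, not Anthropic's code:

```typescript
// Toy re-creation of the schema's observable behavior: unknown keys rejected,
// query must be a string of at least 2 characters, domain filters optional.
// Illustration only -- the real tool uses zod's strictObject.
function validateSearchInput(input: Record<string, unknown>): boolean {
  const known = new Set(['query', 'allowed_domains', 'blocked_domains'])
  // additionalProperties: false -- any unknown key fails validation
  if (Object.keys(input).some(k => !known.has(k))) return false
  return typeof input.query === 'string' && input.query.length >= 2
}

validateSearchInput({ query: 'react documentation 2026' }) // true
validateSearchInput({ query: 'x' })                        // false -- below min(2)
validateSearchInput({ query: 'ok', semantic: true })       // false -- no such flag exists
```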
WebFetch: what the pipeline does to your content
WebFetch is the tool that actually reads your website. And it is more complex than anyone assumed.
The flow: validate URL, upgrade HTTP to HTTPS, check a domain blocklist via api.anthropic.com, fetch with axios, convert HTML to Markdown via Turndown, truncate to 100,000 characters, then – and this is the part that matters for GEO – send the Markdown to Haiku.
// github.com/codeaashu/claude-code/blob/main/
// src/tools/WebFetchTool/prompt.ts#L23
export function makeSecondaryModelPrompt(
markdownContent, prompt, isPreapprovedDomain
) {
const guidelines = isPreapprovedDomain
? `Provide a concise response based on
the content above. Include relevant
details, code examples, and
documentation excerpts as needed.`
: `Enforce a strict 125-character maximum
for quotes from any source document.
Use quotation marks for exact language
from articles; any language outside of
the quotation should never be
word-for-word the same.`
Your website's content is not returned to Claude directly. A smaller, faster model – Haiku – reads the Markdown and writes a summary. That summary is what Claude sees. Your page goes through two AI models before it influences an answer.
The exact instructions Haiku receives depend on the domain tier:
Provide a concise response based on the
content above. Include relevant details,
code examples, and documentation excerpts
as needed.
Full reproduction allowed. Code examples pass through. Documentation excerpts preserved.
Provide a concise response based only on
the content above. In your response:
- Enforce a strict 125-character maximum
for quotes from any source document.
- Use quotation marks for exact language
from articles; any language outside of
the quotation should never be
word-for-word the same.
- You are not a lawyer and never comment
on the legality of your own prompts
and responses.
- Never produce or reproduce exact
song lyrics.
125 characters per quote. Mandatory paraphrasing. Your content arrives as Haiku's interpretation, not your words.
My black-box study measured web_fetch returning 2,000–8,000+ words in page-order Markdown with preserved headings. That is what goes into Haiku. What comes out is the compressed summary Claude actually reasons with.
What gets indexed, what does not
My study tested specific phrases hidden in JSON-LD, image alt-text, and meta descriptions. None were retrievable via web_search. The source code offers a plausible explanation – though this applies to WebFetch (the page reader), not to the server-side search index, which remains opaque.
Turndown, an HTML-to-Markdown converter, is the entire content pipeline for WebFetch. It processes <body> content: paragraphs, headings, lists, tables, links, code blocks. It does not process:
- <script> tags (where JSON-LD lives) – stripped
- <meta> tags (where descriptions live) – in <head>, ignored
- <img alt="..."> attributes – Turndown drops images by default
- <nav>, <header>, <footer> – converted to text if present in body
My study found that navigation and breadcrumb text sometimes consumed fragment slots. The code confirms this: Turndown does not distinguish navigation from content. If your breadcrumbs contain keywords, they compete with your article text for Haiku's attention.
// utils.ts#L493-494
if (contentType.includes('text/html')) {
markdownContent =
(await getTurndownService()).turndown(htmlContent)
}
No custom Turndown rules. No <nav> stripping. No <script> removal beyond Turndown's defaults. The instance is created with new Turndown() – zero configuration. Whatever Turndown's default rules preserve, Claude sees. Whatever they strip, Claude never knows existed.
Schema.org in JSON-LD? Invisible. <script type="application/ld+json"> is a script tag. Turndown strips it.
Image alt-text? Invisible. Turndown's default removes images entirely.
Meta descriptions? Invisible to WebFetch. My study found they sometimes appear as ~30-word fragments in web_search snippets – but that is server-side behavior, not something the client code controls.
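A toy illustration of the consequence – regex-based, emphatically not Turndown itself (Turndown walks a DOM), but enough to show which signals never reach Haiku:

```typescript
// Crude sketch of what an HTML-to-text pass keeps and drops.
// NOT Turndown -- just enough to show which signals vanish.
function toyExtract(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '') // JSON-LD gone
    .replace(/<head[\s\S]*?<\/head>/gi, '')     // meta descriptions gone
    .replace(/<img[^>]*>/gi, '')                // alt-text gone
    .replace(/<[^>]+>/g, ' ')                   // remaining tags -> their text survives
    .replace(/\s+/g, ' ')
    .trim()
}

const page = `<html><head><meta name="description" content="Best CRM guide">
<script type="application/ld+json">{"@type":"Article"}</script></head>
<body><nav>Home > Blog > CRM</nav><img src="x.png" alt="CRM dashboard">
<h1>Choosing a CRM</h1><p>Neukunden akquirieren leicht gemacht.</p></body></html>`

toyExtract(page)
// -> 'Home > Blog > CRM Choosing a CRM Neukunden akquirieren leicht gemacht.'
// Breadcrumbs survive and compete for attention; JSON-LD, meta description,
// and alt-text never make it out.
```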
The Accept header prefers Markdown
// utils.ts#L279
headers: {
Accept: 'text/markdown, text/html, */*',
'User-Agent': getWebFetchUserAgent(),
}
text/markdown comes first. If your server serves Markdown (some documentation sites do), Claude gets it raw – no Turndown conversion, no structural loss. For HTML responses, Turndown does the work. This matches my study, which observed web_fetch returning Markdown with preserved heading hierarchy. Sites that support content negotiation and serve Markdown get a genuine edge.
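On your server, honoring that preference takes one content-negotiation branch – a framework-agnostic sketch (the function is mine; real negotiation would also weigh q-values):

```typescript
// Minimal content negotiation: serve Markdown to clients that list it first.
// Sketch only -- real implementations should parse q-values per RFC 9110.
function preferredType(accept: string | undefined): 'markdown' | 'html' {
  if (!accept) return 'html'
  const types = accept.split(',').map(t => t.trim().split(';')[0])
  for (const t of types) {
    if (t === 'text/markdown') return 'markdown' // skip Turndown entirely
    if (t === 'text/html') return 'html'
  }
  return 'html'
}

preferredType('text/markdown, text/html, */*')   // 'markdown' -- Claude Code's header
preferredType('text/html,application/xhtml+xml') // 'html'     -- a typical browser
```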
Five layers between your content and Claude
Your blog post passes through five transformations before Claude reasons with it. Some are obvious. One is not.
Layer 1: Turndown (HTML to Markdown). Your HTML becomes Markdown via new Turndown() with zero configuration. Everything in <head> disappears: meta tags, JSON-LD, Open Graph. Images are dropped. Navigation and breadcrumbs survive as plain text and compete with your article content.
Layer 2: Truncation at 100,000 characters. If your Markdown exceeds 100K characters, it is sliced and a note is appended:
// utils.ts#L529-532
markdownContent.slice(0, MAX_MARKDOWN_LENGTH)
+ '\n\n[Content truncated due to length...]'
Everything after 100K is gone. Claude never knows it existed.
Layer 3: Haiku screening. For non-preapproved domains, a smaller model processes your content with copyright-aware instructions. The full set of rules Haiku receives:
- Enforce a strict 125-character maximum
for quotes from any source document.
Open Source Software is ok as long as
we respect the license.
- Use quotation marks for exact language
from articles; any language outside of
the quotation should never be
word-for-word the same.
- You are not a lawyer and never comment
on the legality of your own prompts
and responses.
- Never produce or reproduce exact
song lyrics.
This is not just summarization. Haiku is instructed to avoid reproducing your content verbatim. Your words are paraphrased by design. The only exception: preapproved documentation sites, where code examples and excerpts pass through.
Layer 4: Tool result budget (the hidden one). After Haiku returns its summary, that summary enters the tool result pipeline. Here, a second size limit applies:
// toolLimits.ts
DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000
PREVIEW_SIZE_BYTES = 2000
MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000
If the Haiku summary exceeds 50K characters, it is saved to disk and replaced with a 2,000-byte preview. Claude sees only the first ~2KB. If multiple tools run in parallel (search + fetch + fetch), their combined results are capped at 200K per message – the largest ones are persisted to disk first.
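The persistence decision condenses to a few lines – a sketch using the constants above; the function name is mine, and the disk write is a stand-in:

```typescript
// Sketch of the Layer-4 budget: oversized tool results are swapped for a
// preview. Thresholds from toolLimits.ts; persistence itself is stubbed out.
const DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000
const PREVIEW_SIZE_BYTES = 2_000

function budgetResult(summary: string): { inline: string; persisted: boolean } {
  if (summary.length <= DEFAULT_MAX_RESULT_SIZE_CHARS) {
    return { inline: summary, persisted: false }
  }
  // Full text would go to disk; the model sees only the ~2KB preview.
  return { inline: summary.slice(0, PREVIEW_SIZE_BYTES), persisted: true }
}

budgetResult('short summary').persisted        // false -- passes through whole
budgetResult('x'.repeat(60_000)).inline.length // 2000  -- preview only
```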
These thresholds are also remotely adjustable via GrowthBook – an open-source feature flag system that lets Anthropic change configuration values server-side without shipping a new release. The flag tengu_satin_quoll controls per-tool persistence thresholds. Anthropic can tighten or loosen how much of your content reaches Claude at any time.
Layer 5: The preapproved bypass. Exactly one path skips Haiku entirely:
// WebFetchTool.ts#L264-269
if (
isPreapproved &&
contentType.includes('text/markdown') &&
content.length < MAX_MARKDOWN_LENGTH
) {
result = content // raw, no Haiku
}
Preapproved domain + server responds with text/markdown + content under 100K = your content arrives at Claude unmodified. For every other site, it passes through all five layers.
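The branch logic reduces to a three-way decision – a sketch of the condition above, with the path labels as my own shorthand:

```typescript
// Which pipeline path does a fetched page take? Sketch of the branch above;
// the Path labels are shorthand, not names from the codebase.
const MAX_MARKDOWN_LENGTH = 100_000

type Path = 'raw' | 'haiku-relaxed' | 'haiku-strict'

function pipelinePath(
  isPreapproved: boolean,
  contentType: string,
  length: number,
): Path {
  if (isPreapproved && contentType.includes('text/markdown')
      && length < MAX_MARKDOWN_LENGTH) {
    return 'raw' // content reaches Claude unmodified
  }
  // Everyone else is summarized by Haiku: preapproved domains get the
  // relaxed prompt, the rest get the 125-character quote limit.
  return isPreapproved ? 'haiku-relaxed' : 'haiku-strict'
}

pipelinePath(true, 'text/markdown', 5_000)   // 'raw'
pipelinePath(true, 'text/html', 5_000)       // 'haiku-relaxed'
pipelinePath(false, 'text/markdown', 5_000)  // 'haiku-strict'
```

Note that even a preapproved domain loses the raw path if it serves HTML or exceeds 100K characters of Markdown.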
The full pipeline with limits:
| Step | Limit | Approx. equivalent |
|---|---|---|
| HTTP fetch | 10 MB | Raw HTML |
| Turndown (HTML to Markdown) | no own limit | Nav text survives |
| Markdown truncation | 100,000 characters | ~20,000 words |
| Haiku processing | 100K chars in, 8K tokens out | ~6,000 words output |
| Tool result budget | 50,000 characters | 2KB preview if exceeded |
A typical 2,000-word blog post produces 5,000–8,000 characters of Markdown after Turndown – well within all limits. The bottleneck for normal content is not truncation but Haiku's output: regardless of input size, the summary is capped at ~6,000 words. For pages with heavy navigation HTML (mega-menus, footer link farms, sidebar widgets), Turndown preserves the text of those elements – they compete with your article content for Haiku's attention and the 100K character budget.
The cache changes everything about freshness
// utils.ts#L63-78
const CACHE_TTL_MS = 15 * 60 * 1000 // 15 minutes
const MAX_CACHE_SIZE_BYTES =
50 * 1024 * 1024 // 50MB
const URL_CACHE = new LRUCache({
maxSize: MAX_CACHE_SIZE_BYTES,
ttl: CACHE_TTL_MS,
})
const DOMAIN_CHECK_CACHE = new LRUCache({
max: 128,
ttl: 5 * 60 * 1000, // 5 min
})
When Claude fetches your page, the result is cached for 15 minutes. Every user asking about the same URL in that window gets the same Haiku summary. Not the same page – the same processed summary.
Your page's first impression is its only impression for 15 minutes. If Haiku misreads your content on the first pass, that misreading is what everyone gets. There is no retry.
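The first-impression effect falls directly out of TTL caching – a toy Map-based sketch (the real code uses the lru-cache package, with size limits this toy omits):

```typescript
// Toy TTL cache showing why the first Haiku summary wins for 15 minutes.
// The real code uses lru-cache; this omits the 50MB size cap.
const CACHE_TTL_MS = 15 * 60 * 1000

type Entry = { summary: string; expiresAt: number }
const cache = new Map<string, Entry>()

function getSummary(
  url: string,
  fetchAndSummarize: () => string,
  now: number,
): string {
  const hit = cache.get(url)
  if (hit && now < hit.expiresAt) return hit.summary // no refetch, no retry
  const summary = fetchAndSummarize()
  cache.set(url, { summary, expiresAt: now + CACHE_TTL_MS })
  return summary
}

let calls = 0
const fetcher = () => { calls++; return `summary v${calls}` }

const first = getSummary('https://example.com/post', fetcher, 0)
const cached = getSummary('https://example.com/post', fetcher, 10 * 60 * 1000)
const refreshed = getSummary('https://example.com/post', fetcher, 16 * 60 * 1000)
// first === cached ('summary v1'); only after the TTL does 'summary v2' appear
```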
The domain blocklist has a separate 5-minute cache. If Anthropic blocks your domain, it takes at most 5 minutes to take effect.
The domain blocklist is a kill switch
Before fetching any non-preapproved URL, Claude Code calls home:
// utils.ts#L183
const response = await axios.get(
  `https://api.anthropic.com/api/web/domain_info?domain=${domain}`,
  { timeout: 10_000 }
)
if (response.data.can_fetch === true) {
// proceed
}
Anthropic maintains a real-time domain blocklist. If can_fetch returns false, Claude cannot read your site. This is not a robots.txt check – it is Anthropic's own gate, checked on every first request to a new domain.
Enterprise customers can skip this check via skipWebFetchPreflight – but regular users cannot.
Three User-Agents, three identities
The code reveals Claude operates with three distinct User-Agent strings:
// http.ts#L52-57
// "Claude-User" is Anthropic's publicly
// documented agent for user-initiated
// fetches; the claude-code suffix lets them
// distinguish local CLI traffic from
// claude.ai server-side fetches.
export function getWebFetchUserAgent() {
return `Claude-User (${version}; +https://support.anthropic.com/)`
}
The comment is the finding: "distinguish local CLI traffic from claude.ai server-side fetches." This confirms that claude.ai (the chat product) uses a different fetcher – not this client-side pipeline. When your server sees Claude-User with a claude-code suffix, that is the CLI. The chat product has its own server-side mechanism.
Claude Code can pay for content
Buried in 40 lines of error handling is a protocol nobody has written about:
// utils.ts#L329-356
if (error.response?.status === 402) {
const { isX402Enabled,
handlePaymentRequired,
getX402SessionSpentUSD
} = require('../../services/x402/index.js')
const paymentHeader =
error.response.headers['x-payment-required']
if (isX402Enabled() && paymentHeader) {
const result = handlePaymentRequired(
paymentHeader,
getX402SessionSpentUSD(),
)
if (result) {
// Retry request with payment header
return await axios.get(url, {
headers: {
'x-payment': result.paymentHeader,
},
})
}
}
}
HTTP 402 Payment Required. The x402 protocol. Claude Code can detect paywalled content and pay for it using a session-scoped budget tracked via getX402SessionSpentUSD(). This is infrastructure for a web where AI agents pay per article. It exists in the code today.
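The session-budget side of this can be sketched in a few lines. The cap value below is my assumption – the code tracks spend via getX402SessionSpentUSD(), but I found no published limit:

```typescript
// Sketch of x402-style session budgeting: approve payment for a 402 response
// only while the running total stays under a per-session cap.
const SESSION_CAP_USD = 5.0 // assumption -- not a documented Anthropic value

let sessionSpentUSD = 0

function approvePayment(priceUSD: number): boolean {
  if (sessionSpentUSD + priceUSD > SESSION_CAP_USD) return false
  sessionSpentUSD += priceUSD
  return true // caller retries the request with the x-payment header
}

const a = approvePayment(0.25) // true  -- article paywall, within budget
const b = approvePayment(4.5)  // true  -- session total now 4.75
const c = approvePayment(0.5)  // false -- would exceed the session cap
```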
Same web, two different views
The same search engine powers both products. But what reaches the model differs fundamentally.
claude.ai (chat)
- Search returns 10 results with encrypted snippets (~500 words each)
- Model reads snippets directly, cites from them
- page_age available as freshness signal ("6 days ago")
- No separate fetch needed for most answers
- Snippet selection: keyword-proximity-based, 1–6 fragments, query-dependent
- Your content arrives as the search engine selected it
Claude Code (CLI)
- Same 10 results, but encrypted_content and page_age are discarded
- Only { title, url } kept
- Content requires separate WebFetch call
- WebFetch pipeline: Turndown (HTML to Markdown) then Haiku with 125-character quote limit then 100K truncation then 50K tool result budget (2KB preview if exceeded)
- 107 preapproved documentation sites skip Haiku
- Your content arrives as Haiku's paraphrase of a Markdown conversion of your HTML
Both products share the same search index, the same ranking, the same keyword matching. The difference is what happens after the results arrive. claude.ai works with the server-provided snippets. Claude Code throws them away and rebuilds its own view of your content through a five-layer pipeline.
My black-box observations about snippet structure and fragment behavior describe the shared search engine. The code I am reading here shows a different, more lossy path to the same content.
Hypotheses about Anthropic's search engine
The source code reveals the edges of the black box without opening it. Combined with my empirical measurements, here is what I can infer:
The index stores full page text, not pre-computed snippets. My study showed this: the same URL returns different fragments for different queries, so fragments must be selected at query time rather than stored. The code adds a boundary condition: snippet text travels only inside encrypted_content, which Claude Code discards – whatever assembles the fragments lives entirely server-side.
Ranking may use term-frequency scoring (BM25 or similar). My study found a correlation between BM25 proxy scores and ranking position across domains – but this is a small-sample observation, not proof. The code shows no semantic search parameters. The query is a plain string. No embeddings, no vector similarity. This is consistent with lexical retrieval, but the server-side engine could use any algorithm.
The ~500-word snippet budget is enforced where encrypted_content is assembled, server-side. My study measured a consistent ~500–600 word budget per result with a hard maximum of 6 fragments, and the encrypted blob is sized to match. The client code imposes no limit of its own – it keeps just { title, url }. The fragment structure, the ~94-word average per fragment, the keyword proximity effects – all of this is decided before any client sees the results. The budget is a property of snippet generation, not of the index.
The web-search-2025-03-05 version string suggests iteration. It is a beta header with a date stamp. Anthropic is actively developing this. The search behavior I measured in March 2026 may not match what exists in June 2026.
Or it is all an April Fools' joke
The repository appeared on April 1. The code is plausible, detailed, and internally consistent. It matches my black-box observations in ways that would be difficult to fake:
- additionalProperties: false on the search schema – matches my empirical observation
- Turndown with zero custom rules – would explain why JSON-LD and alt-text are invisible
- Search results carrying their snippet text only as encrypted blobs – consistent with why I never saw readable snippet text in the raw API response
- The 125-character quote limit – consistent with the compression patterns I measured in fetch results
If it is a fake, it is a fake that independently arrived at the same architecture I reverse-engineered from the outside. I will treat it as real until proven otherwise.
Yesterday: 170 queries and careful measurement. Today: grep -r "makeSecondaryModelPrompt" and reading TypeScript.
The vibes say April Fools'. The code says architecture. I will update this article when Anthropic comments.
Appendix: the full preapproved domain whitelist
These 107 domains skip the domain blocklist check and receive relaxed content extraction rules (no 125-character quote limit, code examples preserved). Source: src/tools/WebFetchTool/preapproved.ts.
Anthropic: platform.claude.com, code.claude.com, modelcontextprotocol.io, github.com/anthropics, agentskills.io
Programming languages: docs.python.org, en.cppreference.com, docs.oracle.com, learn.microsoft.com, developer.mozilla.org, go.dev, pkg.go.dev, www.php.net, docs.swift.org, kotlinlang.org, ruby-doc.org, doc.rust-lang.org, www.typescriptlang.org
JavaScript/Web: react.dev, angular.io, vuejs.org, nextjs.org, expressjs.com, nodejs.org, bun.sh, jquery.com, getbootstrap.com, tailwindcss.com, d3js.org, threejs.org, redux.js.org, webpack.js.org, jestjs.io, reactrouter.com
Python libraries: docs.djangoproject.com, flask.palletsprojects.com, fastapi.tiangolo.com, pandas.pydata.org, numpy.org, www.tensorflow.org, pytorch.org, scikit-learn.org, matplotlib.org, requests.readthedocs.io, jupyter.org
PHP: laravel.com, symfony.com, wordpress.org
Java: docs.spring.io, hibernate.org, tomcat.apache.org, gradle.org, maven.apache.org
.NET: asp.net, dotnet.microsoft.com, nuget.org, blazor.net
Mobile: reactnative.dev, docs.flutter.dev, developer.apple.com, developer.android.com
ML/Data Science: keras.io, spark.apache.org, huggingface.co, www.kaggle.com
Databases: www.mongodb.com, redis.io, www.postgresql.org, dev.mysql.com, www.sqlite.org, graphql.org, prisma.io
Cloud/DevOps: docs.aws.amazon.com, cloud.google.com, kubernetes.io, www.docker.com, www.terraform.io, www.ansible.com, vercel.com/docs, docs.netlify.com, devcenter.heroku.com
Testing: cypress.io, selenium.dev
Game dev: docs.unity.com, docs.unrealengine.com
Other: git-scm.com, nginx.org, httpd.apache.org
If your documentation site is not on this list, Claude Code treats it as untrusted – blocklist check, 125-character quote limit, mandatory paraphrasing.
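Checking your own domain against the list takes a few lines – a sketch with a small subset of the 107 entries (substitute the full list from preapproved.ts):

```typescript
// Is a site on the preapproved list? Subset shown for illustration;
// the real list in preapproved.ts has 107 entries.
const PREAPPROVED = new Set([
  'react.dev', 'docs.python.org', 'developer.mozilla.org',
  'kubernetes.io', 'huggingface.co', 'git-scm.com',
])

function isPreapproved(url: string): boolean {
  return PREAPPROVED.has(new URL(url).hostname)
}

isPreapproved('https://react.dev/learn')       // true  -- relaxed extraction
isPreapproved('https://example-blog.dev/post') // false -- 125-char quote limit
```

One caveat: entries like github.com/anthropics and vercel.com/docs include a path, so real matching must be prefix-based; this sketch checks hostnames only.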
Related: Vibe Coding Ships Broken | Why Nobody Built a Content Build System