Three weeks ago I wrote that GEO is SEO with extra filters. The Claude Code source showed keyword matching, an HTML-to-Markdown converter, and a small model paraphrasing your content down to 125 characters. At the search retrieval layer, that analysis holds.
But it covers only one of two layers. The second layer follows different rules. A Russian-language analysis published on Habr on April 20 makes those rules explicit. LLMs do not learn what things are. They learn what things are not.
Two Layers of GEO
GEO splits into two problems with different solutions.
The search layer (RAG): Systems like Perplexity, Bing Copilot, and Claude's web search retrieve pages and pass them to the model as context. The model extracts citable passages. This behaves like SEO. The Princeton GEO paper (arXiv 2311.09735) tested nine optimization strategies against 10,000 queries. Statistics with citations: +30–40% citation visibility. Quotable, self-contained paragraphs: +20–30%. Keyword stuffing: no measurable lift.
The training layer (weights): Content that ends up in training datasets shapes how models form conceptual categories. You are not writing for a retrieval query. You are writing for a gradient that updates billions of parameters. The rules differ fundamentally.
The Apophatic Machine
The Habr analysis makes a claim about how neural networks represent concepts: they are apophatic. They learn through negation.
When a model encounters "apple" in training data, it does not build a positive attribute list — round, red, fruit, sweet. It builds a boundary. Apple is not pear. Not tomato. Not ball. The "apple" embedding is the region of latent space that is not-pear and not-tomato and not-ball simultaneously. Remove the boundaries and the concept dissolves into noise.
This is not a metaphor. Contrastive learning — one of the primary training techniques used in modern embedding models — explicitly trains by pushing similar examples together and dissimilar examples apart in vector space. The model learns the shape of a boundary, not a list of properties.
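For readers who want the mechanism concretely, here is a minimal numpy sketch of a pairwise contrastive loss (the classic Hadsell-style formulation; the embeddings and margin are invented for illustration and are not taken from the Habr analysis):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise contrastive loss (Hadsell et al., 2006).

    Same-class pairs are pulled together (loss grows with distance);
    different-class pairs are pushed apart until they sit at least
    `margin` apart.
    """
    d = np.linalg.norm(emb_a - emb_b)
    if same_class:
        return d ** 2                     # pull similar examples together
    return max(0.0, margin - d) ** 2      # push dissimilar examples apart

# Toy embeddings: before training, "apple" sits close to "pear",
# so the negative pair still produces loss and gets pushed apart.
apple = np.array([0.9, 0.1])
pear  = np.array([0.8, 0.2])
print(contrastive_loss(apple, pear, same_class=False))  # nonzero: push apart
```

The only thing the loss ever measures is the distance between examples. There is no term for "red" or "sweet"; the boundary is the whole representation.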
The implication for brand and category content: vague positive attributes produce weak signal. "Quality products for active people who value freedom" contains words with millions of occurrences across training data and near-zero distinctive position in embedding space. The model cannot locate a stable boundary.
Contrast with: "We build software only for small businesses. We do not build enterprise features. The product works without an onboarding call." Three hard negatives. A boundary with three defined edges. The model can locate it.
Four ML Patterns Applied to Content Writing
The Habr analysis maps four machine learning training techniques to content strategy. These are the actual mechanisms, not loose analogies.
Hard Negatives: In ML training, a hard negative is a wrong answer that looks plausible yet is definitively wrong, close enough to the decision boundary that the model must learn a fine distinction. Maximum gradient signal. For content: define what your product does not do alongside what it does. "Wire solves keyword cannibalization. Unlike Ahrefs, which detects it, Wire fixes it automatically by merging the duplicate pages." The boundary is explicit.
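A sketch of why the hard negative carries the signal: hard-negative mining picks, from the available wrong answers, the one closest to the anchor (toy vectors, illustrative only):

```python
import numpy as np

def hardest_negative(anchor, negatives):
    """Hard-negative mining: return the wrong example closest to the anchor.

    The closer a negative sits to the anchor, the larger the margin
    violation and therefore the larger the gradient it contributes.
    """
    dists = [np.linalg.norm(anchor - n) for n in negatives]
    return negatives[int(np.argmin(dists))]

anchor = np.array([0.9, 0.1])                  # "apple"
easy   = np.array([-0.8, 0.9])                 # "ball": already far away, little to learn
hard   = np.array([0.85, 0.15])                # "pear": plausibly close, maximum signal
print(hardest_negative(anchor, [easy, hard]))  # -> the "pear" vector
```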
Contrastive Learning: Models learn that objects of the same class sit close in embedding space and objects of different classes sit far apart. For content: define the exact class of problem you solve and the exact class you explicitly do not. "For 200-page content operations requiring quality enforcement: Wire. For single-page landing pages: not Wire."
Curriculum Learning: Training builds from obvious contrasts to subtle ones — easy examples first, hard examples later. For content: lead readers from "X is better than clearly bad alternative" to "X is better than comparable alternative." The final boundary lands with more weight because you built up to it.
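A toy sketch of the ordering idea, assuming a hypothetical difficulty score based on how close two concepts sit in embedding space (the embeddings below are invented):

```python
import numpy as np

# Toy embeddings standing in for a real encoder (invented values).
emb = {
    "Wire":            np.array([0.9, 0.1]),
    "a spreadsheet":   np.array([-0.7, 0.8]),  # obviously different
    "a manual linter": np.array([0.3, 0.4]),   # somewhat similar
    "Ahrefs":          np.array([0.8, 0.2]),   # comparable alternative
}

def difficulty(pair):
    """Contrast difficulty: the closer two concepts sit, the harder the pair."""
    a, b = pair
    return -np.linalg.norm(emb[a] - emb[b])    # small distance -> high difficulty

pairs = [("Wire", "Ahrefs"), ("Wire", "a spreadsheet"), ("Wire", "a manual linter")]
curriculum = sorted(pairs, key=difficulty)     # easy, obvious contrasts first
print(curriculum)                              # ends with ("Wire", "Ahrefs")
```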
Triplet Loss: Training on anchor-positive-negative triples, where the model is penalized until the anchor ends up closer to the positive than to the negative by at least a margin. For content: "When you need automated content quality enforcement across 200 pages (anchor), use Wire (positive) — not a manually-configured linter that has no cross-build memory or GSC integration (negative)."
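The standard triplet loss, sketched in numpy with made-up vectors for the anchor task, Wire, and the linter:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: penalize until the anchor is closer to the positive
    than to the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy triple mirroring the sentence above (values are illustrative):
anchor   = np.array([0.5, 0.5])    # the task: quality enforcement across 200 pages
positive = np.array([0.6, 0.4])    # Wire
negative = np.array([0.55, 0.45])  # manually-configured linter
print(triplet_loss(anchor, positive, negative))  # > 0: boundary not yet learned
```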
High-surprise content gets high gradient updates. Vague positive claims produce low loss: the model already predicted something generic, and nothing changes. Clear contrastive boundaries produce high loss — the model predicted something wrong and the correction is precise. High loss means high weight update, which means stronger encoding.
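A minimal illustration of that loss asymmetry using per-token cross-entropy (the probabilities are invented for illustration):

```python
import numpy as np

def token_loss(predicted_prob_of_target):
    """Cross-entropy contribution of one training token: -log p(target).

    A token the model already expects (high p) barely moves the weights;
    a surprising token (low p) produces a large loss and a large update.
    """
    return -np.log(predicted_prob_of_target)

print(token_loss(0.60))  # generic claim the model saw coming:      ~0.51
print(token_loss(0.02))  # precise contrastive boundary, unexpected: ~3.91
```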
Practical Implications for Wire Sites
At the search layer, Wire's build gates already handle most of the work. Quality enforcement passes content that is structurally sound: minimum word counts, no thin pages, no duplicate titles, correct heading hierarchy. Content that clears Wire's 90+ build rules sits above the quality threshold below which RAG systems lose signal.
For RAG citation specifically: write in citable units. One paragraph, one complete answer. Lead with the answer before any qualification. Embed statistics with named primary sources. Specific named entities ("software engineers at 10-50 person SaaS companies") beat generic nouns ("professionals").
At the training layer, the picture is different: most Wire operators are not currently writing for it. The content is defensible. It passes quality gates. It answers queries. It does not establish category boundaries.
Every core product or category page benefits from one contrastive statement: what the product solves, what it explicitly does not solve, and what the alternative is that only partially solves it. Not for Google's ranking algorithm. For the gradient.
Each core page contains at least one hard negative — what the product is NOT.
Category descriptions use function definitions, not attribute lists. At least one comparison uses triplet structure: anchor task, your solution, competing approach that falls short. Brand definitions work as vector boundaries, not adjective collections.
Who This Matters For
For small businesses focused on one service or product: the training layer is not the primary target. You are unlikely to generate enough training data exposure to shift model weights. The correct goal is search-layer dominance in your microniche — being the most relevant RAG result for the specific queries that matter. Standard SEO automation rules apply.
The training layer matters for category creators: people naming a new approach, publishing original frameworks, coining terminology others will adopt. If you define a term the industry starts using, third-party citations follow. Those citations enter training data. The term becomes a structural anchor in the model's category space.
Wire positions itself as a "Content Build System" — not an SEO tool, not a content platform, not an AI writing assistant. That definition is a boundary. Positive attributes ("comprehensive", "powerful", "AI-powered") dissolve in training data. A function definition that specifies what a thing is not does not dissolve.
The same principle applies to content Wire builds. The audit command finds thin pages and structural problems. Quality gates enforce baseline signal. Boundary engineering is a writing decision, not a lint rule.