Data Model - Sites, Topics, and Content

Wire stores no content in a database. Every page, topic, and site lives in files. That choice has consequences you'll hit the first time something breaks.

Wire's data model is three things: a Site, Topics, and Content pages. They map to files and directories. No rows, no migrations, no sync. But there's a catch: search metrics do live in a database, just a separate one. So the answer to "does Wire use a database?" is yes and no, depending on what you're asking about.

Wire auto-discovers topics from the `docs/` directory. A directory becomes a Topic only if it contains an `index.md` with a `title:` field in its frontmatter. No title, no Topic. Wire won't error. It will silently skip the directory. That's the most common reason content goes missing. There's also a distinction between pending news files (sitting loose in the content directory) and archived news (inside a `news/` subdirectory).

Every save goes through a validation pipeline. Wire checks required fields, applies auto-fixes, logs quality warnings, then preserves any `wire_*` fields from the previous version before writing. That last step matters: if you manually set a `wire_differentiated` flag, Wire keeps it. But if a required field like `title` or `description` is missing, the save still proceeds. Wire fixes what it can and warns about the rest. So a broken page might not throw an error. It might just silently lose structure.

The GSC database lives at `{site_dir}/.wire/gsc.db`. It has three tables: Content, Keyword, and Snapshot. Only the `data` command writes to it. Every SEO command reads from it. If you ran an SEO command before running `data`, the database is empty and the command returns nothing. No error, just silence. There's also a harder version of this problem: the database exists and has rows, but your content slugs don't match what's stored.

Wire splits content and metrics deliberately. Content lives in files because the file system already is the structure. A database would duplicate it and drift. Metrics live in SQLite because they come from an external API and need relational queries that markdown can't answer. The `enrich` command is the bridge: it queries the database, builds a brief from local analysis, then writes the result through the file pipeline. If your slugs in the database don't match your directory names, enrich breaks silently. The slug is always the directory name, not the title.
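Because a slug mismatch fails silently, it can help to compare both sides directly. The following is a minimal diagnostic sketch, not part of Wire: `find_slug_mismatches` is a hypothetical helper, and it assumes content pages live at `docs/{topic}/{slug}/index.md` and that the Content table has the `slug` column shown in the GSC Database Schema section.

```python
import sqlite3
from pathlib import Path


def find_slug_mismatches(docs_dir: Path, conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Compare directory-name slugs on disk with slugs stored in the Content table."""
    # Assumption: every docs/{topic}/{slug}/index.md is a content page,
    # and its slug is the name of the directory that holds it.
    on_disk = {p.parent.name for p in docs_dir.glob("*/*/index.md")}
    in_db = {row[0] for row in conn.execute("SELECT slug FROM Content")}
    return {
        "missing_from_db": on_disk - in_db,   # pages with no stored metrics
        "missing_on_disk": in_db - on_disk,   # stored slugs with no matching directory
    }
```

Called with the site's `docs/` path and a connection to `{site_dir}/.wire/gsc.db`, a non-empty result in either set points at the directories or rows to reconcile.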

The pipeline runs on every save, without exception. It applies nine auto-fixes to Claude's output before writing to disk. This exists because Claude follows instructions correctly about 95% of the time. At 500 pages, the other 5% is 25 broken pages per cycle. The fixes are deterministic and cost nothing. But they also mean Wire will silently correct things you wrote intentionally if they match a known bad pattern. A title with a pipe character, for example, gets rewritten. You won't see an error. The file will just look different than what you wrote.

Wire uses three dataclasses to represent the content hierarchy. They map directly to the file system. No database needed for content structure. The GSC database stores search metrics separately.

The Three Dataclasses

Site

Represents the entire site. One per project. Loaded from wire.yml at import time.

Site(
    title="Wire",
    url="https://wire.newsroom.dev",
    description="Content pipeline for static sites."
)

Fields come directly from wire.yml. Available in every prompt as {site}.

Topic

A directory of related content pages. Auto-discovered from the docs directory.

Topic(
    directory="products",
    title="Product Reviews",
    description="In-depth reviews of products in your market."
)

Title and description come from the topic's index.md frontmatter. Any directory under docs/ with an index.md containing title: in frontmatter becomes a Topic.
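The discovery rule can be sketched as follows. This is an illustration under stated assumptions, not Wire's actual code: `read_frontmatter` and `discover_topics` are hypothetical names, and the frontmatter parser only handles flat `key: value` lines rather than full YAML.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Topic:
    directory: str
    title: str
    description: str = ""


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse flat `key: value` lines between the leading '---' markers."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def discover_topics(docs_dir: Path) -> list[Topic]:
    """Return a Topic for each subdirectory whose index.md has a title."""
    topics = []
    for child in sorted(docs_dir.iterdir()):
        index = child / "index.md"
        if not child.is_dir() or not index.is_file():
            continue
        meta = read_frontmatter(index)
        if not meta.get("title"):
            continue  # no title: the directory is skipped without an error
        topics.append(Topic(child.name, meta["title"], meta.get("description", "")))
    return topics
```

The silent `continue` mirrors the behavior described above: a directory without a titled `index.md` simply never appears in the result.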

Content

A single content page. The primary unit of work in Wire.

Content(
    slug="acme",
    title="Acme Corporation - Product Overview",
    path=Path("docs/products/acme/index.md"),
    summary="Overview of Acme's product capabilities and features."
)

Created from index.md frontmatter. The slug is the directory name. Content items have methods for reading their body, listing news files, and checking metadata stamps.

File System Layout

docs/
  index.md                                    # Site homepage
  {topic}/
    index.md                                  # Topic index page
    {slug}/
      index.md                                # Content page (main)
      2026-03-10.md                           # Pending news (not yet integrated)
      news/
        2026-03-01.md                         # Archived news (already integrated)
  comparisons/
    {a}-vs-{b}/
      index.md                                # Comparison pages
  news/
    2026-03-10-news.md                        # Weekly market reports

Frontmatter Contract

Every content page must have YAML frontmatter with required fields.

Required Fields

Field	Type	Purpose	Set by
`title`	string	Page title, H1, JSON-LD, nav	Wire on create
`description`	string	Meta description, JSON-LD	Wire on create/refine

Managed Fields

Field	Type	Purpose	Set by
`created`	date	First creation date	Wire on create, never overwritten
`date`	date	Last modification date	Wire on every write
`wire_action`	string	Last Wire operation	Wire on every write
`wire_reworded`	date	When SEO reword happened	Wire on reword
`wire_differentiated`	bool	Whether page was differentiated	Wire on differentiate
`wire_differentiated_from`	string	Source of differentiation	Wire on differentiate
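
The `created` and `date` rules in the table can be illustrated with a small sketch. `apply_managed_fields` is a hypothetical function, and frontmatter is modeled as a plain dict; how Wire implements this internally is not shown here.

```python
from datetime import date


def apply_managed_fields(previous: dict, new: dict, action: str, today: date) -> dict:
    """Return a copy of `new` with the managed fields stamped for this write."""
    stamped = dict(new)
    # `created` is set on the first write and never overwritten afterwards.
    stamped["created"] = previous.get("created", today)
    # `date` and `wire_action` are refreshed on every write.
    stamped["date"] = today
    stamped["wire_action"] = action
    return stamped
```

Calling it twice, first with an empty `previous` and then with the first result as `previous`, keeps the original `created` value while `date` and `wire_action` change.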

Validation Pipeline

Every save_index() call goes through:

1. validate_frontmatter() checks required fields exist.
2. _sanitize_content() applies 9 auto-fixes.
3. _warn_content_quality() logs warnings for quality issues.
4. Preserve wire_* fields from previous version.
5. Write to disk.

This pipeline runs for every save, regardless of which command triggered it. See content quality for details on each auto-fix.
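The order of the steps can be sketched as below. Everything here is a simplified stand-in: the function bodies are placeholders, the real `_sanitize_content()` applies nine fixes that are not reproduced, and the signature of `save_index` is an assumption. The sketch also assumes that a `wire_*` field from the previous version is kept only when the new version does not set it.

```python
import logging
from pathlib import Path

log = logging.getLogger("wire_sketch")
REQUIRED = ("title", "description")


def validate_frontmatter(meta: dict) -> list[str]:
    """Step 1: list missing required fields (the save still proceeds)."""
    return [name for name in REQUIRED if not meta.get(name)]


def sanitize_content(body: str) -> str:
    """Step 2: placeholder auto-fix that only trims trailing whitespace."""
    return "\n".join(line.rstrip() for line in body.splitlines()) + "\n"


def warn_content_quality(body: str) -> None:
    """Step 3: log a warning; the five-word threshold is arbitrary."""
    if len(body.split()) < 5:
        log.warning("page body is very short")


def save_index(path: Path, meta: dict, body: str, previous: dict) -> dict:
    for name in validate_frontmatter(meta):
        log.warning("missing required field: %s", name)
    body = sanitize_content(body)
    warn_content_quality(body)
    # Step 4: carry over wire_* fields that the new version did not set.
    merged = {k: v for k, v in previous.items() if k.startswith("wire_")}
    merged.update(meta)
    # Step 5: write frontmatter and body to disk.
    header = "\n".join(f"{k}: {v}" for k, v in merged.items())
    path.write_text(f"---\n{header}\n---\n\n{body}", encoding="utf-8")
    return merged
```

Note that a missing `description` produces a log warning rather than an exception, which matches the "warn and proceed" behavior described earlier.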

GSC Database Schema

The search metrics database uses three tables.

CREATE TABLE Content (
    id INTEGER PRIMARY KEY,
    slug TEXT,
    topic TEXT,
    title TEXT
);

CREATE TABLE Keyword (
    id INTEGER PRIMARY KEY,
    keyword TEXT UNIQUE
);

CREATE TABLE Snapshot (
    id INTEGER PRIMARY KEY,
    content_id INTEGER REFERENCES Content(id),
    keyword_id INTEGER REFERENCES Keyword(id),
    impressions INTEGER,
    clicks INTEGER,
    position REAL,
    ctr REAL,
    date TEXT
);

The database lives at {site_dir}/.wire/gsc.db. It is populated by fetch_and_store() and read by every SEO-related function. No content command writes to this database. Only the data command does.
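A read-side query against this schema might look like the sketch below. `top_keywords` is a hypothetical helper, not one of Wire's SEO functions. It also shows why running an SEO command before `data` yields silence: an empty database returns an empty list, not an error.

```python
import sqlite3

# Same three tables as above, condensed so the example runs on its own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Content (id INTEGER PRIMARY KEY, slug TEXT, topic TEXT, title TEXT);
CREATE TABLE IF NOT EXISTS Keyword (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS Snapshot (
    id INTEGER PRIMARY KEY,
    content_id INTEGER REFERENCES Content(id),
    keyword_id INTEGER REFERENCES Keyword(id),
    impressions INTEGER, clicks INTEGER, position REAL, ctr REAL, date TEXT
);
"""


def top_keywords(conn: sqlite3.Connection, slug: str, limit: int = 5) -> list[tuple]:
    """Total impressions and clicks per keyword for one page, highest first."""
    cursor = conn.execute(
        """
        SELECT k.keyword,
               SUM(s.impressions) AS total_impressions,
               SUM(s.clicks) AS total_clicks
        FROM Snapshot s
        JOIN Content c ON c.id = s.content_id
        JOIN Keyword k ON k.id = s.keyword_id
        WHERE c.slug = ?
        GROUP BY k.keyword
        ORDER BY total_impressions DESC, k.keyword
        LIMIT ?
        """,
        (slug, limit),
    )
    return cursor.fetchall()
```

To try it without touching a real site, open `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, and call `top_keywords(conn, "acme")`.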

Content Methods

Key methods on the Content dataclass:

Method	What it does
`read_index()`	Read the full page content including frontmatter
`news_files()`	List pending news files (returns `[]` if none)
`get_stamp(field)`	Read a frontmatter metadata field
`stamp(**fields)`	Set metadata fields and save

These methods are used throughout Wire's pipeline. news_files() returns an empty list when the news directory does not exist. Callers do not need to check for the directory.
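The "always returns a list" contract can be sketched like this. `pending_news` is a hypothetical stand-in for `news_files()`. It assumes, based on the File System Layout section, that pending news files are date-named markdown files sitting next to `index.md`, and that archived files under `news/` are excluded.

```python
import re
from pathlib import Path

# Assumption: pending news files are named YYYY-MM-DD.md.
DATED = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")


def pending_news(page_dir: Path) -> list[Path]:
    """List pending news files for a page; return [] when there is nothing to list."""
    if not page_dir.is_dir():
        return []
    # Only files directly inside the page directory count as pending;
    # anything under news/ has already been integrated.
    return sorted(p for p in page_dir.iterdir() if p.is_file() and DATED.match(p.name))
```

Because a missing directory and an empty one both yield `[]`, a caller can loop over the result without any existence check.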

Why Dataclasses, Not a Database

Wire represents content structure with Python dataclasses, not database rows. This is deliberate. The content hierarchy maps directly to the file system: directories are topics, files are pages, frontmatter is metadata. A database would duplicate this structure and create synchronization problems.

The GSC database is the exception that proves the rule. Search metrics do not exist on disk. They come from an external API and need relational queries (self-joins for overlap detection, aggregations for trending keywords). SQLite is the right tool for this data. Markdown files are the right tool for content.
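As an illustration of the kind of self-join mentioned here, the sketch below finds keywords that more than one page has snapshots for. It is not Wire's `find_overlaps` implementation. `keyword_overlaps` is a hypothetical name, and it expects a connection to a database with the three tables from the GSC Database Schema section.

```python
import sqlite3


def keyword_overlaps(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (keyword, slug_a, slug_b) for each pair of pages sharing a keyword."""
    cursor = conn.execute(
        """
        SELECT DISTINCT k.keyword, c1.slug, c2.slug
        FROM Snapshot s1
        JOIN Snapshot s2
          ON s1.keyword_id = s2.keyword_id
         AND s1.content_id < s2.content_id  -- each pair once, never a page with itself
        JOIN Content c1 ON c1.id = s1.content_id
        JOIN Content c2 ON c2.id = s2.content_id
        JOIN Keyword k ON k.id = s1.keyword_id
        ORDER BY k.keyword, c1.slug, c2.slug
        """
    )
    return cursor.fetchall()
```

Expressing the same question over markdown files would mean parsing every page and comparing them pairwise in application code, which is the work the relational join does in one statement.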

This split has practical consequences. Content operations (create, refine, expand) read and write files. SEO operations (find_overlaps, keeper_score, find_content_gaps) query the database. The enrich command bridges both: it queries the database for keyword data, builds an amendment brief using local analysis, then writes the result to a file through the sanitize pipeline.

The file-system-first approach also means Wire works without any database. A site without GSC credentials can still use the full content pipeline: create pages from web research, gather news, refine content. The database enables SEO features, but the content pipeline is self-contained.

The Validation Pipeline in Practice

The five-step validation pipeline runs on every save. This is not optional. Every path through Wire's code that saves a file calls save_index(), which triggers the full pipeline.

This matters because Claude's output is unpredictable at the margins. Claude follows instructions well 95% of the time. The other 5% produces titles with pipes, duplicate internal links, removed citations, or broken heading hierarchy. At 500 pages, 5% means 25 pages with structural problems.

The nine auto-fixes catch these margin cases deterministically. They cost nothing (no API call) and run in milliseconds. The result: Wire's output quality is bounded by the auto-fix system, not by Claude's instruction-following accuracy.
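A deterministic fix of this kind can be very small. The sketch below handles the pipe-in-title case. `fix_title_pipes` is a hypothetical function, and the choice of " - " as the replacement is an assumption; the actual rewrite Wire applies is not specified in this section.

```python
import re


def fix_title_pipes(title: str) -> str:
    """Replace pipe separators with ' - ' and drop any leftover edge separators."""
    # Collapse optional whitespace around each pipe into a single ' - '.
    cleaned = re.sub(r"\s*\|\s*", " - ", title)
    # A leading or trailing pipe would leave a dangling separator; strip it.
    return cleaned.strip(" -")
```

The function is pure string manipulation, so it needs no API call, and running it twice gives the same result as running it once. It also shows the side effect noted earlier: a pipe you typed on purpose is rewritten just like one Claude produced.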

Screaming Frog's 2024 audit data shows the average site has 3.2 structural issues per 100 pages. At 500 pages, that is 16 issues per audit cycle. Manual review at that scale is a known failure mode. Editors miss issues, issues accumulate, and compound effects drag down the entire site's authority signal. Wire's pipeline prevents accumulation by fixing issues at write time.