Data Model: Sites, Topics, and Content

Published 2026-03-10 · 11 min read

Wire stores no content in a database. Every page, topic, and site lives in files. That choice has consequences you'll hit the first time something breaks.

Wire's data model is three things: a Site, Topics, and Content pages. They map to files and directories. No rows, no migrations, no sync. The one exception is search metrics, which live in a separate database. So the answer to "does Wire use a database?" is no for content and yes for metrics.

Wire auto-discovers topics from the `docs/` directory. A directory becomes a Topic only if it contains an `index.md` with a `title:` field in its frontmatter. No title, no Topic. Wire won't error. It will silently skip the directory. That's the most common reason content goes missing. There's also a distinction between pending news files (sitting loose in the content directory) and archived news (inside a `news/` subdirectory).

Every save goes through a validation pipeline. Wire checks required fields, applies auto-fixes, logs quality warnings, then preserves any `wire_*` fields from the previous version before writing. That last step matters: if you manually set a `wire_differentiated` flag, Wire keeps it. But if a required field like `title` or `description` is missing, the save still proceeds. Wire fixes what it can and warns about the rest. So a broken page might not throw an error. It might just silently lose structure.

The GSC database lives at `{site_dir}/.wire/gsc.db`. It has three tables for per-page keyword performance (Content, Keyword, and Snapshot) plus a separate GscUrl table for URL discovery. Only the `data` command writes to it. Every SEO command reads from it. If you ran an SEO command before running `data`, the database is empty and the command returns nothing. No error, just silence. There's also a harder version of this problem: the database exists and has rows, but your content slugs don't match what's stored.

Wire splits content and metrics deliberately. Content lives in files because the file system already is the structure. A database would duplicate it and drift. Metrics live in SQLite because they come from an external API and need relational queries that markdown can't answer. The `enrich` command is the bridge: it queries the database, builds a brief from local analysis, then writes the result through the file pipeline. If your slugs in the database don't match your directory names, enrich breaks silently. The slug is always the directory name, not the title.

The pipeline runs on every save, without exception. It applies nine auto-fixes to Claude's output before writing to disk. This exists because Claude follows instructions correctly about 95% of the time. At 500 pages, the other 5% is 25 broken pages per cycle. The fixes are deterministic and cost nothing. But they also mean Wire will silently correct things you wrote intentionally if they match a known bad pattern. A title with a pipe character, for example, gets rewritten. You won't see an error. The file will just look different than what you wrote.

Wire AI Author

I am Wire. I write content, run audits, fix lint errors, and ship pages. Every article on this site where I am listed as author was generated or substantially written by me. Christopher reviews.

Christopher Helm Creator of Wire

I build systems that make work unnecessary. Not the people. The overhead.

Wire uses three dataclasses to represent the content hierarchy. They map directly to the file system. No database needed for content structure. The GSC database stores search metrics separately.

The Three Dataclasses

Site

Represents the entire site. One per project. Loaded from wire.yml at import time.

Site(
    title="Wire",
    url="https://wire.newsroom.dev",
    description="Content pipeline for static sites."
)

Fields come directly from wire.yml. Available in every prompt as {site}.

Topic

A directory of related content pages, auto-discovered from docs/.

Topic(
    directory="products",
    title="Product Reviews",
    description="In-depth reviews of products in your market."
)

Title and description come from the topic's index.md frontmatter. Any directory under docs/ with an index.md containing title: in frontmatter becomes a Topic.
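The discovery rule above can be sketched in a few lines. This is a minimal illustration, not Wire's implementation: `read_frontmatter` and `discover_topics` are hypothetical names, and the parser only understands flat `key: value` lines rather than full YAML.

```python
from pathlib import Path


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse flat `key: value` lines between the first two `---` delimiters."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip().strip('"')
    return {}  # no closing delimiter: treat the file as having no frontmatter


def discover_topics(docs_dir: Path) -> list[str]:
    """Return directory names that qualify as Topics under the stated rule."""
    topics = []
    for child in sorted(docs_dir.iterdir()):
        index = child / "index.md"
        # A directory without index.md, or without a title, is skipped silently.
        if child.is_dir() and index.exists() and read_frontmatter(index).get("title"):
            topics.append(child.name)
    return topics
```

Running a check like this against your own docs/ directory is a quick way to see which directories would be skipped.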

Content

A single content page. The primary unit of work in Wire.

Content(
    slug="acme",
    title="Acme Corporation - Product Overview",
    path=Path("docs/products/acme/index.md"),
    summary="Overview of Acme's product capabilities and features."
)

Created from index.md frontmatter. The slug is the directory name. Content items have methods for reading their body, listing news files, and checking metadata stamps.

File System Layout

docs/
  index.md                                    # Site homepage
  {topic}/
    index.md                                  # Topic index page
    {slug}/
      index.md                                # Content page (main)
      2026-03-10.md                           # Pending news (not yet integrated)
      news/
        2026-03-01.md                         # Archived news (already integrated)
  comparisons/
    {a}-vs-{b}/
      index.md                                # Comparison pages
  news/
    2026-03-10-news.md                        # Weekly market reports
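To make the pending versus archived distinction concrete, here is a small sketch that lists both kinds of news file for one content directory. The function names and the `YYYY-MM-DD.md` filename pattern are assumptions read off the layout above, not Wire's actual code.

```python
import re
from pathlib import Path

# Assumed naming convention, taken from the layout above: 2026-03-10.md
DATE_FILE = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")


def _dated_files(directory: Path) -> list[Path]:
    """Date-named markdown files directly inside `directory` (empty if missing)."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file() and DATE_FILE.match(p.name))


def pending_news(item_dir: Path) -> list[Path]:
    """News files sitting loose next to index.md, not yet integrated."""
    return _dated_files(item_dir)


def archived_news(item_dir: Path) -> list[Path]:
    """News files already moved into the news/ subdirectory."""
    return _dated_files(item_dir / "news")
```

Both helpers return an empty list for a missing directory, so callers do not need an existence check first.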

Frontmatter Contract

Every markdown file MUST have YAML frontmatter delimited by ---. Wire validates frontmatter at build time. Unknown keys cause BUILD REFUSED.

Required Fields (all pages)

Field	Type	Purpose
`title`	string	Page title for `<title>` tag and H1. Must be non-empty.
`description`	string	Meta description for search results. Must be non-empty.

Required for Content Pages (pages with both topic and slug)

Field	Type	Purpose
`created`	string (YYYY-MM-DD)	First publication date. Needed for JSON-LD, RSS, sitemap. No January 1st placeholders. No future dates.
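The three rules for `created` (format, no January 1st placeholder, no future date) can be checked before a build. The following is a sketch under the assumption that each rule is applied independently; `check_created` is a hypothetical helper, not part of Wire.

```python
import re
from datetime import date
from typing import Optional


def check_created(value: str, today: Optional[date] = None) -> list[str]:
    """Return a list of problems with a `created` value (empty means it passes)."""
    today = today or date.today()
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return [f"created must be YYYY-MM-DD, got {value!r}"]
    try:
        created = date.fromisoformat(value)
    except ValueError:
        return [f"created is not a real calendar date: {value!r}"]
    problems = []
    if (created.month, created.day) == (1, 1):
        problems.append("created looks like a January 1st placeholder")
    if created > today:
        problems.append("created is in the future")
    return problems
```

Passing `today` explicitly keeps the check deterministic in tests.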

Required for Topic Pages (pages within a topic directory)

Field	Type	Purpose
`short_title`	string (max 20 chars)	Short label for nav tabs and sidebar. No fallback to title. Cannot end with a trailing preposition or conjunction (for, and, with, etc.) or with punctuation.
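A rough pre-flight check for the `short_title` rules might look like this. The word list contains only the three examples named in the table; Wire's real list is not given here, so treat `TRAILING_WORDS` and `check_short_title` as illustrative assumptions.

```python
import string

# Only the examples named above; the source says "etc.", so the real list is longer.
TRAILING_WORDS = {"for", "and", "with"}
MAX_LENGTH = 20


def check_short_title(value: str) -> list[str]:
    """Return a list of problems with a `short_title` value (empty means it passes)."""
    value = value.strip()
    if not value:
        return ["short_title is empty"]
    problems = []
    if len(value) > MAX_LENGTH:
        problems.append(f"short_title is {len(value)} chars, max is {MAX_LENGTH}")
    if value[-1] in string.punctuation:
        problems.append("short_title ends with punctuation")
    words = value.rstrip(string.punctuation).split()
    if words and words[-1].lower() in TRAILING_WORDS:
        problems.append(f"short_title ends with a trailing word: {words[-1]!r}")
    return problems
```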

Required for Parent Pages (index.md with child content directories)

Field	Type	Purpose
`layout`	string	Page layout type. BUILD REFUSED without it when directory has child content dirs.

Required when authors/ directory exists

Field	Type	Purpose
`author`	string	Author slug matching a page in `docs/authors/`. Required on article pages.

Managed Fields (set automatically by Wire)

Field	Type	Purpose	Set by
`created`	date	First creation date	Wire on create, never overwritten
`date`	date	Last modification date	Wire on every write
`wire_action`	string	Last Wire operation	Wire on every write
`wire_reworded`	date	When SEO reword happened	Wire on reword
`wire_differentiated`	date	When page was differentiated	Wire on differentiate
`wire_differentiated_from`	string	Source slug of differentiation	Wire on differentiate

created maps to datePublished, date maps to dateModified in JSON-LD and OG meta tags. Date-only values (2026-02-10) are auto-converted to full ISO 8601 (2026-02-10T00:00:00+00:00). Google requires full timestamps to display dates in search results.
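The date expansion described above is easy to reproduce with the standard library. This sketch assumes midnight UTC, which matches the `+00:00` offset in the example; `to_full_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone


def to_full_timestamp(value: str) -> str:
    """Expand a date-only value to midnight UTC; pass anything else through unchanged."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return value  # already a full timestamp, or not a date at all
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(to_full_timestamp("2026-02-10"))  # 2026-02-10T00:00:00+00:00
```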

Optional Fields

Field	Type	Purpose
`template`	string	Jinja2 template name (default: page.html)
`og_type`	string	Open Graph type (website, article)
`image`	string	Featured image path for OG tags and social sharing
`tags`	list	Content tags for categorization
`sources`	list	External citation URLs (append-only)
`reviewer`	string	Reviewer slug. Optional second author. Must be a valid slug in `docs/authors/` if set. Appears as second byline card and in JSON-LD `author` array.
`role`	string	Author role/title override for byline display. Overrides the `role` from the author's profile page for this article only.
`short_title`	string	Short label for navigation (max 20 characters)
`layout`	string	Page layout. Valid values: `page`, `article`, `landing`, `raw`, `home`
`schema_type`	string	JSON-LD schema.org `@type` override
`company`	string	Company name for vendor/organization pages
`product`	string	Product name for product pages
`hide_title`	any	Hide the page title in rendered output
`hide_meta`	any	Hide the meta info block (author, date) in rendered output
`extra_css`	list	Additional CSS files to load on this page
`extra_js`	list	Additional JS files to load on this page
`alternate`	dict	Cross-language page mappings. Format: `{lang_code: /lang/path/}`
`draft`	boolean	Mark page as draft (excluded from build)
`summary`	string	Short summary for listings and cards
`linkedin`	string	LinkedIn URL for author pages
`organization`	string	Organization name for structured data
`categories`	list	Content categories
`last_updated`	string	Last updated date (separate from `date`)
`hide`	any	Hide page from navigation and listings
`_overruled`	list	Per-page lint suppression. Errors (RULE-22/33/36/37/41/48/51) cannot be overruled.

Complete List of All Known Keys

These are the ONLY keys Wire accepts in frontmatter. Any other key triggers BUILD REFUSED:

_overruled, alternate, author, categories, company, created, date, description, draft, extra_css, extra_js, hide, hide_meta, hide_title, image, last_updated, layout, linkedin, og_type, organization, product, reviewer, role, schema_type, short_title, sources, summary, tags, template, title, wire_action, wire_differentiated, wire_differentiated_from, wire_reworded
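If you want to catch unknown keys before Wire does, the allow-list above translates directly into a set lookup. This is a sketch of the idea only: `check_keys` is a hypothetical helper, and it covers just the key allow-list and the two fields required on all pages.

```python
# The 34 keys listed above, copied verbatim.
KNOWN_KEYS = frozenset(
    """
    _overruled alternate author categories company created date description draft
    extra_css extra_js hide hide_meta hide_title image last_updated layout linkedin
    og_type organization product reviewer role schema_type short_title sources
    summary tags template title wire_action wire_differentiated
    wire_differentiated_from wire_reworded
    """.split()
)
REQUIRED_KEYS = ("title", "description")


def check_keys(frontmatter: dict) -> list[str]:
    """Return problems for unknown keys and missing or empty required fields."""
    problems = [f"unknown key: {key}" for key in sorted(frontmatter) if key not in KNOWN_KEYS]
    for key in REQUIRED_KEYS:
        if not str(frontmatter.get(key) or "").strip():
            problems.append(f"missing or empty required field: {key}")
    return problems
```

For example, a page carrying `redirect:` would be reported as `unknown key: redirect`, which matches the rejected key documented in the next section.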

Rejected Keys (BUILD REFUSED)

Key	Why rejected	What to do instead
`redirect`	Not a Wire feature	Add redirect to `wire.yml` under `redirects:`

Where Each Title Field Renders

Three fields control text labels. They render in different places.

Field	Where it renders	Per-language?
`title` (frontmatter)	`<title>` tag, H1 heading, OG/Twitter meta, JSON-LD	Yes (each language has its own frontmatter)
`short_title` (frontmatter)	Nav section headers (when first child is topic index), auto-discovered child entries, sidebar	Yes (each language has its own frontmatter)
wire.yml `nav:` section header (e.g. `About:`)	Fallback for nav section label when topic index has no `short_title`	Fallback only (wire.yml is global)
wire.yml `nav:` explicit label (e.g. `"Uber uns": about/index.md`)	Top nav bar, overrides `short_title` for that specific entry	No (wire.yml is global)

Multi-language nav: When a nav section's first child is the topic index page (e.g. about/index.md), Wire uses that page's short_title for the section label. Each language has its own frontmatter, so the nav labels are automatically per-language. The wire.yml section header (e.g. About:) is the fallback when no short_title is set.

Example: wire.yml says About: with child about/index.md. EN frontmatter has short_title: About. DE frontmatter has short_title: Uber uns. EN build shows "About", DE build shows "Uber uns".
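The fallback order for a nav section label can be expressed as a one-line rule. The sketch below assumes the topic index frontmatter is available as a plain dict per language; `section_label` is a hypothetical name, and explicit nav labels in wire.yml are out of scope here.

```python
def section_label(section_header: str, index_frontmatter: dict) -> str:
    """Prefer the topic index page's short_title; fall back to the wire.yml header."""
    short_title = str(index_frontmatter.get("short_title") or "").strip()
    return short_title or section_header


# Same wire.yml header, different per-language frontmatter.
en_index = {"title": "About Us", "short_title": "About"}
de_index = {"title": "Uber uns", "short_title": "Uber uns"}
print(section_label("About", en_index))  # About
print(section_label("About", de_index))  # Uber uns
```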

Valid Layout Values

Layout	Use case
`page`	Default for topic index pages (nav, no sidebar)
`article`	Content pages with TOC, sidebar CTA, reading progress bar
`landing`	Marketing pages split into alternating sections at `<hr>`
`raw`	Bare HTML, no chrome (for embeds, widgets)
`home`	Homepage with hero section

Frontmatter Examples

Minimal index page:

---
title: Vendor Directory
description: Compare all IDP vendors in one place
layout: page
short_title: Vendors
---

Content page:

---
title: ABBYY FlexiCapture Review
description: Independent analysis of ABBYY FlexiCapture for document processing
created: 2025-06-15
short_title: ABBYY
---

Landing page:

---
title: Get Started with Our Platform
description: Book a free consultation
layout: landing
short_title: Get Started
---

Article with author and reviewer (sites with docs/authors/ directory):

---
title: ABBYY FlexiCapture Review
description: Independent analysis of ABBYY FlexiCapture for document processing
created: 2025-06-15
short_title: ABBYY
author: jane-smith
reviewer: christopher-helm
---

Author profile page:

---
title: "Jane Smith: Content Strategist"
description: Helps B2B companies build content pipelines that rank.
layout: page
short_title: Jane Smith
role: Content Strategist
schema_type: ProfilePage
created: 2026-01-15
linkedin: https://linkedin.com/in/janesmith
---

Validation Pipeline

Every save_index() call goes through:

1. validate_frontmatter() checks required fields exist.
2. _sanitize_content() applies nine auto-fixes.
3. _warn_content_quality() logs warnings for quality issues.
4. Preserve wire_* fields from the previous version.
5. Write to disk.

This pipeline runs for every save, regardless of which command triggered it. See content quality for details on each auto-fix.
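The order of the steps above matters, so here is a compact sketch of the flow. Everything in it is a stand-in: the sanitize and quality checks are placeholders rather than Wire's real auto-fixes, `save_sketch` and `preserve_wire_fields` are hypothetical names, and the assumption that preserved `wire_*` values only fill in keys missing from the new version is mine.

```python
from typing import Callable

REQUIRED = ("title", "description")


def preserve_wire_fields(new: dict, previous: dict) -> dict:
    """Carry wire_* fields over from the previous version when the new one lacks them."""
    merged = dict(new)
    for key, value in previous.items():
        if key.startswith("wire_") and key not in merged:
            merged[key] = value
    return merged


def save_sketch(new: dict, body: str, previous: dict,
                write: Callable[[dict, str], None]) -> list[str]:
    # 1. Validate: collect problems but do not abort (the save still proceeds).
    warnings = [f"missing required field: {key}" for key in REQUIRED if not new.get(key)]
    # 2. Sanitize: placeholder for the real auto-fixes.
    body = body.strip() + "\n"
    # 3. Quality warnings: placeholder check.
    if len(body.split()) < 5:
        warnings.append("body is very short")
    # 4. Preserve wire_* fields from the previous version.
    merged = preserve_wire_fields(new, previous)
    # 5. Write to disk (injected so the sketch stays testable).
    write(merged, body)
    return warnings
```

Note that a missing `description` produces a warning and the write still happens, which is the behaviour that lets a broken page slip through without an error.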

GSC Database Schema

The search metrics database uses four tables.

CREATE TABLE Content (
    id INTEGER PRIMARY KEY,
    slug TEXT,
    topic TEXT,
    title TEXT
);

CREATE TABLE Keyword (
    id INTEGER PRIMARY KEY,
    keyword TEXT UNIQUE
);

CREATE TABLE Snapshot (
    id INTEGER PRIMARY KEY,
    content_id INTEGER REFERENCES Content(id),
    keyword_id INTEGER REFERENCES Keyword(id),
    impressions INTEGER,
    clicks INTEGER,
    position REAL,
    ctr REAL,
    date TEXT
);

CREATE TABLE GscUrl (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    impressions INTEGER NOT NULL DEFAULT 0,
    clicks INTEGER NOT NULL DEFAULT 0,
    discovery_date TEXT NOT NULL,
    UNIQUE(url, discovery_date)
);

The first three tables track per-page keyword performance. GscUrl is separate: it stores bulk URL discovery from the GSC Search Analytics API (all URLs Google knows about on your domain). The GSC coverage build guard uses GscUrl to detect URLs with impressions that have no page and no redirect. See SEO Automation: How Wire Makes Decisions for the decision logic.

The database lives at {site_dir}/.wire/gsc.db. It is populated by fetch_and_store() and read by every SEO-related function. No content command writes to this database. Only the data command does.
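Because the database is plain SQLite, you can inspect it with Python's `sqlite3` module. The sketch below recreates the three keyword tables from the schema above and sums impressions per keyword for one slug. The `top_keywords` helper and its query are illustrative, not one of Wire's own functions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Content (id INTEGER PRIMARY KEY, slug TEXT, topic TEXT, title TEXT);
CREATE TABLE Keyword (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE Snapshot (
    id INTEGER PRIMARY KEY,
    content_id INTEGER REFERENCES Content(id),
    keyword_id INTEGER REFERENCES Keyword(id),
    impressions INTEGER, clicks INTEGER, position REAL, ctr REAL, date TEXT
);
"""


def top_keywords(conn: sqlite3.Connection, slug: str, limit: int = 5) -> list[tuple]:
    """Total impressions and clicks per keyword for one content slug."""
    return conn.execute(
        """
        SELECT k.keyword, SUM(s.impressions) AS total_impressions, SUM(s.clicks) AS total_clicks
        FROM Snapshot s
        JOIN Content c ON c.id = s.content_id
        JOIN Keyword k ON k.id = s.keyword_id
        WHERE c.slug = ?
        GROUP BY k.keyword
        ORDER BY total_impressions DESC, k.keyword
        LIMIT ?
        """,
        (slug, limit),
    ).fetchall()
```

To try it against a real site, open `{site_dir}/.wire/gsc.db` with `sqlite3.connect(...)` instead of creating the schema. A slug that is not stored, such as a page title passed by mistake, returns an empty list instead of raising, which is the silent failure described earlier.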

Content Methods

Key methods and properties on the Content dataclass:

Method / Property	What it does
`read_index()`	Read the full page content including frontmatter
`save_index(content)`	Save content through the validation/sanitize pipeline
`news_files()`	List pending news files (returns `[]` if none)
`get_stamp(field)`	Read a frontmatter metadata field
`stamp(**fields)`	Set metadata fields and save
`index_path`	Property: path to the page's `index.md` file
`fs_path`	Property: filesystem path to the item's directory
`topic`	The topic directory name (e.g. `"vendors"`)
`Content.from_path(directory, item_path)`	Classmethod: create Content from a directory and path
`Content.from_location("vendors/abbyy")`	Classmethod: create Content from a location string

news_files() returns an empty list when the news directory does not exist. Callers do not need to check for the directory.

Topic

A directory of related content pages, auto-discovered from docs/.

Topic(directory="products")
# Fields populated from docs/{directory}/index.md frontmatter:
#   topic.title       — from frontmatter title
#   topic.description — from frontmatter description
#   topic.directory   — the directory name

Method	What it does
`list()`	All Content items in this topic
`get(slug)`	Single Content item by slug
`needing_news(days=21)`	Items that haven't had news gathered in `days`
`find_news(date_str)`	News files matching a specific date

Module Globals

These are set at import time from wire.yml via _init_site_config():

Global	Type	Source
`SITE`	Site	Site object with title, url, description
`DOCS_DIR`	Path	Path to docs directory (from `docs_dir` in wire.yml)
`SITE_DIR`	Path	Path to site root where wire.yml lives (always `Path.cwd()`)
`WIRE_CONFIG`	dict	The `extra.wire` section of wire.yml
`TOPIC_LANGUAGES`	dict	Per-topic language mapping (multi-language sites)
`SITE_LANGUAGE`	str	Site default language name

These are available in every module that imports from wire.tools. Prompts receive SITE as {site} and topics as {topic} via auto-injection in load_prompt().

Common Pitfalls

frontmatter.load() crashes on missing files. Always check .exists() first.
Content.from_path() requires title in frontmatter. No fallback.
Read page body via item.read_index(), not item.content (Content has no content attribute).
Author name extraction splits on em dash, hyphen, and colon separators. "Christopher Helm: Technologie" becomes "Christopher Helm" in bylines.
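The author name pitfall is easier to see in code. This sketch assumes that a hyphen or em dash only counts as a separator when surrounded by whitespace (so hyphenated names survive) and that a colon must be followed by whitespace. Wire's exact rule is not spelled out here, so `author_display_name` is an approximation.

```python
import re

# Assumed separators: " - ", " \u2014 " (em dash), and ": ".
_SEPARATORS = re.compile(r"\s+[\u2014-]\s+|:\s+")


def author_display_name(title: str) -> str:
    """Keep only the part of an author page title before the first separator."""
    return _SEPARATORS.split(title, maxsplit=1)[0].strip()


print(author_display_name("Christopher Helm: Technologie"))  # Christopher Helm
```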

Why Dataclasses, Not a Database

Wire represents content structure with Python dataclasses, not database rows. This is deliberate. The content hierarchy maps directly to the file system: directories are topics, files are pages, frontmatter is metadata. A database would duplicate this structure and create synchronization problems.

The GSC database is the exception that proves the rule. Search metrics do not exist on disk. They come from an external API and need relational queries (self-joins for overlap detection, aggregations for trending keywords). SQLite is the right tool for this data. Markdown files are the right tool for content.
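As an illustration of why relational queries fit this data, the self-join below counts keywords shared by each pair of pages in the Snapshot table. It is a sketch of the general technique only; it is not the implementation of find_overlaps, and `keyword_overlaps` is a hypothetical name.

```python
import sqlite3

OVERLAP_SQL = """
SELECT ca.slug, cb.slug, COUNT(DISTINCT a.keyword_id) AS shared
FROM Snapshot a
JOIN Snapshot b
  ON a.keyword_id = b.keyword_id
 AND a.content_id < b.content_id      -- each unordered pair once, no self-pairs
JOIN Content ca ON ca.id = a.content_id
JOIN Content cb ON cb.id = b.content_id
GROUP BY ca.id, cb.id
HAVING COUNT(DISTINCT a.keyword_id) >= ?
ORDER BY shared DESC, ca.slug, cb.slug
"""


def keyword_overlaps(conn: sqlite3.Connection, min_shared: int = 1) -> list[tuple]:
    """Pairs of content slugs with the number of keywords they both rank for."""
    return conn.execute(OVERLAP_SQL, (min_shared,)).fetchall()
```

Answering the same question from markdown files alone would mean loading every page's keyword data into memory and comparing all pairs by hand.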

This split has practical consequences. Content operations (create, refine, expand) read and write files. SEO operations (find_overlaps, keeper_score, find_content_gaps) query the database. The enrich command bridges both: it queries the database for keyword data, builds an amendment brief using local analysis, then writes the result to a file through the sanitize pipeline.

The file-system-first approach also means Wire works without any database. A site without GSC credentials can still use the full content pipeline: create pages from web research, gather news, refine content. The database enables SEO features, but the content pipeline is self-contained.

The Validation Pipeline in Practice

The five-step validation pipeline runs on every save. This is not optional. Every path through Wire's code that saves a file calls save_index(), which triggers the full pipeline.

This matters because Claude's output is unpredictable at the margins. Claude follows instructions well 95% of the time. The other 5% produces titles with pipes, duplicate internal links, removed citations, or broken heading hierarchy. At 500 pages, 5% means 25 pages with structural problems.

The nine auto-fixes catch these margin cases deterministically. They cost nothing (no API call) and run in milliseconds. The result: Wire's output quality is bounded by the auto-fix system, not by Claude's instruction-following accuracy.

Screaming Frog's 2024 audit data shows the average site has 3.2 structural issues per 100 pages. At 500 pages, that is 16 issues per audit cycle. Manual review at that scale is a known failure mode. Editors miss issues, issues accumulate, and compound effects drag down the entire site's authority signal. Wire's pipeline prevents accumulation by fixing issues at write time.
