Build Verification - 44 Automated Checks on Every Deploy

Your site built without errors. But broken internal links, missing canonical tags, and invalid JSON-LD are invisible until something goes wrong in search.

Wire runs 44 checks on the finished HTML after every build. Not on your markdown. On the rendered output that search engines actually crawl. This matters because a template can silently drop a canonical tag or forget the viewport meta, and your content layer never sees it. The checks are free, take under 2 seconds on a 200-page site, and every failure tells you the exact URL and what to fix.

The 44 rules fall into 11 categories: page structure, canonicalization, URL format, sitemap integrity, crawl access, internal links, security, images, multilingual, structured data, and indexability. Each rule exists because independent research shows it affects search performance or site health. Here's the part that surprises most operators: two of the most damaging issues are also the quietest. A noindex tag left over from staging on a page that other pages link to. A JSON-LD block with a trailing comma that silently invalidates all your structured data.

Build verification is the fourth layer in Wire's quality system. The first three layers work on markdown before and during writing. This layer works on the finished HTML after rendering. That sequence matters: a page can pass every content-level check and still fail verification because the template introduced the problem, not the content. A Jinja2 partial that omits the viewport meta tag. A canonical tag the template forgot to insert. The markdown was fine. The rendered page was not. Some checks, however, require a live server, and Wire deliberately skips them at build time.

Six checks are excluded from build verification on purpose. Redirect chains, HTTP status codes, external link validation, SSL certificate validity, page load speed, and mobile rendering all require a live server or a browser engine. Wire works with files on disk. It cannot follow an HTTP redirect or fetch an external URL. This is a deliberate boundary, not a gap. For those checks, Google Search Console's URL Inspection tool or a monitoring service handles what Wire cannot. The question is whether the 44 checks Wire does run are covering the failures that actually reached your site.

On a real 316-page production site, the first verification run found 12 internal links pointing to renamed pages, 4 duplicate H1 tags across different sections, 2 pages where the template dropped Open Graph tags entirely, and 1 JSON-LD block with a trailing comma. Total time: 1.8 seconds. Every failure included the exact URL and a specific fix. The same audit done manually would have taken hours, and the JSON-LD issue almost certainly would have been missed. The site had been live for months. None of these were caught by the content layer.

Not every check applies to every site. A single-language site skips the three hreflang rules automatically. A site without JSON-LD has no structured data to validate. You can pass a comma-separated list of rule IDs to run only what applies, and every rule ID is stable across versions, so you can save a preferred set and run the same checks on every build. The tradeoff: a rule you skip is a failure you won't see. Two of the most common production issues, broken internal links and noindex left on linked pages, are easy to exclude by accident if you're trimming aggressively.

Wire does not trust what it builds. After rendering every markdown page to HTML, it runs 44 automated checks across the finished site. Every check produces a specific, actionable message. Pages that fail do not ship silently. You see exactly what broke and where.

This is the final layer of Wire's quality system. The content quality layers work on markdown before and during writing. Build verification works on the finished HTML after rendering. It catches problems that only become visible in the final output: broken links between rendered pages, missing meta tags that templates should have inserted, images without dimensions, structured data with invalid JSON.

How It Works

After wire.build renders your site to HTML, run the verification:

python -m wire.lint --site /path/to/site

Wire parses every HTML file, builds a cross-page index (for duplicate detection and orphan checks), and runs all 44 rules. The output lists every failure with the rule ID, the affected URL, and what to fix.
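
The cross-page index is what makes duplicate detection possible: a single page cannot know that another page reuses its H1. The sketch below is not Wire's implementation. It assumes a hypothetical `pages` dict of `{url: html_string}` and uses only the Python standard library to illustrate the idea.

```python
from collections import defaultdict
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collects the text content of every <h1> on one page."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.h1_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.h1_texts.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.h1_texts[-1] += data


def build_h1_index(pages):
    """Map normalized H1 text to the URLs that use it.

    `pages` is a hypothetical {url: html_string} dict, not a Wire structure.
    """
    index = defaultdict(list)
    for url, html in pages.items():
        collector = H1Collector()
        collector.feed(html)
        for text in collector.h1_texts:
            # Collapse whitespace and ignore case before comparing headings.
            index[" ".join(text.split()).lower()].append(url)
    return index


def duplicate_h1s(index):
    """Keep only the headings shared by more than one page."""
    return {h1: urls for h1, urls in index.items() if len(set(urls)) > 1}
```

Building the index once and querying it per rule keeps the scan to a single pass over the files, which is one plausible reason a full run stays fast.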

You can also run specific checks:

python -m wire.lint --site /path/to/site --rules RULE-01,RULE-05,RULE-33

On a typical 200-page site, the full scan takes under 2 seconds. Zero API calls, zero cost.

What Wire Checks

The 44 rules fall into 11 categories. Each rule exists because independent research, not Google's public statements, shows it affects search performance or site health.

Page Structure (11 rules)

These checks verify that every page has the HTML elements search engines need to understand and rank it.

Rule	What it catches	Why it matters
RULE-01	Missing or empty H1	SearchPilot A/B test: pages with keyword-aligned H1 saw 28% more traffic. The H1 tells Google the page's primary topic
RULE-02	Multiple H1 tags on one page	The 2024 Google API leak confirmed entity extraction uses heading structure. Multiple H1s dilute the primary topic signal
RULE-03	Skipped heading levels (H1 then H3, no H2)	Heading hierarchy feeds semantic structure signals. Skipped levels break the logical outline Google uses for entity extraction
RULE-04	Target search query missing from H1 or title	Zyppy's 81K title study: Google rewrites titles 61.6% of the time when they do not match the page's primary heading
RULE-05	Title tag missing, empty, too short (<20), too long (>65), or duplicate	Backlinko's analysis of 11.8M search results: title tag optimization correlates with higher rankings. Google truncates titles over ~60 characters in search results
RULE-06	Multiple `<title>` tags	Browsers use the first one. Search engines may use either. Remove the duplicate
RULE-07	Meta description missing or over 158 characters	Google truncates long descriptions in search results. A missing description means Google writes one for you, and it rarely matches your intent
RULE-08	Thin content (fewer than 200 words)	Multiple case studies show thin pages hurt when they cannibalize stronger siblings. 201Creative deleted thin ecommerce pages and saw +867% traffic. This rule flags stubs that need expanding
RULE-09	Missing Open Graph tags (og:title, og:description)	These control how your pages appear when shared on LinkedIn, Twitter, Slack, and other platforms. Missing tags mean the platform guesses, usually poorly
RULE-10	Missing viewport meta tag	Without it, mobile browsers render pages at desktop width and scale down. Google's mobile-first indexing penalizes pages that are not mobile-friendly
RULE-11	No character encoding in first 1024 bytes	Browsers need to know the encoding before parsing. Missing charset can cause garbled text, especially for non-English content
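
To make a few of these rules concrete, here is a minimal sketch of how RULE-01, RULE-02, RULE-05, and RULE-10 could be evaluated on one rendered page. The class name, function name, and message strings are hypothetical. The 20 and 65 character limits come from the RULE-05 row above, and the duplicate-title part of RULE-05 is left out because it needs the cross-page index.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Counts H1 tags and records the first <title> and the viewport meta."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.title = None
        self._in_title = False
        self.has_viewport = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self.h1_count += 1
        elif tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""
        elif tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.has_viewport = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_page_structure(html):
    """Return hypothetical failure messages for one rendered page."""
    audit = HeadAudit()
    audit.feed(html)
    failures = []
    if audit.h1_count == 0:
        failures.append("RULE-01: missing H1")
    if audit.h1_count > 1:
        failures.append("RULE-02: multiple H1 tags")
    title = (audit.title or "").strip()
    if not 20 <= len(title) <= 65:
        failures.append("RULE-05: title missing or outside 20-65 characters")
    if not audit.has_viewport:
        failures.append("RULE-10: missing viewport meta tag")
    return failures
```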

Canonicalization (4 rules)

Canonical tags tell search engines which version of a page is the "real" one. Getting this wrong splits your ranking signals across multiple URLs.

Rule	What it catches	Why it matters
RULE-12	Missing canonical tag	Without a canonical tag, Google decides which URL to index. If your page is accessible at multiple URLs (with/without trailing slash, with/without www), Google may split your ranking signals
RULE-13	Multiple canonical tags	When Google finds two canonical tags pointing to different URLs, it ignores both. Your page has no canonical signal at all
RULE-15	Canonical URL not found in sitemap	If the canonical URL and sitemap URL disagree, Google sees conflicting signals about which page matters. Both should point to the same URL
RULE-16	Duplicate H1 across different pages	Two pages with the same H1 create a cannibalization signal. Google cannot tell which page should rank for that heading's topic. Differentiate or consolidate

URL Structure (6 rules)

Clean URLs help search engines and users. These rules enforce the URL patterns that every major SEO study recommends.

Rule	What it catches	Why it matters
RULE-17	Uppercase characters in URL path	URLs are case-sensitive on most servers. `/About/` and `/about/` are different pages. Uppercase creates duplicate content risk
RULE-18	Special characters in URL path	Spaces, brackets, and non-ASCII characters in URLs cause encoding issues. Some crawlers fail to follow these links
RULE-19	Underscores in URL path	Google treats hyphens as word separators but underscores as joiners. `content-strategy` reads as two words; `content_strategy` reads as one
RULE-20	URL longer than 115 characters	Long URLs get truncated in search results and are harder to share. Backlinko found shorter URLs correlate with higher rankings
RULE-21	Query parameters on page URL	Parameters like `?utm_source=...` on canonical pages create duplicate content. Strip parameters from canonical URLs
RULE-22	Internal links missing trailing slash	Inconsistent trailing slashes create duplicate URLs. If `/guides/` and `/guides` both work, Google may index both and split ranking signals
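
Several of the URL rules reduce to simple string tests. The following sketch covers RULE-17, RULE-19, RULE-20, and RULE-21 with the standard library. The function name is hypothetical, and measuring the 115-character limit against the full URL rather than the path alone is an assumption.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 115  # threshold stated for RULE-20


def check_url(url):
    """Return the IDs of the URL rules that this URL would fail."""
    parts = urlsplit(url)
    failures = []
    if any(ch.isupper() for ch in parts.path):
        failures.append("RULE-17")  # uppercase characters in the path
    if "_" in parts.path:
        failures.append("RULE-19")  # underscores instead of hyphens
    if len(url) > MAX_URL_LENGTH:
        failures.append("RULE-20")  # assumption: length of the full URL
    if parts.query:
        failures.append("RULE-21")  # query parameters on a page URL
    return failures
```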

Sitemap Integrity (4 rules)

Your sitemap tells Google which pages exist and matter. These rules catch contradictions between your sitemap and your actual pages.

Rule	What it catches	Why it matters
RULE-23	No sitemap.xml at site root	Without a sitemap, Google relies entirely on crawling to discover pages. Large sites can have pages that Google never finds
RULE-24	Published pages missing from sitemap	If a page exists but is not in the sitemap, Google may deprioritize crawling it. Every indexable page should appear
RULE-25	noindex pages listed in sitemap	Contradictory signals: the sitemap says "index this" while the page says "do not index." Google sees confusion and may ignore both directives
RULE-26	robots.txt-blocked pages listed in sitemap	The sitemap includes a page that robots.txt blocks from crawling. Google cannot crawl it but sees it in the sitemap, another contradictory signal
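
The contradictions behind RULE-24 and RULE-25 can be found by comparing the set of URLs in the sitemap with what each page says about itself. This is a sketch under stated assumptions: `pages` is a hypothetical `{url: is_noindex}` dict, and the sitemap follows the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return {loc.text.strip() for loc in locs if loc.text}


def check_sitemap(sitemap_xml, pages):
    """Compare the sitemap against pages, a hypothetical {url: is_noindex} dict."""
    listed = sitemap_urls(sitemap_xml)
    failures = []
    for url, noindex in sorted(pages.items()):
        if not noindex and url not in listed:
            failures.append(("RULE-24", url))  # indexable page missing from sitemap
        if noindex and url in listed:
            failures.append(("RULE-25", url))  # noindex page listed in sitemap
    return failures
```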

Crawl Access (4 rules)

robots.txt controls which pages search engines can access. Mistakes here can accidentally hide your entire site.

Rule	What it catches	Why it matters
RULE-28	No robots.txt file	Every site should have one. Without it, crawlers assume everything is allowed, including admin pages, staging content, and asset directories you may not want indexed
RULE-29	robots.txt blocking CSS or JavaScript files	Google needs CSS and JS to render your pages. Blocking them means Google sees your site without styling and cannot evaluate mobile-friendliness, layout, or dynamic content
RULE-30	No Sitemap directive in robots.txt	The Sitemap line in robots.txt is how crawlers discover your sitemap. Without it, they have to guess the location
RULE-31	robots.txt blocking all crawlers (`Disallow: /`)	This makes your entire site invisible to search engines. Usually a leftover from staging that was never removed
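
Python's standard library ships a robots.txt parser, so RULE-30 and RULE-31 can be approximated in a few lines. This is an illustrative sketch, not how Wire does it. The function name and the `sample_url` parameter are hypothetical, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, sample_url="https://example.com/"):
    """Return rule IDs that the given robots.txt content would fail."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    if not parser.site_maps():
        failures.append("RULE-30")  # no Sitemap directive present
    if not parser.can_fetch("*", sample_url):
        failures.append("RULE-31")  # the site root is blocked for all crawlers
    return failures
```

Testing the site root against the wildcard user agent is a simplification: a file that blocks everything except one path would still trip this check.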

Internal Links (3 rules)

Internal links distribute ranking power across your site. Broken links waste it. Orphan pages never receive it.

Rule	What it catches	Why it matters
RULE-33	Broken internal links (target page does not exist)	The 2024 Google API leak lists `badBacklinks` as a negative ranking signal. Broken internal links waste crawl budget and pass zero ranking power. Wire checks every link against the rendered site directory
RULE-34	Internal links with empty anchor text	Google uses anchor text to understand what the target page is about. Empty anchor text wastes this signal. Screen readers also cannot describe the link to visually impaired users
RULE-35	Orphan pages (zero inbound internal links)	SearchPilot's controlled experiment showed that adding internal links to orphan pages produces a statistically significant traffic increase. Pages with no inbound links are invisible to navigation and receive minimal crawl attention
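
Broken links and orphan pages both fall out of the same link graph. The sketch below assumes a hypothetical `link_graph` dict of `{page_url: [internal link targets]}` that has already been extracted from the rendered HTML. Excluding the home page from the orphan list is also an assumption.

```python
def check_links(link_graph, home="/"):
    """Find broken internal links and orphan pages.

    `link_graph` is a hypothetical {page_url: [internal targets]} dict.
    """
    pages = set(link_graph)
    inbound = {url: 0 for url in pages}
    broken = []
    for source, targets in sorted(link_graph.items()):
        for target in targets:
            if target not in pages:
                broken.append((source, target))  # RULE-33: target does not exist
            elif target != source:
                inbound[target] += 1  # self-links do not count as inbound
    # RULE-35: pages that no other page links to (home page excluded)
    orphans = sorted(url for url, count in inbound.items() if count == 0 and url != home)
    return broken, orphans
```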

Security (1 rule)

Rule	What it catches	Why it matters
RULE-37	HTTP resources loaded on HTTPS pages (mixed content)	Browsers block or warn about mixed content. Images loaded over HTTP on an HTTPS page may not display. Scripts may be blocked entirely. Google flags mixed content as a security issue

Images (3 rules)

Rule	What it catches	Why it matters
RULE-40	Images missing alt text	Alt text serves accessibility (screen readers) and gives Google context for image search. While Moz found alt text is a ranking factor for Google Images specifically, accessibility compliance is reason enough
RULE-41	Broken image sources (local file missing)	A broken image leaves a blank space on the page. Users see it immediately. Google's page quality ratings penalize pages with broken media
RULE-42	Images without explicit width and height	Missing dimensions cause Cumulative Layout Shift. The page jumps around as images load. CLS is a Core Web Vitals metric that affects user experience and, to a limited extent, search rankings
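
RULE-40 and RULE-42 only need the attributes of each `<img>` tag. A minimal sketch with the standard library follows. The class name is hypothetical, and it assumes that an empty `alt=""` (a common convention for decorative images) passes while a missing `alt` attribute fails.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Records (rule_id, src) for images missing alt text or dimensions."""

    def __init__(self):
        super().__init__()
        self.failures = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src", "")
        if "alt" not in attrs:
            self.failures.append(("RULE-40", src))  # alt attribute absent
        if not (attrs.get("width") and attrs.get("height")):
            self.failures.append(("RULE-42", src))  # width or height missing
```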

Multilingual (3 rules)

These rules only activate when your pages use hreflang tags for multilingual content. If your site is single-language, they are skipped automatically.

Rule	What it catches	Why it matters
RULE-43	Invalid language codes in hreflang tags	Hreflang uses BCP 47 language tags (en, de, fr, etc.). A typo like "eng" instead of "en" means Google ignores the tag and may show the wrong language version to searchers
RULE-44	Hreflang tags present but no x-default	The x-default tag tells Google which page to show when no language matches the searcher's region. Without it, Google picks one and may pick wrong
RULE-45	Hreflang links without reciprocal confirmation	If page A says "my German version is page B" but page B does not link back to page A, Google treats the hreflang as unconfirmed and may ignore it
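
The reciprocity requirement in RULE-45 is easiest to see as a lookup in both directions. This sketch assumes a hypothetical `hreflang_map` of `{page_url: {lang_code: alternate_url}}` collected from the rendered pages.

```python
def missing_return_links(hreflang_map):
    """List hreflang links whose target does not link back.

    `hreflang_map` is a hypothetical {page_url: {lang_code: alternate_url}} dict.
    """
    missing = []
    for page, alternates in sorted(hreflang_map.items()):
        for lang, alt_url in sorted(alternates.items()):
            if alt_url == page:
                continue  # a self-reference needs no confirmation
            back_links = hreflang_map.get(alt_url, {})
            if page not in back_links.values():
                missing.append((page, lang, alt_url))
    return missing
```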

Structured Data (3 rules)

JSON-LD structured data helps Google understand your content type, author, dates, and relationships. Invalid structured data is worse than none at all.

Rule	What it catches	Why it matters
RULE-46	Invalid JSON in JSON-LD script blocks	Malformed JSON means Google cannot parse any of your structured data. A single missing comma or bracket breaks the entire block
RULE-47	Unrecognized schema.org type	Using an `@type` that schema.org does not define means Google ignores the entire structured data block. Wire checks against 18 recognized types including Article, WebPage, Organization, Product, FAQPage, and BreadcrumbList
RULE-48	Broken URLs in structured data fields	If your JSON-LD references a URL (for images, logos, or page links) that does not exist on your site, Google may flag the structured data as misleading
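
The trailing-comma failure mentioned for RULE-46 is easy to reproduce, because a strict JSON parser rejects it. The sketch below pulls JSON-LD blocks out of a page and tries to parse each one. The class and function names are hypothetical, and the exact error wording depends on the Python version.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def invalid_jsonld(html):
    """Return one error string per JSON-LD block that fails to parse."""
    collector = JsonLdCollector()
    collector.feed(html)
    errors = []
    for block in collector.blocks:
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            errors.append(f"line {exc.lineno}, column {exc.colno}: {exc.msg}")
    return errors
```

Passing a block such as `{"@type": "Article",}` yields one error, while the same block without the trailing comma yields none.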

Indexability (2 rules)

Rule	What it catches	Why it matters
RULE-49	noindex on pages that other pages link to	If pages link to a noindexed page, those links pass ranking power to a page Google will not show in search results. The noindex is usually accidental, a leftover from staging or a template error
RULE-50	nofollow on internal links	`rel="nofollow"` tells Google not to follow a link or pass ranking power through it. On internal links, this wastes your own site's link equity. Internal nofollow is almost always a mistake

Where This Fits in Wire's Quality System

Wire enforces content quality at four levels. Each catches what the previous layers missed.

Layer	When it runs	What it checks	Cost
Prevent	Before Claude writes	Styleguide rules taught to the AI before every prompt	Free
Fix	On every save	9 auto-corrections applied before content hits disk	Free
Detect	On demand (`audit`)	13 content-level checks across all pages	Free
Verify	After build (`lint`)	44 HTML-level checks across rendered site	Free

The first three layers work on markdown, the raw content. Build verification works on the finished HTML, what search engines and visitors actually see. A page can have perfect markdown and still fail verification if a template forgets to insert a canonical tag, or if a Jinja2 partial omits the viewport meta tag.

This is why Wire runs verification after rendering, not before. The template layer can introduce problems that do not exist in the content layer. Wire catches them before they reach production.

What Wire Does Not Check at Build Time

Six checks from the original specification require a live server or a browser engine and are deliberately excluded from build verification:

Redirect chains (RULE-14): requires following HTTP redirects, which only work on a running server
4xx/5xx status codes (RULE-27): requires sending HTTP requests to a running server and reading the response codes
External link validation (RULE-32): requires fetching external pages, which is slow and rate-limited
SSL certificate validity (RULE-36): requires connecting to the live server
Page load speed (RULE-38): requires a browser rendering engine (Lighthouse)
Mobile rendering (RULE-39): requires device emulation

These are server-level and network-level checks. Wire is a build tool. It works with files on disk. For live-server checks, use Google Search Console's URL Inspection tool or a monitoring service like Pingdom or UptimeRobot.

Typical Results

On a real 316-page production site, the first build verification run found:

4 pages with duplicate H1 tags across different sections
12 internal links pointing to pages that had been renamed
2 pages where the template failed to insert Open Graph tags
1 page with a JSON-LD block containing a trailing comma (invalid JSON)

Total time: 1.8 seconds. Total cost: zero. Every issue included the exact URL and a specific fix instruction. The same issues would have taken a manual review hours to find, if they were found at all.

Running Selectively

Not every check matters for every site. A single-language site does not need hreflang rules. A site without structured data does not need JSON-LD rules. Run only what applies:

# Only check page structure and internal links
python -m wire.lint --site site/ --rules RULE-01,RULE-02,RULE-03,RULE-05,RULE-07,RULE-33,RULE-35

# Only check sitemap consistency
python -m wire.lint --site site/ --rules RULE-23,RULE-24,RULE-25,RULE-26

Every rule ID is stable. You can save your preferred rule set and run the same checks on every build.
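
One way to save a preferred rule set is to keep it in a small build script and assemble the command from it. This sketch uses only the `--site` and `--rules` flags shown above. The constant and function names are hypothetical.

```python
import sys

# Hypothetical saved rule sets; the IDs are taken from the examples above.
PAGE_AND_LINK_RULES = [
    "RULE-01", "RULE-02", "RULE-03", "RULE-05", "RULE-07", "RULE-33", "RULE-35",
]
SITEMAP_RULES = ["RULE-23", "RULE-24", "RULE-25", "RULE-26"]


def lint_command(site_dir, rules=None):
    """Build the argument list for a wire.lint run, optionally limited to rules."""
    cmd = [sys.executable, "-m", "wire.lint", "--site", site_dir]
    if rules:
        cmd += ["--rules", ",".join(rules)]
    return cmd
```

The returned list can be handed to `subprocess.run` from a build script. How Wire reports failures through its exit code is not covered here, so inspect the command's output before relying on the exit status to gate a deploy.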