Build Verification - 44 Automated Checks on Every Deploy
Your site built without errors. But broken internal links, missing canonical tags, and invalid JSON-LD are invisible until something goes wrong in search.
Wire runs 44 checks on the finished HTML after every build. Not on your markdown. On the rendered output that search engines actually crawl. This matters because a template can silently drop a canonical tag or forget the viewport meta, and your content layer never sees it. The checks are free, take under 2 seconds on a 200-page site, and every failure tells you the exact URL and what to fix.
The 44 rules fall into 11 categories: page structure, canonicalization, URL structure, sitemap integrity, crawl access, internal links, security, images, multilingual, structured data, and indexability. Each rule exists because independent research shows it affects search performance or site health. Here's the part that surprises most operators: two of the most damaging issues are also the quietest. A noindex tag left over from staging on a page that other pages link to. A JSON-LD block with a trailing comma that silently invalidates all the structured data in that block.
Build verification is the fourth layer in Wire's quality system. The first three layers work on markdown before and during writing. This layer works on the finished HTML after rendering. That sequence matters: a page can pass every content-level check and still fail verification because the template introduced the problem, not the content. A Jinja2 partial that omits the viewport meta tag. A canonical tag the template forgot to insert. The markdown was fine. The rendered page was not. But there's a separate question worth considering: some checks require a live server and Wire deliberately skips them at build time.
Six checks are excluded from build verification on purpose. Redirect chains, HTTP status codes, external link validation, SSL certificate validity, page load speed, and mobile rendering all require a live server or a browser engine. Wire works with files on disk. It cannot follow an HTTP redirect or fetch an external URL. This is a deliberate boundary, not a gap. For those checks, Google Search Console's URL Inspection tool or a monitoring service handles what Wire cannot. The question is whether the 44 checks Wire does run are covering the failures that actually reached your site.
On a real 316-page production site, the first verification run found 12 internal links pointing to renamed pages, 4 duplicate H1 tags across different sections, 2 pages where the template dropped Open Graph tags entirely, and 1 JSON-LD block with a trailing comma. Total time: 1.8 seconds. Every failure included the exact URL and a specific fix. The same audit done manually would have taken hours, and the JSON-LD issue almost certainly would have been missed. The site had been live for months. None of these were caught by the content layer.
Not every check applies to every site. A single-language site skips the three hreflang rules automatically. A site without JSON-LD has no structured data to validate. You can pass a comma-separated list of rule IDs to run only what applies, and every rule ID is stable across versions, so you can save a preferred set and run the same checks on every build. The tradeoff: a rule you skip is a failure you won't see. Two of the most common production issues, broken internal links and noindex left on linked pages, are easy to exclude by accident if you're trimming aggressively.
Wire does not trust what it builds. After rendering every markdown page to HTML, it runs 44 automated checks across the finished site. Every check produces a specific, actionable message. Pages that fail do not ship silently. You see exactly what broke and where.
This is the final layer of Wire's quality system. The content quality layers work on markdown before and during writing. Build verification works on the finished HTML after rendering. It catches problems that only become visible in the final output: broken links between rendered pages, missing meta tags that templates should have inserted, images without dimensions, structured data with invalid JSON.
How It Works
After wire.build renders your site to HTML, run the verification:
```bash
python -m wire.lint --site /path/to/site
```
Wire parses every HTML file, builds a cross-page index (for duplicate detection and orphan checks), and runs all 44 rules. The output lists every failure with the rule ID, the affected URL, and what to fix.
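The cross-page index is what makes site-level rules possible: duplicate H1 detection and orphan detection both need every page parsed before any verdict is reached. The following is a minimal two-pass sketch using only Python's standard library. It is an illustration under stated assumptions, not Wire's implementation: PageFacts, FactCollector, and build_index are hypothetical names, pages arrive as a dict of URL path to HTML string, and only root-relative links (starting with a single /) count as internal.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urlsplit


@dataclass
class PageFacts:
    """Per-page facts gathered in the first pass (hypothetical structure)."""
    h1_texts: list = field(default_factory=list)
    links: list = field(default_factory=list)


class FactCollector(HTMLParser):
    """Collects H1 text and root-relative link paths from one HTML page."""

    def __init__(self):
        super().__init__()
        self.facts = PageFacts()
        self._in_h1 = False
        self._h1_buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._h1_buffer = []
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Root-relative links are treated as internal; "//host" links are not.
            if href.startswith("/") and not href.startswith("//"):
                self.facts.links.append(urlsplit(href).path)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.facts.h1_texts.append("".join(self._h1_buffer).strip())

    def handle_data(self, data):
        if self._in_h1:
            self._h1_buffer.append(data)


def build_index(pages):
    """pages maps URL path -> rendered HTML. Returns (duplicate H1s, orphans)."""
    # Pass 1: parse every page on its own.
    facts = {}
    for url, html in pages.items():
        collector = FactCollector()
        collector.feed(html)
        collector.close()
        facts[url] = collector.facts

    # Pass 2: build cross-page views.
    pages_by_h1 = defaultdict(list)
    inbound = {url: 0 for url in pages}
    for url, page in facts.items():
        for h1 in page.h1_texts:
            pages_by_h1[h1].append(url)
        for target in set(page.links):
            if target in inbound and target != url:  # self-links do not count
                inbound[target] += 1

    duplicate_h1 = {h1: urls for h1, urls in pages_by_h1.items() if len(urls) > 1}
    orphans = [url for url, count in inbound.items() if count == 0]
    return duplicate_h1, orphans
```

In a real run the pages would come from the built site directory (for example via pathlib.Path.rglob("*.html")), keyed by the URL path each file is served at.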
You can also run specific checks:
```bash
python -m wire.lint --site /path/to/site --rules RULE-01,RULE-05,RULE-33
```
On a typical 200-page site, the full scan takes under 2 seconds. Zero API calls, zero cost.
What Wire Checks
The 44 rules fall into 11 categories. Each rule exists because independent research, not Google's public statements, shows it affects search performance or site health.
Page Structure (11 rules)
These checks verify that every page has the HTML elements search engines need to understand and rank it.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-01 | Missing or empty H1 | SearchPilot A/B test: pages with keyword-aligned H1 saw 28% more traffic. The H1 tells Google the page's primary topic |
| RULE-02 | Multiple H1 tags on one page | The 2024 Google API leak confirmed entity extraction uses heading structure. Multiple H1s dilute the primary topic signal |
| RULE-03 | Skipped heading levels (H1 then H3, no H2) | Heading hierarchy feeds semantic structure signals. Skipped levels break the logical outline Google uses for entity extraction |
| RULE-04 | Target search query missing from H1 or title | Zyppy's 81K title study: Google rewrites titles 61.6% of the time when they do not match the page's primary heading |
| RULE-05 | Title tag missing, empty, too short (<20 characters), too long (>65 characters), or duplicate | Backlinko's analysis of 11.8M search results: title tag optimization correlates with higher rankings. Google truncates titles that exceed the result's display width, typically around 60 characters |
| RULE-06 | Multiple <title> tags | Browsers use the first one. Search engines may use either. Remove the duplicate |
| RULE-07 | Meta description missing or over 158 characters | Google truncates long descriptions in search results. A missing description means Google writes one for you, and it rarely matches your intent |
| RULE-08 | Thin content (fewer than 200 words) | Multiple case studies show thin pages hurt when they cannibalize stronger siblings. 201Creative deleted thin ecommerce pages and saw +867% traffic. This rule flags stubs that need expanding |
| RULE-09 | Missing Open Graph tags (og:title, og:description) | These control how your pages appear when shared on LinkedIn, Twitter, Slack, and other platforms. Missing tags mean the platform guesses, usually poorly |
| RULE-10 | Missing viewport meta tag | Without it, mobile browsers render pages at desktop width and scale them down. Under mobile-first indexing, Google evaluates the mobile version of each page, so a page that renders poorly on phones is judged in that state |
| RULE-11 | No character encoding in first 1024 bytes | Browsers need to know the encoding before parsing. Missing charset can cause garbled text, especially for non-English content |
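The per-page heading rules, plus the length and count parts of RULE-05 and RULE-06, can be evaluated in a single pass over one rendered page. The sketch below is hypothetical: check_page_structure and its message strings are invented for illustration, the 20 and 65 character limits come from the table above, and it uses Python's built-in html.parser rather than whatever parser Wire uses. Duplicate titles across pages are left out because they need a cross-page view.

```python
from html.parser import HTMLParser

TITLE_MIN, TITLE_MAX = 20, 65  # RULE-05 length limits, in characters


class StructureParser(HTMLParser):
    """Records heading levels, H1 text, and <title> text for one page."""

    def __init__(self):
        super().__init__()
        self.levels = []      # heading levels in document order, e.g. [1, 2, 3]
        self.h1_texts = []
        self.titles = []
        self._capture = None  # "h1" or "title" while inside that element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))
        if tag in ("h1", "title"):
            self._capture = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = "".join(self._buffer).strip()
            (self.h1_texts if tag == "h1" else self.titles).append(text)
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)


def check_page_structure(html):
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    problems = []
    if not any(parser.h1_texts):
        problems.append("RULE-01: missing or empty H1")
    if len(parser.h1_texts) > 1:
        problems.append("RULE-02: multiple H1 tags")
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:  # going deeper by more than one level skips a heading
            problems.append(f"RULE-03: H{prev} followed by H{cur}")
    if not parser.titles or not parser.titles[0]:
        problems.append("RULE-05: title missing or empty")
    elif not TITLE_MIN <= len(parser.titles[0]) <= TITLE_MAX:
        problems.append(f"RULE-05: title length {len(parser.titles[0])} outside {TITLE_MIN}-{TITLE_MAX}")
    if len(parser.titles) > 1:
        problems.append("RULE-06: multiple <title> tags")
    return problems
```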
Canonicalization (4 rules)
Canonical tags tell search engines which version of a page is the "real" one. Getting this wrong splits your ranking signals across multiple URLs.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-12 | Missing canonical tag | Without a canonical tag, Google decides which URL to index. If your page is accessible at multiple URLs (with/without trailing slash, with/without www), Google may split your ranking signals |
| RULE-13 | Multiple canonical tags | When Google finds two canonical tags pointing to different URLs, it ignores both. Your page has no canonical signal at all |
| RULE-15 | Canonical URL not found in sitemap | If the canonical URL and sitemap URL disagree, Google sees conflicting signals about which page matters. Both should point to the same URL |
| RULE-16 | Duplicate H1 across different pages | Two pages with the same H1 create a cannibalization signal. Google cannot tell which page should rank for that heading's topic. Differentiate or consolidate |
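RULE-12, RULE-13, and RULE-15 come down to counting canonical link elements and comparing the single remaining URL with the sitemap. A hypothetical sketch, assuming the sitemap's URLs have already been loaded into a set; check_canonical and CanonicalParser are invented names, not Wire's API.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects href values from link elements whose rel includes "canonical"."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated token list, so match tokens, not the raw string.
        if "canonical" in (attr.get("rel") or "").lower().split() and attr.get("href"):
            self.canonicals.append(attr["href"])


def check_canonical(html, sitemap_urls):
    """sitemap_urls: set of <loc> values, assumed to be loaded already."""
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    found = parser.canonicals
    if not found:
        return ["RULE-12: missing canonical tag"]
    if len(found) > 1:
        return ["RULE-13: multiple canonical tags"]
    if found[0] not in sitemap_urls:
        return [f"RULE-15: canonical {found[0]} not in sitemap"]
    return []
```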
URL Structure (6 rules)
Clean URLs help search engines and users. These rules enforce URL patterns that SEO studies and search engine guidelines consistently recommend.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-17 | Uppercase characters in URL path | URL paths are case-sensitive, and most servers treat /About/ and /about/ as different URLs. Uppercase creates duplicate content risk |
| RULE-18 | Special characters in URL path | Spaces, brackets, and non-ASCII characters in URLs cause encoding issues. Some crawlers fail to follow these links |
| RULE-19 | Underscores in URL path | Google treats hyphens as word separators but underscores as joiners. content-strategy reads as two words; content_strategy reads as one |
| RULE-20 | URL longer than 115 characters | Long URLs get truncated in search results and are harder to share. Backlinko found shorter URLs correlate with higher rankings |
| RULE-21 | Query parameters on page URL | Parameters like ?utm_source=... on canonical pages create duplicate content. Strip parameters from canonical URLs |
| RULE-22 | Internal links missing trailing slash | Inconsistent trailing slashes create duplicate URLs. If /guides/ and /guides both work, Google may index both and split ranking signals |
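Most URL rules are plain string tests on the parsed URL. The sketch below is hypothetical: check_url and check_internal_link are invented names, the definition of a "special character" (anything outside ASCII letters, digits, and / _ . ~ -) is an assumption, and the trailing-slash check assumes a site that serves pages at directory-style URLs while linking files such as /feed.xml directly.

```python
import re
from urllib.parse import urlsplit

MAX_URL_LENGTH = 115  # RULE-20 limit from the table
# Assumption: a "safe" path uses only ASCII letters, digits, and / _ . ~ -
# Anything else (spaces, %-escapes, brackets, non-ASCII) counts as special.
SAFE_PATH = re.compile(r"[A-Za-z0-9/_.~-]*")


def check_url(url):
    """Apply RULE-17 through RULE-21 to one page URL."""
    parts = urlsplit(url)
    path = parts.path
    problems = []
    if path != path.lower():
        problems.append("RULE-17: uppercase in path")
    if not SAFE_PATH.fullmatch(path):
        problems.append("RULE-18: special characters in path")
    if "_" in path:
        problems.append("RULE-19: underscore in path")
    if len(url) > MAX_URL_LENGTH:
        problems.append(f"RULE-20: URL longer than {MAX_URL_LENGTH} characters")
    if parts.query:
        problems.append("RULE-21: query parameters on page URL")
    return problems


def check_internal_link(href):
    """RULE-22 sketch: directory-style links need a trailing slash; file links are exempt."""
    path = urlsplit(href).path
    last_segment = path.rsplit("/", 1)[-1]
    if path and not path.endswith("/") and "." not in last_segment:
        return ["RULE-22: internal link missing trailing slash"]
    return []
```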
Sitemap Integrity (4 rules)
Your sitemap tells Google which pages exist and matter. These rules catch contradictions between your sitemap and your actual pages.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-23 | No sitemap.xml at site root | Without a sitemap, Google relies entirely on crawling to discover pages. Large sites can have pages that Google never finds |
| RULE-24 | Published pages missing from sitemap | If a page exists but is not in the sitemap, Google may deprioritize crawling it. Every indexable page should appear |
| RULE-25 | noindex pages listed in sitemap | Contradictory signals: the sitemap says "index this" while the page says "do not index." Google sees confusion and may ignore both directives |
| RULE-26 | robots.txt-blocked pages listed in sitemap | The sitemap includes a page that robots.txt blocks from crawling. Google cannot crawl it but sees it in the sitemap, another contradictory signal |
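RULE-24 and RULE-25 are set comparisons between the sitemap and what the build produced. A minimal sketch, assuming the sets of indexable and noindex URLs have already been extracted from the rendered pages; sitemap_urls and check_sitemap are hypothetical names. RULE-26 is omitted because it also needs robots.txt matching.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs in a standard sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def check_sitemap(xml_text, indexable_urls, noindex_urls):
    """Compare sitemap entries with the URLs the build actually produced.

    indexable_urls and noindex_urls are assumed to come from parsing the
    rendered pages (their canonical URLs and robots meta tags).
    """
    listed = sitemap_urls(xml_text)
    problems = []
    for url in sorted(indexable_urls - listed):
        problems.append(f"RULE-24: {url} missing from sitemap")
    for url in sorted(noindex_urls & listed):
        problems.append(f"RULE-25: noindex page {url} listed in sitemap")
    return problems
```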
Crawl Access (4 rules)
robots.txt controls which pages search engines can access. Mistakes here can accidentally hide your entire site.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-28 | No robots.txt file | Every site should have one. Without it, crawlers assume everything is allowed, including admin pages, staging content, and asset directories you may not want indexed |
| RULE-29 | robots.txt blocking CSS or JavaScript files | Google needs CSS and JS to render your pages. Blocking them means Google sees your site without styling and cannot evaluate mobile-friendliness, layout, or dynamic content |
| RULE-30 | No Sitemap directive in robots.txt | The Sitemap line in robots.txt is how crawlers discover your sitemap. Without it, they have to guess the location |
| RULE-31 | robots.txt blocking all crawlers (Disallow: /) | This makes your entire site invisible to search engines. Usually a leftover from staging that was never removed |
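Python's standard urllib.robotparser module is enough to sketch the crawl-access rules against a robots.txt file on disk. This is an illustration, not Wire's code: check_robots is a hypothetical name, the caller is assumed to supply the site root and a few CSS and JS URLs from the build, and RULE-28 (file missing) is simply a file-existence test so it is left out.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, root_url, asset_urls):
    """Evaluate RULE-29, RULE-30, and RULE-31 against robots.txt text.

    Caveat: urllib.robotparser does plain prefix matching and does not
    implement Google's * and $ path wildcards, so wildcard rules may be missed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    # RULE-31: a generic crawler cannot fetch the site root.
    if not parser.can_fetch("*", root_url):
        problems.append("RULE-31: robots.txt blocks the whole site")
    # RULE-29: CSS or JS that Googlebot needs for rendering is disallowed.
    for asset in asset_urls:
        if not parser.can_fetch("Googlebot", asset):
            problems.append(f"RULE-29: {asset} blocked for Googlebot")
    # RULE-30: site_maps() returns None when no Sitemap line exists (Python 3.8+).
    if not parser.site_maps():
        problems.append("RULE-30: no Sitemap directive")
    return problems
```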
Internal Links (3 rules)
Internal links distribute ranking power across your site. Broken links waste it. Orphan pages never receive it.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-33 | Broken internal links (target page does not exist) | The 2024 Google API leak lists badBacklinks as a negative ranking signal. Broken internal links waste crawl budget and pass zero ranking power. Wire checks every link against the rendered site directory |
| RULE-34 | Internal links with empty anchor text | Google uses anchor text to understand what the target page is about. Empty anchor text wastes this signal. Screen readers also cannot describe the link to visually impaired users |
| RULE-35 | Orphan pages (zero inbound internal links) | SearchPilot's controlled experiment showed that adding internal links to orphan pages produces a statistically significant traffic increase. Pages with no inbound links are invisible to navigation and receive minimal crawl attention |
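Checking links "against the rendered site directory" can be sketched as resolving each root-relative href to a file in the build output. Everything here is a hypothetical illustration: the function names are invented, directory URLs such as /guides/ are assumed to be served from guides/index.html (a common static-site convention, not a confirmed Wire detail), and a link whose only content is an image would count as empty anchor text because alt text is not considered.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


def built_paths(site_root):
    """Relative POSIX paths of every file in the rendered site directory."""
    root = Path(site_root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def link_target_exists(built, href):
    """Resolve a root-relative link against the set of built files."""
    path = unquote(urlsplit(href).path).lstrip("/")
    if path == "" or path.endswith("/"):
        path += "index.html"
    return path in built or f"{path}/index.html" in built


class AnchorParser(HTMLParser):
    """Collects (href, visible text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def check_links(built, html):
    parser = AnchorParser()
    parser.feed(html)
    parser.close()
    problems = []
    for href, text in parser.anchors:
        if not href.startswith("/") or href.startswith("//"):
            continue  # external and relative links are outside this sketch
        if not link_target_exists(built, href):
            problems.append(f"RULE-33: broken link {href}")
        if not text:
            problems.append(f"RULE-34: empty anchor text for {href}")
    return problems
```

Calling built_paths("site/") once and reusing the resulting set keeps the per-link lookup to a set membership test.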
Security (1 rule)
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-37 | HTTP resources loaded on HTTPS pages (mixed content) | Browsers block or warn about mixed content. Images loaded over HTTP on an HTTPS page may not display. Scripts may be blocked entirely. Google flags mixed content as a security issue |
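Mixed content only concerns resources the browser fetches automatically, not ordinary links. A hypothetical sketch, assuming the page will be served over HTTPS: it flags http:// URLs in src attributes and in stylesheet or icon link elements, and deliberately ignores a href. The attribute coverage is a representative subset (srcset, for example, is not parsed).

```python
from html.parser import HTMLParser


class MixedContentParser(HTMLParser):
    """Records subresources loaded over plain HTTP (assumes an HTTPS page)."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link":
            # Only link types the browser loads; rel="canonical" etc. are skipped.
            rel = (attr.get("rel") or "").lower().split()
            url = attr.get("href") if "stylesheet" in rel or "icon" in rel else None
        else:
            # img, script, iframe, audio, video, source, embed all load via src.
            url = attr.get("src")
        if url and url.lower().startswith("http://"):
            self.insecure.append((tag, url))


def find_mixed_content(html):
    parser = MixedContentParser()
    parser.feed(html)
    parser.close()
    return parser.insecure
```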
Images (3 rules)
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-40 | Images missing alt text | Alt text serves accessibility (screen readers) and gives Google context for image search. While Moz found alt text is a ranking factor for Google Images specifically, accessibility compliance is reason enough |
| RULE-41 | Broken image sources (local file missing) | A broken image leaves a blank space on the page. Users see it immediately. Google's page quality ratings penalize pages with broken media |
| RULE-42 | Images without explicit width and height | Missing dimensions cause Cumulative Layout Shift. The page jumps around as images load. CLS is a Core Web Vitals metric that affects user experience and, to a limited extent, search rankings |
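The alt and dimension rules inspect only the img element's own attributes, so a sketch needs no cross-page state. Assumptions, all hypothetical: check_images is an invented name, only an absent alt attribute fails (alt="" is the standard way to mark a decorative image), and RULE-41 is left out because it needs the built file list.

```python
from html.parser import HTMLParser


class ImageParser(HTMLParser):
    """Flags <img> elements for RULE-40 (alt) and RULE-42 (width/height)."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or "(no src)"
        # Assumption: only a missing alt attribute fails; alt="" is allowed.
        if "alt" not in attr:
            self.problems.append(f"RULE-40: {src} missing alt attribute")
        # Browsers derive the aspect ratio from both attributes and reserve
        # space before the image loads, which prevents layout shift.
        if "width" not in attr or "height" not in attr:
            self.problems.append(f"RULE-42: {src} has no explicit width/height")


def check_images(html):
    parser = ImageParser()
    parser.feed(html)
    parser.close()
    return parser.problems
```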
Multilingual (3 rules)
These rules only activate when your pages use hreflang tags for multilingual content. If your site is single-language, they are skipped automatically.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-43 | Invalid language codes in hreflang tags | Hreflang uses BCP 47 language tags (en, de, fr, etc.). A typo like "eng" instead of "en" means Google ignores the tag and may show the wrong language version to searchers |
| RULE-44 | Hreflang tags present but no x-default | The x-default tag tells Google which page to show when no language matches the searcher's region. Without it, Google picks one and may pick wrong |
| RULE-45 | Hreflang links without reciprocal confirmation | If page A says "my German version is page B" but page B does not link back to page A, Google treats the hreflang as unconfirmed and may ignore it |
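Reciprocity is a cross-page check: every alternate a page declares must declare that page back. A minimal sketch under assumptions: check_hreflang and HreflangParser are hypothetical names, pages are passed as a dict of absolute URL to HTML, and RULE-43's language-code validation is omitted.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collects {hreflang: href} from link rel="alternate" elements."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "alternate" in (attr.get("rel") or "").lower().split():
            lang, href = attr.get("hreflang"), attr.get("href")
            if lang and href:
                self.alternates[lang.lower()] = href


def hreflang_map(html):
    parser = HreflangParser()
    parser.feed(html)
    parser.close()
    return parser.alternates


def check_hreflang(pages):
    """pages maps each absolute page URL to its rendered HTML."""
    maps = {url: hreflang_map(html) for url, html in pages.items()}
    problems = []
    for url, alternates in maps.items():
        if not alternates:
            continue  # no hreflang tags: the multilingual rules do not apply
        if "x-default" not in alternates:
            problems.append(f"RULE-44: {url} has no x-default")
        for lang, target in alternates.items():
            if lang == "x-default" or target == url:
                continue
            # RULE-45: the target page must list this page among its alternates.
            if url not in maps.get(target, {}).values():
                problems.append(f"RULE-45: {url} -> {target} ({lang}) not confirmed")
    return problems
```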
Structured Data (3 rules)
JSON-LD structured data helps Google understand your content type, author, dates, and relationships. Invalid structured data is worse than none at all.
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-46 | Invalid JSON in JSON-LD script blocks | Malformed JSON means Google cannot parse any of the structured data in that block. A single missing comma, trailing comma, or unclosed bracket breaks the entire block |
| RULE-47 | Unrecognized schema.org type | Using an @type that schema.org does not define means Google ignores the entire structured data block. Wire checks against 18 recognized types including Article, WebPage, Organization, Product, FAQPage, and BreadcrumbList |
| RULE-48 | Broken URLs in structured data fields | If your JSON-LD references a URL (for images, logos, or page links) that does not exist on your site, Google may flag the structured data as misleading |
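A strict JSON parser is all RULE-46 needs, since JSON-LD must be valid JSON and a trailing comma is a syntax error. The sketch below is hypothetical: check_json_ld is an invented name, KNOWN_TYPES holds only the six types the table names (Wire's full list of 18 is not reproduced), and top-level @graph arrays are expanded so their member nodes are type-checked individually.

```python
import json
from html.parser import HTMLParser

# Illustrative subset: the six types named in the table, not Wire's full list.
KNOWN_TYPES = {"Article", "WebPage", "Organization", "Product", "FAQPage", "BreadcrumbList"}


class JsonLdParser(HTMLParser):
    """Collects the raw text of each <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def check_json_ld(html):
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    problems = []
    for index, raw in enumerate(parser.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            # RULE-46: one syntax error, such as a trailing comma, rejects the whole block.
            problems.append(f"RULE-46: block {index} is invalid JSON ({err.msg})")
            continue
        nodes = []
        for item in data if isinstance(data, list) else [data]:
            # Expand a top-level @graph array into its member nodes.
            if isinstance(item, dict) and isinstance(item.get("@graph"), list):
                nodes.extend(item["@graph"])
            else:
                nodes.append(item)
        for node in nodes:
            declared = node.get("@type") if isinstance(node, dict) else None
            for type_name in declared if isinstance(declared, list) else [declared]:
                if type_name not in KNOWN_TYPES:
                    problems.append(f"RULE-47: block {index} has unrecognized @type {type_name!r}")
    return problems
```

The exact wording of err.msg differs between Python versions, so only the rule prefix is worth matching on.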
Indexability (2 rules)
| Rule | What it catches | Why it matters |
|---|---|---|
| RULE-49 | noindex on pages that other pages link to | If pages link to a noindexed page, those links pass ranking power to a page Google will not show in search results. The noindex is usually accidental, a leftover from staging or a template error |
| RULE-50 | nofollow on internal links | rel="nofollow" tells Google not to follow a link or pass ranking power through it. On internal links, this wastes your own site's link equity. Internal nofollow is almost always a mistake |
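Both indexability rules join two facts: a page's robots meta tag and the links other pages point at it. A hypothetical sketch, assuming pages keyed by root-relative URL path; check_indexability is an invented name. It reads only the robots meta tag, since an X-Robots-Tag HTTP header is set by the server and is not visible in built files. Per Google's documentation, the "none" directive is equivalent to noindex, nofollow.

```python
from html.parser import HTMLParser


class IndexabilityParser(HTMLParser):
    """Finds a robots noindex directive and internal links with their rel tokens."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.links = []  # (href, [rel tokens])

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            directives = {d.strip() for d in (attr.get("content") or "").lower().split(",")}
            self.noindex = self.noindex or bool(directives & {"noindex", "none"})
        elif tag == "a":
            href = attr.get("href") or ""
            if href.startswith("/") and not href.startswith("//"):
                self.links.append((href, (attr.get("rel") or "").lower().split()))


def check_indexability(pages):
    parsed = {}
    for url, html in pages.items():
        parser = IndexabilityParser()
        parser.feed(html)
        parser.close()
        parsed[url] = parser
    problems = []
    for url, page in parsed.items():
        for href, rel in page.links:
            if "nofollow" in rel:
                problems.append(f"RULE-50: nofollow on internal link {url} -> {href}")
            target = parsed.get(href)
            if target is not None and target.noindex:
                problems.append(f"RULE-49: {url} links to noindex page {href}")
    return problems
```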
Where This Fits in Wire's Quality System
Wire enforces content quality at four levels. Each catches what the previous layers missed.
| Layer | When it runs | What it checks | Cost |
|---|---|---|---|
| Prevent | Before Claude writes | Styleguide rules taught to the AI before every prompt | Free |
| Fix | On every save | 9 auto-corrections applied before content hits disk | Free |
| Detect | On demand (audit) | 13 content-level checks across all pages | Free |
| Verify | After build (lint) | 44 HTML-level checks across rendered site | Free |
The first three layers work on markdown, the raw content. Build verification works on the finished HTML, what search engines and visitors actually see. A page can have perfect markdown and still fail verification if a template forgets to insert a canonical tag, or if a Jinja2 partial omits the viewport meta tag.
This is why Wire runs verification after rendering, not before. The template layer can introduce problems that do not exist in the content layer. Wire catches them before they reach production.
What Wire Does Not Check at Build Time
Six checks from the original specification require a live HTTP server and are deliberately excluded from build verification:
- Redirect chains (RULE-14): requires following HTTP redirects, which only work on a running server
- 4xx/5xx status codes (RULE-27): requires sending real HTTP requests and reading the status code the server returns
- External link validation (RULE-32): requires fetching external pages, which is slow and rate-limited
- SSL certificate validity (RULE-36): requires connecting to the live server
- Page load speed (RULE-38): requires a browser rendering engine (Lighthouse)
- Mobile rendering (RULE-39): requires device emulation
These are server-level and network-level checks. Wire is a build tool. It works with files on disk. For live-server checks, use Google Search Console's URL Inspection tool or a monitoring service like Pingdom or UptimeRobot.
Typical Results
On a real 316-page production site, the first build verification run found:
- 4 pages with duplicate H1 tags across different sections
- 12 internal links pointing to pages that had been renamed
- 2 pages where the template failed to insert Open Graph tags
- 1 page with a JSON-LD block containing a trailing comma (invalid JSON)
Total time: 1.8 seconds. Total cost: zero. Every issue included the exact URL and a specific fix instruction. The same issues would have taken a manual review hours to find, if they were found at all.
Running Selectively
Not every check matters for every site. A single-language site does not need hreflang rules. A site without structured data does not need JSON-LD rules. Run only what applies:
```bash
# Only check page structure and internal links
python -m wire.lint --site site/ --rules RULE-01,RULE-02,RULE-03,RULE-05,RULE-07,RULE-33,RULE-35

# Only check sitemap consistency
python -m wire.lint --site site/ --rules RULE-23,RULE-24,RULE-25,RULE-26
```
Every rule ID is stable. You can save your preferred rule set and run the same checks on every build.
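A saved rule set is easier to keep correct if it is validated before each build. The helper below is a hypothetical sketch: load_rule_set, lint_command, and storing the set as a comma-separated string are assumptions; only the python -m wire.lint --site ... --rules ... invocation comes from this page. It also warns when RULE-33 or RULE-49 is missing, because broken internal links and noindex on linked pages are the failures most easily trimmed out by accident.

```python
import re
import sys

RULE_ID = re.compile(r"RULE-\d{2}")
# Broken internal links and noindex on linked pages: common, and easy to drop by accident.
HIGH_VALUE = {"RULE-33", "RULE-49"}


def load_rule_set(text):
    """Parse a saved, comma-separated rule list (hypothetical storage format)."""
    ids = [part.strip().upper() for part in text.split(",") if part.strip()]
    bad = [rule for rule in ids if not RULE_ID.fullmatch(rule)]
    if bad:
        raise ValueError(f"not a rule ID: {', '.join(bad)}")
    return list(dict.fromkeys(ids))  # drop duplicates, keep original order


def lint_command(site_dir, rule_ids):
    """Build the wire.lint argument list shown on this page."""
    missing = HIGH_VALUE - set(rule_ids)
    if missing:
        print(f"warning: rule set skips {', '.join(sorted(missing))}", file=sys.stderr)
    return [sys.executable, "-m", "wire.lint", "--site", site_dir, "--rules", ",".join(rule_ids)]
```

Passing the returned list to subprocess.run() runs the same command as the shell examples above.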