Most product pages tell you what's great. This one tells you what's great, what's messy, and what keeps us up at night. Wire is a real tool used on real sites, and we think honesty about its rough edges is more useful than polish. For the full architecture overview, see how the modules and data model fit together. For an external perspective on what makes static site generators competitive, see Netlify's State of the Web report.
What surprised us building Wire
Wire was not designed top-down. It grew from solving actual problems on production sites, and several things turned out differently than we expected.
The 3-layer quality system was an accident. PREVENT (teach the AI the rules before writing), FIX (auto-correct on save), and DETECT (warn about issues after the fact) emerged from three separate rounds of fixing real failures. We kept finding that prompts alone could not prevent every mistake, and post-hoc warnings were useless if the bad content was already saved. The layered approach was never planned; it just turned out that you need all three layers or quality slips through the cracks.
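The interplay of the three layers can be sketched in a few lines. This is an illustrative toy, not Wire's actual code; the function names and the specific rules are invented for the example.

```python
def prevent(prompt: str, rules: list[str]) -> str:
    """PREVENT: teach the rules up front by prepending them to the prompt."""
    return "\n".join(f"RULE: {r}" for r in rules) + "\n\n" + prompt

def fix(text: str) -> str:
    """FIX: auto-correct known failure modes on save (here: stray whitespace)."""
    return text.replace("  ", " ").strip()

def detect(text: str) -> list[str]:
    """DETECT: warn about issues that survived the first two layers."""
    warnings = []
    if len(text) < 50:
        warnings.append("thin content")
    if "#" not in text:
        warnings.append("no headings")
    return warnings
```

Each layer is cheap on its own; the point of the anecdote above is that removing any one of them lets a class of failures through.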
BM25 works surprisingly well without a neural model. When we needed to route keywords to the right pages (should this keyword expand an existing page or justify a new one?), we expected to need embeddings or a language model. Instead, classic BM25 term-frequency scoring, combined with impression ratios and page breadth signals, produces reliable routing decisions. The analysis phase runs entirely offline with zero API calls and zero cost. We were genuinely surprised that a 1990s algorithm held up this well against 2026 content problems.
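For readers unfamiliar with BM25, here is a minimal self-contained scorer showing why it needs no model or API: it is pure term-frequency arithmetic. This is the textbook formula, not Wire's implementation, and the parameter defaults (k1=1.5, b=0.75) are the conventional ones.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for d in docs:
        s = 0.0
        for t in query_terms:
            tf = d.count(t)
            df = sum(1 for doc in docs if t in doc)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            # Saturating term-frequency weight, normalized by doc length
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

In Wire's routing use case, the "documents" are existing pages and the "query" is a keyword from search data; the highest-scoring page is the candidate for expansion.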
Source diversity detection caught real problems we did not know we had. On idp-software.com, the automated audit flagged 32 pages with concentrated external sources: articles that cited the same domain 4, 5, 6 times. After the auto-fix pipeline (deduplicate external links, diversify on refine), that number dropped to zero. We built the detection because it seemed like good practice. We did not expect it to find that many problems on a site we thought was well-maintained.
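The detection itself is simple, which is part of why it was surprising that it found 32 real problems. A sketch of the idea (threshold and function name are illustrative, not Wire's actual rule):

```python
from collections import Counter
from urllib.parse import urlparse

def concentrated_sources(urls, threshold=3):
    """Return external domains cited more than `threshold` times on one page."""
    counts = Counter(urlparse(u).netloc for u in urls)
    return {domain: c for domain, c in counts.items() if c > threshold}
```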
The junior-senior news pattern outperforms batch evaluation. Our first approach sent all articles to the AI in one batch. The results were mediocre. Important details got lost in volume. Splitting into individual "junior analyst" evaluations (one article at a time) followed by a "senior editor" synthesis produces noticeably better output. Each article gets proper attention, and the synthesis step catches contradictions between sources that batch evaluation misses.
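The control flow is the whole trick: one focused pass per article, then one synthesis pass over the notes. The sketch below stubs out the AI calls as plain callables, since the pattern is independent of any particular model API.

```python
def junior_senior_review(articles, evaluate, synthesize):
    """Two-stage review: per-article evaluation, then cross-article synthesis.

    `evaluate` and `synthesize` stand in for the AI calls; only the
    control flow is real here.
    """
    notes = [evaluate(a) for a in articles]   # "junior analyst": one article at a time
    return synthesize(notes)                  # "senior editor": reconcile all notes
```

Because each `evaluate` call sees exactly one article, no detail competes for attention with the rest of the batch; contradictions only have to be resolved once, in the synthesis step.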
Schema validation refusing builds was initially terrifying. When we first shipped strict frontmatter validation, rejecting pages with missing titles or malformed metadata, we expected complaints. Instead, users told us they preferred it. "Tell me what's wrong, don't guess" turns out to be a better experience than silently patching bad data and producing confusing output downstream.
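"Tell me what's wrong, don't guess" in code terms means raising with a specific message rather than patching the dict and moving on. A minimal sketch of that stance (the rules shown are examples, not Wire's actual schema):

```python
class FrontmatterError(ValueError):
    pass

def validate_frontmatter(meta: dict) -> None:
    """Reject bad metadata loudly instead of silently repairing it."""
    if not meta.get("title"):
        raise FrontmatterError("missing required field: title")
    if "description" in meta and len(meta["description"]) > 160:
        raise FrontmatterError("description exceeds 160 characters")
```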
What's great (with evidence)
These are not aspirational claims. They are measured results from production use.
Minimal token usage vs $100-200 manual. A single enrich call uses minimal AI tokens, covered by your AI subscription. It performs local analysis (free), targeted web research, and a combined improve pass. Manual SEO content work (research, rewrite, optimize, review) runs $100-200 per page at agency rates. That is orders of magnitude cheaper. See the full pricing comparison.
45 lint rules in the audit system. The audit command checks for duplicate titles, duplicate descriptions, orphan pages, broken internal links, source concentration, underlinked pages, title length violations, missing citations, thin content, heading hierarchy issues, H1 mismatches, and more. The content quality system documents every rule with the evidence behind it. Most enterprise SEO audit tools check fewer items, and they charge monthly for the privilege.
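Most of these rules are individually small. As one representative example, here is what duplicate-title detection amounts to in isolation (a sketch; Wire's audit operates on parsed frontmatter, and the `pages` shape here is invented for illustration):

```python
from collections import defaultdict

def duplicate_titles(pages):
    """Find titles shared by more than one page.

    `pages` maps page path -> title.
    """
    by_title = defaultdict(list)
    for path, title in pages.items():
        by_title[title.strip().lower()].append(path)  # case-insensitive match
    return {t: paths for t, paths in by_title.items() if len(paths) > 1}
```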
Zero-call analysis phase. The audit, analyze, and the analysis stage of enrich make zero API calls. Keyword presence scoring, BM25 ranking, keyword routing, amendment briefs: all computed locally with pure Python. You can run diagnostics on a thousand-page site without using any AI tokens.
Resume-on-interrupt. Every batch command (news, refine, reword, enrich) writes progress to .wire/progress-*.json. If your laptop dies, your VPN drops, or you hit Ctrl+C, run the same command with --resume and it picks up where it left off. Failed items are retried, not skipped.
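The pattern behind resume-on-interrupt is worth showing: record each success as it completes, and skip anything already recorded on the next run. This is a sketch of the mechanism, not Wire's code; the JSON shape is an assumption.

```python
import json
from pathlib import Path

def run_with_resume(items, process, progress_file):
    """Process items, persisting completion after each one so a rerun resumes."""
    path = Path(progress_file)
    done = set(json.loads(path.read_text())["done"]) if path.exists() else set()
    for item in items:
        if item in done:
            continue                      # already completed in a prior run
        process(item)
        done.add(item)
        path.write_text(json.dumps({"done": sorted(done)}))  # persist per item
```

Writing after every item (rather than at the end) is what makes Ctrl+C safe: the worst case is redoing the single item that was in flight.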
Git-native workflow. Wire writes markdown files into your existing docs directory. Every change is a normal file change. Every page has a git history. There is no proprietary database, no lock-in, no export step. If you stop using Wire tomorrow, your content is still there, in standard markdown with YAML frontmatter.
Dry-run mode that actually works. The --dry-run flag writes .preview files and shows diffs without touching your real content. It skips the stamp step so your metadata stays clean. You can inspect exactly what Wire would do before letting it do it.
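The mechanism is simple enough to sketch: write the proposed content to a sibling `.preview` file and show a diff, leaving the target alone. Illustrative only; Wire's actual dry-run path differs in details.

```python
import difflib
from pathlib import Path

def preview_change(path, new_text):
    """Write `<path>.preview` and return a unified diff; never touch `path`."""
    target = Path(path)
    old_text = target.read_text() if target.exists() else ""
    Path(str(target) + ".preview").write_text(new_text)
    return "\n".join(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=str(target), tofile=str(target) + " (proposed)", lineterm=""))
```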
What worries us
We are shipping Wire with these known concerns. We think transparency about them is more valuable than pretending they do not exist.
Single-model dependency. Wire uses a single AI model for all generation. The claude_text() entry point could be swapped to another provider, but we have not tested alternatives. In practice, each generation call uses minimal AI tokens, token costs have fallen with each model generation, and the entry-point abstraction means switching models is a one-line change.
Search API limits. The search console API returns a maximum of 25,000 rows per query. For a site with thousands of pages and tens of thousands of ranking keywords, this ceiling means some long-tail data gets truncated. Wire works within this limit, but very large sites (10,000+ pages) may miss low-volume keyword data that could inform better routing decisions.
No real-time monitoring. Wire is a batch tool. You run commands, they process, they finish. There is no daemon watching for new content, no webhook listener, no continuous audit loop. If a page breaks at 2am, Wire will not notice until you run audit the next morning. For teams that need continuous monitoring, Wire is a complement to those tools, not a replacement.
Windows as primary development environment. Wire is developed and tested primarily on Windows with Git Bash. The test suite (1,227 tests, 90% coverage) runs on Windows. Path handling uses pathlib which should be cross-platform, but "should be" and "is" are different statements. Linux and macOS users may encounter edge cases we have not hit. We fix these when reported, but we cannot claim equal confidence across platforms.
Prompt brittleness. Wire's output quality depends heavily on prompt engineering. The 18 prompt templates have been refined through hundreds of iterations, but they are still natural language instructions to a language model. Edge cases in content structure, unusual frontmatter, or unexpected topic layouts can produce suboptimal results. The 3-layer quality system catches most failures, but "most" is not "all."
Rate limiting is conservative. Wire defaults to 1-second delays between API calls. The search console API allows 1,200 requests per minute. We chose reliability over speed, but this means large batch operations (news gathering across 300+ pages) take longer than they theoretically need to. The delay is configurable via extra.wire.rate_limit_delay, but we have not tested aggressive settings in production.
Scalability boundaries
Wire works well for its current use case: a single operator running CLI commands against one site at a time. These are the architectural boundaries that would need to change for different usage patterns.
Single-process, single-site architecture. Module-level globals (DOCS_DIR, SITE, WIRE_CONFIG) are initialized on import from mkdocs.yml in the current working directory. One Python process handles one site. This is the right design for a CLI tool, but it rules out a SaaS API server or background job queue that processes multiple sites without forking. Moving to explicit config-passing would require threading a config object through every function, which is a large refactor with no current payoff.
LRU cache staleness during batch operations. _get_valid_internal_paths() is cached with @lru_cache(maxsize=1) and never invalidated. If a batch operation creates new pages (e.g., compare or create), broken link detection in _sanitize_content() will miss the new pages for the rest of the process lifetime. The same applies to generate_site_directory(). For current batch sizes (100-300 pages), this rarely matters because page creation and link validation happen in separate commands. At larger scale or in a single long-running process, this cache would need explicit invalidation.
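The fix, when it becomes necessary, is the standard one: clear the cache whenever the underlying set changes. A minimal sketch with stand-in names (the real functions are `_get_valid_internal_paths()` and friends; everything else here is invented for illustration):

```python
from functools import lru_cache

_pages = {"a/", "b/"}  # stand-in for pages discovered on disk

@lru_cache(maxsize=1)
def valid_internal_paths():
    """Cached snapshot of known pages, like _get_valid_internal_paths()."""
    return frozenset(_pages)

def register_page(path):
    """Creating a page must drop the stale cache, or link checks miss it."""
    _pages.add(path)
    valid_internal_paths.cache_clear()  # explicit invalidation
```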
Audit reads every page from disk twice. _audit_content_quality() scans all index.md files to build a pages dict, then reads them again for quality checks. For 1,000+ pages, this doubles the I/O. Not a bottleneck on SSDs (the entire scan takes under 2 seconds for 1,100 pages), but it could matter on network-mounted filesystems or very large sites.
No machine-readable audit output. The audit() method outputs via logger.info(). Any dashboard or monitoring tool would need to parse log strings. A structured JSON output mode would be straightforward to add but has not been needed yet.
Git-based date lookups scale linearly. build.py calls git log once per page to get creation and modification dates. For 1,000 pages, that is 1,000 subprocess calls during build. Each has a 10-second timeout. A pre-computed date cache (built once per build from a single git log call) would reduce this to constant time.
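The constant-time version mentioned above would parse one `git log` invocation into a date map up front. A sketch of the parsing half, assuming a format like `--pretty=format:--%ad --date=short --name-only` (the exact format string is an assumption; git lists commits newest-first, which the parser relies on):

```python
def build_date_cache(log_output):
    """Parse a single newest-first git log dump into {path: dates}.

    First sighting of a file is its last-modified date; the final
    (oldest) sighting is its creation date.
    """
    dates = {}
    current = None
    for line in log_output.splitlines():
        if line.startswith("--"):
            current = line[2:]            # a commit's date marker
        elif line.strip() and current:
            if line not in dates:
                dates[line] = {"modified": current, "created": current}
            else:
                dates[line]["created"] = current  # keeps sliding older
    return dates
```

One subprocess call and one linear parse replace a thousand `git log` invocations with 10-second timeouts each.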
chief.py is the largest file at ~2,900 lines. The audit() method alone spans ~550 lines. This is a maintenance concern, not a runtime concern. Extracting audit, newsweek, and init into separate modules would improve readability without changing behavior.
Dual validate_frontmatter naming. wire/tools.py has a validate_frontmatter() that validates string content, and wire/schema.py has a validate_frontmatter() that validates metadata dicts. They do different things with the same name. This has not caused bugs because build.py imports from schema.py while content.py uses the one in tools.py, but it could confuse contributors.
What we are building next
These are active priorities, not a wish list. No timeline promises. We ship when the work is solid.
Lighthouse integration. Performance budgets derived from Lighthouse scores, tied to content operations. If a page's performance degrades after an update, Wire should flag it before deploy.
Multi-site dashboard. Wire currently operates on one site at a time (wherever wire.yml lives). Agencies managing 10+ sites need a single view across all of them: batch status, audit summaries, cost tracking.
Schema.org structured data. Wire generates JSON-LD structured data (Article, WebSite, CollectionPage) at build time based on page type and frontmatter dates. Expanding to FAQPage and HowTo schemas from content analysis is the next step.
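For readers unfamiliar with JSON-LD, the build-time output is small. A minimal sketch of an Article block built from frontmatter dates (field selection is illustrative, not Wire's exact emitter):

```python
import json

def article_jsonld(title, created, modified, url):
    """Emit minimal schema.org Article JSON-LD from page metadata."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": title,
        "datePublished": created,
        "dateModified": modified,
        "mainEntityOfPage": url,
    })
```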
Webhook notifications. When a 4-hour newsweek run finishes, you should not have to watch the terminal. Push notifications for completed batch operations are a straightforward improvement we plan to add.
Multi-model support. The claude_text() interface is model-agnostic by design. Testing alternative providers is straightforward since the function signature stays the same. Current priority is low because the default model delivers strong results at decreasing cost, but the abstraction exists for when it matters.