GEO Technical Architecture
How a GEO-optimized website is structured at the infrastructure level. This page documents the technical patterns that make content visible to AI systems.
Clean-Room HTML Rendering
Modern websites are typically built with JavaScript frameworks like React, Vue, or Angular. These frameworks render content on the client side — the server sends a near-empty HTML shell, and JavaScript builds the page in the browser. Most AI crawlers do not execute JavaScript, so they see the empty shell, not your content.
Clean-room HTML solves this by maintaining a parallel rendering path specifically for bots. When an AI crawler requests a page, it receives fully rendered, semantic HTML with all content present in the initial response. No JavaScript execution required.
- Pure HTML + CSS — zero JavaScript dependencies for bot-served pages
- Semantic markup: proper heading hierarchy (h1-h3), lists, paragraphs, sections
- Content parity: bots see the same substantive content as human visitors
- Inline styles to avoid external CSS dependency resolution by crawlers
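As a minimal sketch of what a clean-room renderer might look like, the following TypeScript function assembles semantic, inline-styled HTML from structured page content. The `PageContent` shape and `renderBotPage` name are illustrative assumptions, not the actual implementation:

```typescript
// Illustrative only: a clean-room HTML renderer for bot-served pages.
// Zero JavaScript, inline styles, semantic markup.
interface PageContent {
  title: string;
  sections: { heading: string; paragraphs: string[] }[];
}

function renderBotPage(page: PageContent): string {
  const sections = page.sections
    .map(
      (s) =>
        `<section><h2>${s.heading}</h2>` +
        s.paragraphs.map((p) => `<p>${p}</p>`).join("") +
        `</section>`,
    )
    .join("\n");
  // Inline styles only: no external CSS for crawlers to resolve.
  return `<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>${page.title}</title></head>
<body style="font-family: sans-serif; max-width: 42rem; margin: 0 auto;">
<h1>${page.title}</h1>
${sections}
</body>
</html>`;
}
```

Because the full content is present in the initial response, a crawler that never runs JavaScript still sees the same substantive text a human would.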
Edge Function Architecture
Each bot-facing page is served by a dedicated Supabase edge function running on Deno Deploy. Edge functions execute at the CDN layer, typically delivering sub-100ms response times globally. The architecture:
- A Vercel edge proxy receives the incoming request
- User-Agent and path analysis determines if the request is from an AI crawler
- Bot requests are routed to the appropriate `serve-bot-*-html` edge function
- The edge function returns pre-built, clean-room HTML with JSON-LD structured data
- The bot crawl is logged to `bot_crawl_logs` for analytics
- Human requests continue to the React SPA as normal
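The routing decision above can be sketched as a small function at the proxy layer. The pattern list is abbreviated and the rewrite target (`/functions/v1/serve-bot-page`) is a hypothetical name, not the real function URL:

```typescript
// Illustrative dual-path routing: bots get the clean-room HTML
// edge function, humans continue to the React SPA.
const BOT_PATTERNS = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended|Bingbot/i;

function routeRequest(userAgent: string, path: string): string {
  if (BOT_PATTERNS.test(userAgent)) {
    // Rewrite to the pre-rendered bot page (hypothetical URL).
    return `/functions/v1/serve-bot-page?path=${encodeURIComponent(path)}`;
  }
  // Human traffic falls through to the SPA unchanged.
  return path;
}
```

In a real Vercel deployment this decision would live in edge middleware, which can rewrite the request before it reaches the application.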
This dual-path architecture means human visitors get the rich interactive experience of a modern SPA, while AI crawlers get the clean, parseable HTML they need.
Bot Detection & Routing
Bot detection is based on User-Agent string matching against known AI crawler identifiers. The system recognizes all major AI crawlers:
- OpenAI: GPTBot, ChatGPT-User, OAI-SearchBot
- Anthropic: ClaudeBot, Claude-Web, anthropic-ai
- Google: Google-Extended, Googlebot
- Perplexity: PerplexityBot, Perplexity-User
- Others: Bingbot, Applebot, Amazonbot, Meta-ExternalAgent, YouBot, DuckAssistBot, cohere-ai
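A detection function over this list might look like the following sketch, which matches the User-Agent string case-insensitively against the known identifiers (the `detectBot` name and the grouping by vendor are assumptions for illustration):

```typescript
// Illustrative User-Agent matching against known AI crawler identifiers.
const AI_BOTS: Record<string, string[]> = {
  OpenAI: ["GPTBot", "ChatGPT-User", "OAI-SearchBot"],
  Anthropic: ["ClaudeBot", "Claude-Web", "anthropic-ai"],
  Google: ["Google-Extended", "Googlebot"],
  Perplexity: ["PerplexityBot", "Perplexity-User"],
  Other: [
    "Bingbot", "Applebot", "Amazonbot", "Meta-ExternalAgent",
    "YouBot", "DuckAssistBot", "cohere-ai",
  ],
};

// Returns the matched bot name, or null for human traffic.
function detectBot(userAgent: string): string | null {
  const ua = userAgent.toLowerCase();
  for (const names of Object.values(AI_BOTS)) {
    for (const name of names) {
      if (ua.includes(name.toLowerCase())) return name;
    }
  }
  return null;
}
```

Substring matching on the User-Agent is simple and fast, but it trusts the client's self-identification; crawlers that spoof a browser User-Agent will fall through to the human path.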
All detected bot visits are logged with the bot name, page path, user agent, and timestamp. This data feeds into the bot crawl analytics that power GEO Signal 3 (Bot Crawl Activity).
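The logged record could be shaped as below; the column names mirror the fields described above, but the exact `bot_crawl_logs` schema is an assumption:

```typescript
// Illustrative crawl-log row; the real table schema may differ.
interface BotCrawlLog {
  bot_name: string;
  page_path: string;
  user_agent: string;
  crawled_at: string; // ISO 8601 timestamp
}

function buildCrawlLog(botName: string, path: string, userAgent: string): BotCrawlLog {
  return {
    bot_name: botName,
    page_path: path,
    user_agent: userAgent,
    crawled_at: new Date().toISOString(),
  };
}
```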
Structured Data Pipeline
Every bot-served page includes Schema.org JSON-LD markup appropriate to its content type. The structured data pipeline ensures consistency and correctness:
- Organization: On the homepage and core pages — name, URL, description, contact info
- WebSite: Site-level metadata for search and AI indexing
- FAQPage: On FAQ pages with Question/Answer pairs for direct AI extraction
- Article / TechArticle: On content-heavy pages like whitepapers and architecture docs
- Service: On services pages with provider, name, and description
- ContactPage: On contact pages with organization contact details
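For example, the FAQPage markup could be generated from question/answer pairs like this (the `faqPageJsonLd` helper and the sample data are illustrative):

```typescript
// Illustrative FAQPage JSON-LD generation for a bot-served FAQ page.
interface Faq {
  question: string;
  answer: string;
}

function faqPageJsonLd(faqs: Faq[]): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  });
}

// The result is embedded in the page head as:
// <script type="application/ld+json">…</script>
```

Emitting Question/Answer pairs in this shape lets AI systems extract answers directly, without parsing the surrounding page layout.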