GEO Technical Architecture

How a GEO-optimized website is structured at the infrastructure level. This page documents the technical patterns that make content visible to AI systems.

Clean-Room HTML Rendering

Modern websites are typically built with JavaScript frameworks such as React, Vue, or Angular. These frameworks render content on the client side: the server sends a near-empty HTML shell, and JavaScript builds the page in the browser. Most AI crawlers do not execute JavaScript, so they see the empty shell, not your content.

Clean-room HTML solves this by maintaining a parallel rendering path specifically for bots. When an AI crawler requests a page, it receives fully rendered, semantic HTML with all content present in the initial response. No JavaScript execution required.

  • Pure HTML + CSS — zero JavaScript dependencies for bot-served pages
  • Semantic markup: proper heading hierarchy (h1-h3), lists, paragraphs, sections
  • Content parity: bots see the same substantive content as human visitors
  • Inline styles to avoid external CSS dependency resolution by crawlers
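The pattern above can be sketched as a small renderer. A minimal sketch, assuming a hypothetical `renderBotPage` function and `PageContent` shape (neither is from the source); the point is that the returned string contains the full content, semantic headings, and only inline styles:

```typescript
// Illustrative clean-room renderer: all content present in the initial
// response, semantic markup, inline styles, zero JavaScript.
interface PageContent {
  title: string;
  sections: { heading: string; paragraphs: string[] }[];
}

function renderBotPage(page: PageContent): string {
  const body = page.sections
    .map(
      (s) =>
        `<section><h2>${s.heading}</h2>` +
        s.paragraphs.map((p) => `<p>${p}</p>`).join("") +
        `</section>`,
    )
    .join("\n");
  // Inline styles only: crawlers never need to resolve an external stylesheet.
  return `<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>${page.title}</title></head>
<body style="font-family: sans-serif; max-width: 48rem; margin: auto;">
<h1>${page.title}</h1>
${body}
</body>
</html>`;
}
```

A crawler fetching this response gets the complete document in one round trip, with nothing deferred to client-side execution.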

Edge Function Architecture

Each bot-facing page is served by a dedicated Supabase edge function running on Deno Deploy. Edge functions execute at the CDN layer, close to the requester, which keeps response times low (typically under 100 ms). The request flow:

  1. A Vercel edge proxy receives the incoming request
  2. User-Agent and path analysis determines if the request is from an AI crawler
  3. Bot requests are routed to the appropriate serve-bot-*-html edge function
  4. The edge function returns pre-built, clean-room HTML with JSON-LD structured data
  5. The bot crawl is logged to bot_crawl_logs for analytics
  6. Human requests continue to the React SPA as normal
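The routing decision in steps 2–3 can be sketched as a pure function. This is a simplified illustration, not the actual proxy code: the bot pattern list is abbreviated, and the `/functions/v1/serve-bot-*-html` path shape is an assumption based on the naming convention mentioned above:

```typescript
// Abbreviated bot patterns for illustration; the full identifier list
// appears in the Bot Detection section.
const AI_BOT_PATTERNS = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended/i;

// Hypothetical routing sketch: map a bot request to its dedicated
// serve-bot-*-html edge function, and pass human traffic through to the SPA.
function routeRequest(userAgent: string, path: string): string {
  if (AI_BOT_PATTERNS.test(userAgent)) {
    // e.g. "/" -> "home", "/docs/geo" -> "docs-geo" (illustrative slugging)
    const slug = path === "/" ? "home" : path.replace(/^\//, "").replace(/\//g, "-");
    return `/functions/v1/serve-bot-${slug}-html`;
  }
  return path; // human requests continue to the React SPA unchanged
}
```

Because the decision is made at the edge proxy, neither path pays any cost for the other: the SPA bundle never ships to bots, and the bot HTML never loads for humans.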

This dual-path architecture means human visitors get the rich interactive experience of a modern SPA, while AI crawlers get the clean, parseable HTML they need.

Bot Detection & Routing

Bot detection matches the request's User-Agent string against known AI crawler identifiers. The system recognizes the major AI crawlers:

  • OpenAI: GPTBot, ChatGPT-User, OAI-SearchBot
  • Anthropic: ClaudeBot, Claude-Web, anthropic-ai
  • Google: Google-Extended, Googlebot
  • Perplexity: PerplexityBot, Perplexity-User
  • Others: Bingbot, Applebot, Amazonbot, Meta-ExternalAgent, YouBot, DuckAssistBot, cohere-ai
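Substring matching against this identifier list is straightforward to implement. A minimal sketch (the `detectAIBot` function name is illustrative); matching is case-insensitive because crawler User-Agent strings vary in capitalization:

```typescript
// Identifiers from the list above; matched case-insensitively.
const AI_CRAWLERS = [
  "GPTBot", "ChatGPT-User", "OAI-SearchBot",
  "ClaudeBot", "Claude-Web", "anthropic-ai",
  "Google-Extended", "Googlebot",
  "PerplexityBot", "Perplexity-User",
  "Bingbot", "Applebot", "Amazonbot",
  "Meta-ExternalAgent", "YouBot", "DuckAssistBot", "cohere-ai",
];

// Returns the matched crawler name, or null for non-bot traffic.
function detectAIBot(userAgent: string): string | null {
  const ua = userAgent.toLowerCase();
  return AI_CRAWLERS.find((bot) => ua.includes(bot.toLowerCase())) ?? null;
}
```

Returning the crawler name (rather than a boolean) lets the same check drive both routing and per-bot analytics.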

All detected bot visits are logged with the bot name, page path, user agent, and timestamp. This data feeds into the bot crawl analytics that power GEO Signal 3 (Bot Crawl Activity).
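The logged record can be sketched as a typed row. The column names below are assumptions modeled on the fields just listed (bot name, page path, user agent, timestamp), not the actual `bot_crawl_logs` schema:

```typescript
// Hypothetical shape of a bot_crawl_logs row; column names are illustrative.
interface BotCrawlLog {
  bot_name: string;
  page_path: string;
  user_agent: string;
  crawled_at: string; // ISO 8601 timestamp
}

function buildCrawlLog(botName: string, path: string, userAgent: string): BotCrawlLog {
  return {
    bot_name: botName,
    page_path: path,
    user_agent: userAgent,
    crawled_at: new Date().toISOString(),
  };
}
```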

Structured Data Pipeline

Every bot-served page includes Schema.org JSON-LD markup appropriate to its content type. The structured data pipeline ensures consistency and correctness:

  • Organization: On the homepage and core pages — name, URL, description, contact info
  • WebSite: Site-level metadata for search and AI indexing
  • FAQPage: On FAQ pages with Question/Answer pairs for direct AI extraction
  • Article / TechArticle: On content-heavy pages like whitepapers and architecture docs
  • Service: On services pages with provider, name, and description
  • ContactPage: On contact pages with organization contact details
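As one concrete case, the FAQPage type above maps directly from question/answer pairs to Schema.org JSON-LD. A minimal sketch (the `faqPageJsonLd` helper is illustrative, not from the source), using the standard `FAQPage` / `Question` / `acceptedAnswer` vocabulary:

```typescript
// Emit FAQPage JSON-LD (schema.org vocabulary) for embedding in a
// <script type="application/ld+json"> tag on a bot-served page.
function faqPageJsonLd(faqs: { q: string; a: string }[]): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map(({ q, a }) => ({
      "@type": "Question",
      name: q,
      acceptedAnswer: { "@type": "Answer", text: a },
    })),
  });
}
```

Generating the markup from the same data that renders the visible Q&A pairs keeps the structured data and the page content in lockstep, which is what the pipeline's consistency guarantee depends on.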