
Bot Protection

Stop automated traffic with JS challenges, HMAC cookie verification, and bad bot UA detection. Self-hosted bot mitigation running on your VPS.

Overview

Lumos Gate's bot protection module defends your domains against automated traffic -- scrapers, credential stuffers, DDoS bots, and other non-browser clients. It works by serving a lightweight JavaScript challenge to first-time visitors. Real browsers execute the JavaScript and pass through transparently, while bots that cannot run JavaScript are blocked.

Bot protection is implemented as part of the WAF Lua module within HAProxy. It runs at the proxy level on your shield VPS, blocking bots before they ever reach your origin server. Unlike third-party solutions, all processing happens locally -- no data is sent to external services.

How It Works

The bot protection system uses a two-layer approach:

Layer 1: JavaScript Challenge

When a visitor first accesses a protected domain, the following happens:

  1. HAProxy intercepts the request and checks for a valid __lumos_verified HMAC cookie.
  2. If no valid cookie exists, a lightweight HTML page (under 2KB) with a JavaScript challenge is served with a 200 status code.
  3. The visitor's browser executes the JavaScript, which computes an HMAC hash using a server-side secret and the current timestamp.
  4. The JavaScript sets the __lumos_verified cookie with the format timestamp:hash.
  5. After a 1.5-second delay, the browser automatically reloads the page.
  6. HAProxy validates the HMAC cookie and allows the request through to the origin.
  7. All subsequent requests from that visitor pass through immediately (cookie is already set).

First Visit (no cookie):
    Browser -> HAProxy -> JS Challenge Page (200) -> Browser executes JS
    Browser sets __lumos_verified cookie
    Browser reloads after 1.5s
    Browser -> HAProxy (validates HMAC cookie) -> Origin Server

Subsequent Visits (valid cookie):
    Browser -> HAProxy (validates cookie) -> Origin Server (no delay)
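
The two flows above can be condensed into a minimal sketch. This is illustrative Python only; the real logic lives in the generated Lua script, and `verify` stands in for the server-side HMAC check described later on this page:

```python
def handle_request(cookie, verify):
    """HAProxy's view of the flow: a request with a valid __lumos_verified
    cookie is proxied to the origin; anything else gets the challenge
    page with a 200 status."""
    if cookie is not None and verify(cookie):
        return "origin"
    return "challenge_page"

# First visit: no cookie yet, so the challenge page is served; the browser
# runs the JS, sets the cookie, and reloads after ~1.5 s. The second
# request carries a valid cookie and reaches the origin.
```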

For real users, this process is nearly invisible -- the challenge page shows a "Verifying you are human..." message with a spinner animation, resolves in approximately 1.5 seconds, and then the user sees the actual page.

Important: The JS challenge is automatically skipped for paths starting with /api/, so API endpoints behind the same domain are not affected by bot protection. If you need to protect API endpoints, use rate limiting or the IP blacklist instead.

Layer 2: Bad Bot User-Agent Blocking

Known malicious bot user agents are blocked outright at the HAProxy level with a 403 Forbidden response, without even serving the JS challenge. This catches common attack tools before they consume any resources. The blocked patterns include:

Category                 Blocked Patterns
Generic bot signatures   Bot, Crawler, Spider, Scraper
Attack tools             HTTrack, Mechanize, PhantomJS
HTTP libraries           wget, curl, python-requests, Go-http-client, Java/, libwww
Headless browsers        HeadlessChrome

Note: This is a supplementary layer. Sophisticated bots can fake user agents, which is why the JS challenge is the primary defense mechanism. User-agent blocking catches the low-effort automated traffic that does not bother to disguise itself.

The HMAC Cookie

The __lumos_verified cookie is the core of the bot protection system:

  • Algorithm -- Uses a djb2 hash variant mixed with a server-side secret derived from the agent token via SHA-256. The secret is embedded in the Lua script at generation time.
  • Format -- The cookie value is timestamp:hexhash (e.g., 1708444800:a1b2c3d4).
  • Tamper-proof -- Modifying any part of the cookie (timestamp or hash) invalidates it because the hash is recomputed server-side.
  • Time-limited -- The cookie is valid for 1 hour from the timestamp. Cookies older than 1 hour or with a future timestamp (more than 60 seconds ahead) are rejected.
  • Session-scoped on the browser -- The JavaScript sets the cookie with a 1-hour expiry and SameSite=Lax.
  • Stable across restarts -- The bot challenge secret is derived from the agent token (SHA-256), so the secret remains the same across agent restarts. Visitors do not need to re-verify after an agent restart.

Since the HMAC is computed using a server-side secret that is only present in the Lua script on the shield VPS, bots cannot forge valid cookies without executing the JavaScript challenge in a real browser environment.
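
The scheme can be sketched as follows. This is a hedged approximation, not the actual Lua implementation: the exact djb2 mixing and hash width are assumptions, and only the cookie format, the SHA-256 secret derivation, and the 1-hour / 60-second validity window come from the description above:

```python
import hashlib


def derive_secret(agent_token):
    """Derive a stable per-server secret from the agent token via SHA-256,
    mirroring why the secret survives agent restarts."""
    return hashlib.sha256(agent_token.encode()).hexdigest()


def djb2(text):
    """A djb2 hash variant (h = h * 33 + c, truncated to 32 bits); stands
    in for the hash used in the generated Lua script."""
    h = 5381
    for ch in text:
        h = ((h * 33) + ord(ch)) & 0xFFFFFFFF
    return h


def make_cookie(secret, now):
    """Produce a __lumos_verified value in timestamp:hexhash format."""
    return f"{now}:{djb2(f'{now}:{secret}'):08x}"


def validate_cookie(secret, cookie, now):
    """Recompute the hash server-side and enforce the validity window:
    reject cookies older than 1 hour or more than 60 s in the future."""
    try:
        ts_str, hexhash = cookie.split(":", 1)
        ts = int(ts_str)
    except ValueError:
        return False
    if now - ts > 3600 or ts - now > 60:
        return False
    return f"{djb2(f'{ts}:{secret}'):08x}" == hexhash
```

Tampering with either field fails validation, because the hash is recomputed from the timestamp and the server-side secret on every request.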

Warning: If a domain is assigned to multiple shield servers, each server has its own agent token and therefore its own HMAC secret. A cookie set by one shield server is not valid on another. If you use DNS-based load balancing across multiple shields, visitors may see the challenge page again when they are routed to a different shield server.

Whitelisted Bots

Legitimate search engine crawlers are automatically whitelisted and bypass the JS challenge. This ensures your SEO is not affected by bot protection. Whitelisted bot patterns include:

Bot                 Service
Googlebot           Google Search
Bingbot / bingbot   Microsoft Bing
Slurp               Yahoo Search
DuckDuckBot         DuckDuckGo
Baiduspider         Baidu
YandexBot           Yandex
facebot             Facebook
Twitterbot          Twitter/X

The whitelist check runs before the bad bot check, so a user-agent containing both a good bot pattern and a generic "Bot" pattern will be allowed through. For example, Googlebot/2.1 contains "Bot" (which would match the bad bot pattern) but also matches Googlebot (a good bot pattern), so it is whitelisted.
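
This precedence can be checked with a small sketch. Pattern lists are abridged, and plain substring matching is an assumption for illustration:

```python
GOOD_BOTS = ["Googlebot", "Bingbot", "Slurp", "DuckDuckBot",
             "Baiduspider", "YandexBot", "facebot", "Twitterbot"]
BAD_BOTS = ["Bot", "Crawler", "Spider", "Scraper"]  # abridged


def classify_ua(user_agent):
    """The good-bot whitelist is checked first, then the bad-bot list;
    everything else falls through to the JS challenge."""
    if any(p in user_agent for p in GOOD_BOTS):
        return "whitelisted"
    if any(p in user_agent for p in BAD_BOTS):
        return "blocked"
    return "challenge"
```

Because the whitelist check comes first, Googlebot/2.1 is classified as whitelisted even though the generic "Bot" pattern would otherwise match it.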

Note: Whitelisting is based on user-agent string matching. Sophisticated attackers could spoof these user agents. For critical applications, you can verify that requests claiming to be from Googlebot actually originate from Google's IP ranges by checking the WAF events log and cross-referencing IPs.

Enabling Bot Protection

  1. Navigate to Dashboard -> WAF.
  2. Select the domain you want to protect.
  3. Make sure the WAF is enabled for that domain.
  4. Toggle Bot Protection to on.

Behind the scenes, this creates a WAF rule with rule_type: "block_bot" and value: {"enabled": true} for that domain. The config is pushed to the agent, which regenerates the Lua WAF script with bot protection logic included and reloads HAProxy.
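
The rule created by the toggle can be pictured as the following record (shown as a Python dict). Only rule_type and value are documented above; the domain field name is an assumption for illustration:

```python
# WAF rule created when Bot Protection is toggled on for a domain.
# "domain" is a hypothetical field name; rule_type and value are as
# documented above.
bot_protection_rule = {
    "rule_type": "block_bot",
    "value": {"enabled": True},
    "domain": "example.com",  # hypothetical
}
```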

Bot protection is configured per domain. You can enable it on your public website while leaving it disabled on API endpoints or internal services that receive legitimate automated traffic.

Configuration Options

Option               Description                                                             Default
Enabled              Toggle bot protection on/off for the domain                             Off
Challenge validity   How long the HMAC cookie stays valid before requiring a new challenge   1 hour (hardcoded in Lua template)

Note: The challenge validity period is currently 1 hour and is set in the Lua WAF template. This is not configurable per-domain from the dashboard. The browser-side cookie expiry also matches at 1 hour.

When to Use Bot Protection

Recommended for:

  • Public websites and landing pages
  • Login and registration pages (combine with rate limiting for defense in depth)
  • Contact forms and submission endpoints
  • E-commerce product pages (prevents price scraping)
  • Any page targeted by scraping or automated attacks

Consider disabling for:

  • API endpoints that receive legitimate automated requests (webhooks, integrations) -- these are auto-skipped for /api/ paths, but disable bot protection entirely if your API uses other path prefixes
  • Health check endpoints used by monitoring services (add their IPs to the whitelist instead)
  • Static asset domains (CDN, images, CSS, JS files)
  • Internal tools and admin panels (use the IP whitelist instead)
  • Domains serving RSS feeds or other machine-readable content

Performance Impact

The JS challenge adds minimal overhead:

  • First visit only -- Once the HMAC cookie is set, all subsequent requests for 1 hour pass through with only a cookie validation check (microseconds).
  • Challenge page size -- The HTML + JavaScript + CSS payload is under 2KB, loading in under 100ms on any connection. It includes an inline spinner animation and dark-themed styling.
  • 1.5-second delay -- The challenge page waits 1.5 seconds before reloading. This is intentional to give the JavaScript time to execute and the cookie to be set, and to deter rapid automated retries.
  • HAProxy overhead -- Cookie validation happens in the Lua module and takes microseconds per request. No external calls, no disk I/O.
  • No external calls -- Everything runs locally on the shield VPS. No third-party services, no API calls, no DNS lookups. Your visitors' data stays on your infrastructure.

Comparison with Other Solutions

Feature                    Lumos Bot Protection          Cloudflare Under Attack          CAPTCHA
User friction              Very low (~1.5s once)         Low (~5s wait)                   High (manual solve)
Blocks headless browsers   Partial (UA-based)            Yes (browser fingerprint)        Yes
SEO impact                 None (crawlers whitelisted)   Possible                         Possible
Self-hosted                Yes (runs on your VPS)        No (Cloudflare infrastructure)   No (third-party service)
Privacy                    No data sent externally       Data processed by Cloudflare     Data sent to CAPTCHA provider
API path bypass            Automatic (/api/* skipped)    Manual configuration             Manual configuration
Cookie duration            1 hour                        Session-based                    Varies

WAF Processing Order

Bot protection sits between the IP blacklist and rate limiting in the WAF processing pipeline:

Request -> IP Whitelist check (bypass all if matched)
        -> IP Blacklist check (block with 403)
        -> Bot Protection:
           1. Good bot check (skip challenge if matched)
           2. Bad UA check (block with 403)
           3. Skip /api/* paths (no challenge)
           4. Verify __lumos_verified cookie
           5. Serve JS challenge if cookie missing/invalid
        -> Rate Limit check
        -> OWASP rules
        -> Pass to origin
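
The ordering above can be sketched as one decision function. This is an illustrative Python approximation with abridged pattern lists; the real logic lives in the generated Lua script, and each "pass" here simply means the request continues down the pipeline:

```python
GOOD_BOTS = ["Googlebot", "Bingbot", "DuckDuckBot"]        # abridged
BAD_BOTS = ["Bot", "Crawler", "Spider", "curl", "wget"]    # abridged


def waf_decision(ip, path, user_agent, cookie_valid, whitelist, blacklist):
    """Decision order from the pipeline above."""
    if ip in whitelist:
        return "pass"        # whitelisted IPs bypass all checks
    if ip in blacklist:
        return "403"         # blocked before any bot check runs
    if any(p in user_agent for p in GOOD_BOTS):
        return "pass"        # good-bot check runs before the bad-UA check
    if any(p in user_agent for p in BAD_BOTS):
        return "403"         # known bad UA, blocked outright
    if path.startswith("/api/"):
        return "pass"        # API paths skip the challenge
    if not cookie_valid:
        return "challenge"   # serve the JS challenge page
    return "pass"            # continue to rate limit / OWASP / origin
```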

This means:

  • Whitelisted IPs never see the bot challenge.
  • Blacklisted IPs are blocked before the bot check runs.
  • Bots that fail the challenge do not consume rate limit counters for their IP.

Troubleshooting

Legitimate users seeing the challenge page repeatedly:

  • Check that cookies are not being blocked by the user's browser or a browser extension (privacy-focused extensions sometimes block cookies with unfamiliar names).
  • Verify the user's clock is reasonably accurate -- the HMAC cookie includes a timestamp, and cookies with timestamps more than 1 hour in the past or 60 seconds in the future are rejected.
  • If using multiple shield servers, each server has its own HMAC secret. Cookies from one server are not valid on another. Consider using sticky sessions in your DNS or load balancer configuration.
  • Check for intermediate proxies or CDNs that might be stripping the __lumos_verified cookie.

Search engine crawlers being blocked:

  • Verify that bot protection is using the latest whitelist by checking the agent is connected and receiving config updates (Dashboard -> Servers).
  • Check the WAF events log in Dashboard -> WAF -> Events to see if crawlers are being blocked and what reason code is shown.
  • If the reason is bot_blocked (not bot_challenge), the crawler's user-agent may be matching a bad bot pattern. This should not happen with the default whitelist, so check for custom rules that might interfere.

API clients being blocked:

  • Paths starting with /api/ automatically bypass the JS challenge. If your API uses a different path prefix (e.g., /v1/, /graphql), disable bot protection for that domain entirely.
  • Alternatively, add the API client's IP to the domain's whitelist to bypass all WAF checks.
  • Use rate limiting instead of bot protection for API endpoints.

Challenge page not rendering correctly:

  • The challenge page uses minimal HTML/CSS/JS with no external dependencies. If it is not rendering, check for Content-Security-Policy headers from an upstream proxy that might block inline scripts.
  • Verify that the shield VPS agent is running and HAProxy is healthy. See Troubleshooting for diagnostic steps.

Next Steps

  • WAF Overview -- Understand the full WAF architecture and all protection modules
  • IP Blacklist -- Block specific IPs and set up trusted IP whitelists
  • Rate Limiting -- Configure per-domain request rate limits
  • Origin Firewall -- Lock down your origin to only accept traffic from shield VPS
  • Domains -- Manage domain configurations and origin assignments
  • Multiple Servers -- Understand HMAC cookie implications with multi-server setups
  • Troubleshooting -- Common issues and diagnostic steps