
Troubleshooting

Fix agent offline errors, SSL provisioning failures, DNS misconfigurations, WAF blocks, and HAProxy issues, with diagnostic steps and solutions for each.


This page covers the most common issues you may encounter with Lumos Gate and how to resolve them. Each section includes diagnostic steps, likely root causes, and solutions.


1. Agent Shows Offline in Dashboard

The most common issue. The agent appears as "Offline" in the dashboard even though the VPS is running.

Check Agent Service Status

systemctl status lumos-agent

If the service is not running, start it:

systemctl start lumos-agent

If it fails to start, check the logs:

journalctl -u lumos-agent -n 50 --no-pager

Verify the Connection Token

The agent authenticates with the WS server using the token provided during agent installation. If the token is incorrect, was regenerated, or the server was decommissioned and recreated, the agent cannot connect.

# Check for auth errors in the logs
journalctl -u lumos-agent | grep -i "auth\|token\|unauthorized\|401"

Solution: If the token is invalid, decommission the server in the dashboard, create a new one, and reinstall the agent with the new token:

curl -fsSL https://get.lumosgate.com/install | LUMOS_TOKEN=NEW_TOKEN LUMOS_FORCE=1 bash

Check Outbound Connectivity

The agent makes an outbound WebSocket (WSS) connection to the Lumos WS server on port 443. Ensure your VPS firewall allows outbound HTTPS connections.

# Test basic connectivity to the WS server
curl -v https://lumosgate.com/health

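# Test the WebSocket upgrade (a working endpoint answers "101 Switching Protocols";
# an HTTP error from the server, such as 400 or 401, still proves it is reachable)
curl -i --http1.1 --max-time 5 \
  -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
  https://lumosgate.com/ws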

If outbound connections are blocked, check your VPS firewall rules:

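# Check outbound iptables rules (the OUTPUT chain)
iptables -L OUTPUT -n -v

# If you manage rules with nftables directly, list the full ruleset
nft list ruleset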

# Check UFW status (if using UFW)
ufw status

# Ensure outbound HTTPS is allowed
ufw allow out 443/tcp
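
The DNS and outbound checks in this section can be combined into one quick test. The sketch below uses only bash and coreutils. It resolves the host through the system resolver and then opens a TCP connection with bash's /dev/tcp pseudo-device. The function names (resolve_host, tcp_reachable, check_outbound) are hypothetical helpers, not part of the agent.

```bash
#!/usr/bin/env bash
# Sketch: check that this VPS can resolve a host and open a TCP connection
# to it. Helper names are illustrative, not part of Lumos Gate.

# Print the addresses a hostname resolves to via the system resolver
# (the same lookup path most programs use, unlike dig).
resolve_host() {
  getent ahosts "$1" | awk '{print $1}' | sort -u
}

# Return 0 if a TCP connection to host:port succeeds within $3 seconds
# (default 5). Uses bash's /dev/tcp, so no extra tools are needed.
tcp_reachable() {
  local host=$1 port=$2 secs=${3:-5}
  timeout "$secs" bash -c ': >"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Combine both checks and print a one-line verdict.
# Returns 2 if the name does not resolve, 1 if the port is unreachable.
check_outbound() {
  local host=$1 port=${2:-443}
  if [ -z "$(resolve_host "$host")" ]; then
    echo "FAIL: cannot resolve $host"
    return 2
  fi
  if tcp_reachable "$host" "$port"; then
    echo "OK: $host:$port reachable"
  else
    echo "FAIL: $host resolves but $host:$port is not reachable"
    return 1
  fi
}
```

For example, on a healthy VPS, `check_outbound lumosgate.com 443` should print an OK line. A resolution failure points to the DNS checks below. A connection failure points to the firewall rules above.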

Check for Multiple Agent Instances

Ensure only one instance of the agent is running:

pgrep -a lumos-agent

If multiple instances are running, stop them all and restart the service:

systemctl stop lumos-agent
pkill -f lumos-agent
systemctl start lumos-agent

Check DNS Resolution from the VPS

The agent needs to resolve the Lumos WS server hostname:

# Verify DNS resolution works
dig lumosgate.com +short
nslookup lumosgate.com

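If DNS resolution fails, check /etc/resolv.conf and ensure a valid nameserver is configured. You can temporarily add a public DNS server. On systems where systemd-resolved manages this file (such as Ubuntu), the change will not persist: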

echo "nameserver 8.8.8.8" >> /etc/resolv.conf

Agent Crashed or OOM Killed

Check if the agent was killed by the kernel's OOM killer:

dmesg | grep -i "oom\|lumos\|killed"
journalctl -u lumos-agent | grep -i "signal\|kill\|exit"

If the agent is being OOM-killed, your VPS may not have enough RAM. See Supported OS -- Memory for requirements.


2. Agent Won't Connect (WebSocket Issues)

The agent is running but cannot establish a WebSocket connection to the Lumos WS server.

Check the WS Server URL

The agent's configuration includes the WS server URL. If this is misconfigured, the agent will fail to connect.

journalctl -u lumos-agent | grep -i "ws\|websocket\|connect\|dial"

Firewall or NAT Issues

Some VPS providers or corporate networks drop long-lived WebSocket connections. The agent sends periodic heartbeats to keep the connection alive.

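# Count currently tracked connections (requires the conntrack tool)
conntrack -C 2>/dev/null

# Compare with the table limit; a full table drops new connections
sysctl net.netfilter.nf_conntrack_max 2>/dev/null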

# Check conntrack timeout for established connections
sysctl net.netfilter.nf_conntrack_tcp_timeout_established 2>/dev/null

If the timeout is very low (under 300 seconds), idle WebSocket connections may be dropped. The agent handles reconnection automatically, but frequent disconnections indicate a network-level issue.
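
A small check for that threshold is sketched below. It reads the same value that sysctl prints, directly from /proc, and applies the 300-second floor mentioned above. The helper names are hypothetical. This only inspects the VPS's own conntrack table; a provider's upstream NAT can still drop idle connections.

```bash
#!/usr/bin/env bash
# Sketch: flag a conntrack "established" timeout low enough to drop idle
# WebSocket connections. The 300 s floor mirrors the guidance above.

MIN_ESTABLISHED_TIMEOUT=300

# Return 0 if the given timeout (seconds) is at or above the floor.
conntrack_timeout_ok() {
  [ "$1" -ge "$MIN_ESTABLISHED_TIMEOUT" ]
}

# Read the live value (only present when nf_conntrack is loaded).
check_conntrack_timeout() {
  local f=/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established t
  if [ ! -r "$f" ]; then
    echo "nf_conntrack not loaded; nothing to check"
    return 0
  fi
  t=$(cat "$f")
  if conntrack_timeout_ok "$t"; then
    echo "OK: established timeout is ${t}s"
  else
    echo "WARN: established timeout is only ${t}s; idle connections may be dropped"
    return 1
  fi
}
```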

Proxy or Content Filter

Some VPS providers route traffic through a transparent proxy that interferes with WebSocket upgrades. Check with your provider if WebSocket connections are supported.

WS Server Down

If all agents across all your servers go offline simultaneously, the issue is likely with the Lumos WS server, not your agents. The agents will automatically reconnect with exponential backoff once the server is available again.
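
To show what "exponential backoff" means in practice, here is a minimal sketch of a reconnect loop. The base delay (1 second) and the cap (60 seconds) are hypothetical; this page does not document the agent's real values. The delay doubles after each failed attempt until it reaches the cap.

```bash
#!/usr/bin/env bash
# Sketch of reconnect-with-exponential-backoff. Base delay and cap are
# hypothetical; the agent's actual values are not documented here.

# Seconds to wait before retry number $1 (0-based): base * 2^attempt,
# capped at $3 so a long outage never produces very long waits.
backoff_delay() {
  local attempt=$1 base=${2:-1} cap=${3:-60} delay i
  delay=$base
  for ((i = 0; i < attempt && delay < cap; i++)); do
    delay=$((delay * 2))
  done
  if ((delay > cap)); then delay=$cap; fi
  echo "$delay"
}

# Run "$@" until it succeeds, sleeping backoff_delay seconds between tries.
# Production clients usually add random jitter as well, so many agents do
# not all reconnect at the same moment after a server restart.
retry_with_backoff() {
  local attempt=0
  until "$@"; do
    sleep "$(backoff_delay "$attempt")"
    attempt=$((attempt + 1))
  done
}
```

With these defaults the waits are 1, 2, 4, 8, 16, 32, then 60 seconds from then on. Recovery is therefore quick after a short blip, and servers are not hammered during a long outage.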


3. SSL Certificate Not Provisioning

SSL certificates are provisioned automatically via Let's Encrypt using the ACME HTTP-01 challenge. If a certificate is stuck in "provisioning" state, check the following.

DNS Must Point to the Shield VPS

The ACME HTTP-01 challenge requires that the domain resolves to the shield VPS IP address. Let's Encrypt will make an HTTP request to http://your-domain.com/.well-known/acme-challenge/... and it must reach the agent.

# Check where the domain resolves
dig +short example.com

# This should return your shield VPS IP, not your origin IP

If DNS is not pointing to the shield yet, update your DNS records first. See DNS Setup.
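
The comparison can be automated with the sketch below. It reads `dig +short` output on stdin, skips CNAME targets (the hostname lines dig also prints), and requires every IPv4 address to equal the shield IP. The function name points_to_shield is hypothetical, and only A records are compared.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a domain's A records point only at the shield.
# Input: `dig +short` output on stdin. $1: the shield VPS IPv4 address.

points_to_shield() {
  local shield=$1 ip found=0
  while read -r ip || [ -n "$ip" ]; do
    # keep only dotted-quad lines; dig also prints CNAME targets
    [[ $ip =~ ^[0-9]+(\.[0-9]+){3}$ ]] || continue
    if [ "$ip" != "$shield" ]; then
      echo "MISMATCH: resolves to $ip, expected $shield"
      return 1
    fi
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "NO A RECORD"
    return 1
  fi
  echo "OK: all A records point to $shield"
}
```

For example: `dig +short A example.com | points_to_shield <SHIELD_VPS_IP>`.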

Port 80 Must Be Open

The HTTP-01 challenge uses port 80. Ensure it is not blocked by a firewall and HAProxy is listening:

# Check if port 80 is listening
ss -tlnp 'sport = :80'

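# Test from outside (run this from a different machine or use an online tool).
# Any HTTP response, even a 404, means port 80 is reachable; a timeout or
# "connection refused" means it is blocked or nothing is listening
curl -v http://your-domain.com/.well-known/acme-challenge/test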

Common reasons port 80 is blocked:

  • VPS provider firewall (check provider dashboard/control panel)
  • iptables or ufw rules blocking inbound port 80
  • Another service (Apache, Nginx) occupying port 80

Check Agent Logs for ACME Errors

journalctl -u lumos-agent | grep -i "acme\|ssl\|certificate\|letsencrypt\|challenge"

Common ACME errors and solutions:

| Error | Cause | Solution |
|---|---|---|
| DNS not resolving | Domain does not point to shield IP | Update the A record to the shield VPS IP |
| Rate limited | Too many certificate requests (Let's Encrypt allows 50 certificates per registered domain per week) | Wait for the 7-day window to pass, then retry |
| Port 80 blocked | Firewall blocking inbound HTTP | Open port 80 in the firewall |
| Invalid domain | Domain is internal/reserved or does not exist | Use a valid public domain |
| Challenge failed | HTTP-01 verification request could not reach the agent | Check firewall, DNS, and HAProxy status |
| Authorization expired | Challenge took too long | Retry the SSL provisioning |

Cloudflare Proxy Interference

If your domain is behind Cloudflare with the orange cloud (proxy) enabled, Cloudflare intercepts the ACME challenge request. Solutions:

  1. Temporarily disable Cloudflare proxy (grey cloud / DNS-only) while provisioning the certificate
  2. Use DNS-only mode permanently -- Lumos Gate itself is the proxy layer, so Cloudflare proxy is redundant
  3. Set Cloudflare SSL mode to Full (Strict) if you must keep the orange cloud

See DNS Setup for Cloudflare-specific guidance.

Manual Certificate Check

If a certificate was provisioned but seems invalid or expired:

# Check the certificate details
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null | openssl x509 -noout -dates -subject

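# Check the issuer (should be a Let's Encrypt intermediate)
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null | openssl x509 -noout -issuer

# Show the chain the server sends (subject "s:" and issuer "i:" per certificate)
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null | sed -n '/^Certificate chain/,/^---/p'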

SSL Certificate Expiry Warning

The system checks for certificates expiring within 7 days during the health check cycle and sends an ssl_expiring notification. The agent automatically renews certificates before they expire. If you receive an expiry warning, check the agent logs for renewal errors.
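
You can run the same 7-day check yourself with the sketch below. It converts the notAfter line printed by `openssl x509 -noout -enddate` into days remaining, using GNU date. The helper names are hypothetical, and the WARN_DAYS value simply mirrors the 7-day window described above.

```bash
#!/usr/bin/env bash
# Sketch: days until a certificate expires, warning inside the 7-day
# window used for ssl_expiring. Requires GNU date and openssl.

WARN_DAYS=7

# $1: a line such as "notAfter=Jun  1 12:00:00 2030 GMT"
# $2: optional "now" in epoch seconds (defaults to the current time)
days_until_expiry() {
  local end now
  end=$(date -d "${1#notAfter=}" +%s) || return 2
  now=${2:-$(date +%s)}
  echo $(( (end - now) / 86400 ))
}

# Fetch the live certificate for $1 and print a verdict.
check_domain_expiry() {
  local domain=$1 line days
  line=$(echo | openssl s_client -connect "$domain:443" -servername "$domain" 2>/dev/null \
    | openssl x509 -noout -enddate) || return 2
  days=$(days_until_expiry "$line") || return 2
  if [ "$days" -lt "$WARN_DAYS" ]; then
    echo "WARN: $domain certificate expires in $days day(s)"
    return 1
  fi
  echo "OK: $domain certificate valid for $days more day(s)"
}
```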


4. Domain Not Working After Adding

After adding a domain in Lumos Gate, it does not resolve or returns errors.

Check DNS Propagation

DNS changes can take time to propagate. Check current DNS records from multiple sources:

# Check current DNS records
dig example.com A +short

# Check from specific DNS servers
dig example.com A @8.8.8.8 +short     # Google DNS
dig example.com A @1.1.1.1 +short     # Cloudflare DNS
dig example.com A @9.9.9.9 +short     # Quad9

# Check TTL (lower TTL = faster propagation)
dig example.com A +noall +answer

If the result does not show your shield VPS IP, DNS has not propagated yet or the records are incorrect.

Tip: Lower the TTL to 300 (5 minutes) before changing DNS records, then wait at least the old TTL before making the change. Resolvers will then pick up the new records within about 5 minutes. Once everything is working, you can increase the TTL.
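
A resolver may keep a cached answer for up to its TTL. The largest TTL seen across the resolvers you query is therefore the worst-case wait. Recursive resolvers report the remaining cache time, so querying them shows how long their cached copy will live. A minimal sketch, with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: worst-case wait before resolvers drop a cached record.
# Reads `dig +noall +answer` lines ("name TTL class type value") on stdin
# and prints the largest TTL rounded up to whole minutes.

max_cache_minutes() {
  awk '$2 ~ /^[0-9]+$/ { if ($2 > max) max = $2 }
       END { print int((max + 59) / 60) }'
}
```

For example: `for r in 8.8.8.8 1.1.1.1 9.9.9.9; do dig example.com A @"$r" +noall +answer; done | max_cache_minutes`.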

Verify DNS Records

Ensure you have the correct A record:

Type    Name              Value              TTL
A       example.com       <SHIELD_VPS_IP>    300
A       www.example.com   <SHIELD_VPS_IP>    300  (if using www)

If you use a CNAME for a subdomain, it must ultimately resolve to the shield VPS IP.

Note: If you are using Cloudflare proxy (orange cloud), the IP returned by dig will be Cloudflare's IP, not your shield IP. For Lumos Gate to work correctly, either disable the Cloudflare proxy (grey cloud / DNS-only) or configure SSL mode to Full (Strict). See DNS Setup.

Check Config Push Reached the Agent

After adding a domain in the dashboard, a config push is sent to the WebSocket server, which forwards it to the agent. Verify the domain was added to HAProxy:

# Check if the domain exists in HAProxy config
grep "example.com" /etc/haproxy/haproxy.cfg

If the domain is not in the config, the config push may have failed. Check the agent logs:

journalctl -u lumos-agent | grep -i "config\|domain\|push\|example.com"

If the agent was offline when the config push was sent, reconnecting will trigger a full config sync. Restart the agent to force a reconnect:

systemctl restart lumos-agent
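
The grep check above can give false positives: a plain `grep "example.com"` also matches "notexample.com", and the dot matches any character. The sketch below matches the domain literally (-F) as a whole word (-w) and prints the matching lines with line numbers. A parent domain still matches inside "shop.example.com", so read the printed lines. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm a domain appears in the HAProxy config as a whole word.
# $1: domain, $2: config path (defaults to the path used on this page).

domain_in_haproxy_cfg() {
  local domain=$1 cfg=${2:-/etc/haproxy/haproxy.cfg}
  if grep -nFw -- "$domain" "$cfg"; then
    return 0
  fi
  echo "not found: $domain in $cfg (config push may not have arrived)" >&2
  return 1
}
```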

Test Direct Connection (Bypass DNS)

Bypass DNS and test directly against the shield VPS to isolate whether the issue is DNS or proxy configuration:

# Test HTTP directly against the shield VPS
curl -H "Host: example.com" http://<SHIELD_VPS_IP>/

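# Test HTTPS directly (--resolve pins the name to the shield IP, so SNI and
# the Host header both carry example.com; add -k if no certificate is issued yet)
curl -v --resolve example.com:443:<SHIELD_VPS_IP> https://example.com/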

If this returns your site content, the proxy is working and the issue is DNS-only. If it returns a 503 or connection error, the proxy configuration or origin is the problem.

Check Origin Connectivity

From the shield VPS, verify the origin server is reachable:

# Test origin from the shield VPS
curl -v http://<ORIGIN_IP>:<ORIGIN_PORT>/

If the origin is unreachable from the shield, check:

  • Origin firewall -- ensure the shield VPS IP is whitelisted
  • WireGuard tunnel -- if using encrypted origin, ensure the tunnel is up
  • Origin server is actually running and listening on the expected port

5. WAF Blocking Legitimate Traffic

If real users or legitimate services are being blocked by the WAF.

Check WAF Events Log

Navigate to Dashboard -> WAF and review the blocked requests log. Each entry shows:

  • Source IP address
  • Request path and method
  • Block reason (rate limit, IP blacklist, OWASP pattern, bot detection)
  • Domain
  • Timestamp

Identify the Block Reason and Fix

| Block Reason | What Triggered It | Solution |
|---|---|---|
| Rate limit exceeded | Too many requests from a single IP in the configured window | Increase the rate limit threshold for the domain |
| IP blacklisted | The client IP is in your IP blacklist | Remove the IP from the blacklist |
| OWASP pattern match | Request matched a SQL injection, XSS, or path traversal pattern | Review the specific request; if it is a false positive, lower the WAF level from "high" to "medium" or "low" |
| Bot challenge failed | Client did not pass the JavaScript challenge | Ensure the client supports JavaScript. API clients and bots will fail JS challenges -- see below |
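
Before raising a rate limit, it helps to know how many requests your busiest legitimate clients actually send per window. The sketch below computes each IP's peak count over fixed windows. It assumes one "EPOCH_SECONDS IP" pair per line, which you would extract from your own access logs; that input format and the helper name are assumptions. Fixed windows can undercount bursts that straddle a boundary, so leave some headroom.

```bash
#!/usr/bin/env bash
# Sketch: busiest fixed window per client IP, to pick a rate-limit
# threshold above what legitimate clients send. Input (assumed):
# "EPOCH_SECONDS IP" per line on stdin. $1: window seconds (default 60).
# Output: "IP max_requests", busiest first.

peak_requests_per_window() {
  local window=${1:-60}
  awk -v w="$window" '
    { bucket = int($1 / w); n[$2 SUBSEP bucket]++ }
    END {
      for (k in n) {
        split(k, parts, SUBSEP)
        if (n[k] > peak[parts[1]]) peak[parts[1]] = n[k]
      }
      for (ip in peak) print ip, peak[ip]
    }' | sort -k2,2nr
}
```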

Adjust WAF Level

The WAF level controls sensitivity. If you are getting false positives:

  1. Navigate to Dashboard -> WAF
  2. Find the affected domain
  3. Change the WAF level:
    • High -- Strictest, more false positives possible
    • Medium -- Balanced (recommended for most sites)
    • Low -- Minimal blocking, only obvious attacks

API Clients and Bot Protection

Bot protection uses a JavaScript challenge that requires a browser environment. API clients, webhooks, monitoring services, and legitimate bots (like payment processors or CI/CD systems) will fail the JS challenge.

Solutions:

  • Add trusted IPs to the whitelist so they bypass all WAF rules
  • Disable bot protection for API-only domains
  • Use a separate domain for API endpoints without bot protection enabled

Whitelist Trusted IPs

Add trusted IPs to the whitelist so they bypass WAF rules:

  1. Navigate to Dashboard -> WAF -> IP Management
  2. Add IP addresses or CIDR ranges that should be whitelisted

Warning: Only whitelist IPs you trust, such as your office network, monitoring services, known API clients, or payment processor webhook IPs.
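
When you whitelist a CIDR range, you can confirm that a specific client address is actually covered. The sketch below does this in pure bash arithmetic. It handles IPv4 only, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: is an IPv4 address inside a CIDR range? (IPv4 only)

# Convert dotted-quad to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if $1 is inside $2 (e.g. 203.0.113.0/24; a bare IP means /32).
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=32 mask
  if [[ $2 == */* ]]; then bits=${2#*/}; fi
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

For example, `ip_in_cidr 203.0.113.7 203.0.113.0/24 && echo covered` prints "covered".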

Disable WAF for a Domain

If you need to quickly stop blocking while you investigate:

  1. Navigate to Dashboard -> WAF
  2. Toggle WAF off for the specific domain

WAF is toggled per-domain, so disabling it for one domain does not affect others. Re-enable it once you have adjusted the rules.


6. HAProxy Not Reloading

The agent generates HAProxy configurations and reloads the process. If reloads fail, your latest domain or WAF changes will not take effect.

Check Agent Logs

journalctl -u lumos-agent | grep -i "reload\|haproxy\|error\|rollback"

Common reload errors:

| Error | Cause | Solution |
|---|---|---|
| Configuration syntax error | Generated config has an issue | Agent auto-rolls back; check logs for the specific syntax error |
| Port already in use | Another process on port 80 or 443 | Find and stop the conflicting process (see section 11) |
| Permission denied | Agent lost root privileges | Check the agent service user configuration |
| File not found | HAProxy binary missing | Reinstall HAProxy: apt install -y haproxy |

Verify HAProxy Status

# Check HAProxy service status
systemctl status haproxy

# Test the current config for syntax errors
haproxy -c -f /etc/haproxy/haproxy.cfg

Automatic Rollback

The agent implements automatic config rollback:

  1. Current config is backed up in memory
  2. New config is written to /etc/haproxy/haproxy.cfg
  3. HAProxy reload is attempted
  4. If reload fails, the backup is restored and HAProxy is reloaded with the old config
  5. A haproxy_reload_failed error is reported to the dashboard via notification

All config writes and reloads are serialized under a single mutex to prevent race conditions. If you see repeated reload failures, check the agent logs for the specific HAProxy error message.
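
The rollback steps above can be sketched in shell. Here flock stands in for the agent's mutex, and a message on stderr stands in for the dashboard notification. This is an illustration, not the agent's code. LOCK_FILE and the RELOAD_CMD default are assumptions, and both can be overridden for testing.

```bash
#!/usr/bin/env bash
# Sketch of write -> reload -> rollback-on-failure, serialized with flock.
# Not the agent's implementation; LOCK_FILE and RELOAD_CMD are assumptions.

HAPROXY_CFG=${HAPROXY_CFG:-/etc/haproxy/haproxy.cfg}
LOCK_FILE=${LOCK_FILE:-/run/lumos-haproxy.lock}   # hypothetical path
RELOAD_CMD=${RELOAD_CMD:-"systemctl reload haproxy"}

# $1: path to a newly generated config. Returns 0 if the reload succeeded,
# 1 if it failed and the previous config was put back.
apply_config() {
  local new_cfg=$1
  (
    flock 9                                  # one writer/reloader at a time
    backup=$(cat "$HAPROXY_CFG")             # 1. keep current config in memory
    cat "$new_cfg" > "$HAPROXY_CFG"          # 2. write the new config
    if $RELOAD_CMD; then exit 0; fi          # 3. attempt the reload
    printf '%s\n' "$backup" > "$HAPROXY_CFG" # 4. restore the backup...
    $RELOAD_CMD || true                      #    ...and reload the old config
    echo "haproxy_reload_failed: previous config restored" >&2  # 5. report
    exit 1
  ) 9>"$LOCK_FILE"
}
```

Taking the lock around the whole sequence is what prevents two concurrent pushes from interleaving a write with someone else's reload.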

Manual Config Validation

# Validate the current config
haproxy -c -f /etc/haproxy/haproxy.cfg

# If invalid, check what was written
head -100 /etc/haproxy/haproxy.cfg

Manual Restart (Last Resort)

As a last resort, you can manually restart HAProxy:

systemctl restart haproxy

Warning: Restarting HAProxy (as opposed to reloading) causes a brief interruption in active connections. HAProxy reload is zero-downtime; restart is not. Only restart if reload is not working.


7. HAProxy Health Check Failures

The agent monitors HAProxy health every 10 seconds. If HAProxy crashes, the agent automatically restarts it and sends a haproxy_crash notification.

Check for Repeated Crashes

journalctl -u lumos-agent | grep -i "crash\|restart\|health\|haproxy.*down"
journalctl -u haproxy -n 50 --no-pager

Common Crash Causes

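| Cause | Solution |
|---|---|
| Out of memory | Upgrade VPS RAM or reduce concurrent connections |
| Too many open files | Check the limit (ulimit -n in a shell, or systemctl show haproxy -p LimitNOFILE for the service); edge-setup should have raised it |
| Corrupted config | Agent will auto-rollback; check logs |
| HAProxy binary updated externally | Avoid upgrading the haproxy package independently (for example with apt upgrade) |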

Check HAProxy Resource Usage

# Check HAProxy memory usage (RSS in KiB)
ps -o pid,rss,args -C haproxy

# Count open file descriptors per HAProxy process (run as root)
for pid in $(pgrep -x haproxy); do echo "$pid: $(ls /proc/$pid/fd | wc -l) open fds"; done

# Check connection count
ss -s

8. High Latency Through Proxy

Traffic through the shield VPS has noticeably higher latency than direct connections.

Check Origin Server Response Time

The shield adds a network hop, but most latency usually comes from the origin:

# Measure time through the shield
curl -o /dev/null -s -w "Total: %{time_total}s\nConnect: %{time_connect}s\nTTFB: %{time_starttransfer}s\n" https://example.com

# Measure time direct to origin (from the shield VPS itself)
curl -o /dev/null -s -w "Total: %{time_total}s\nConnect: %{time_connect}s\nTTFB: %{time_starttransfer}s\n" http://<ORIGIN_IP>:<ORIGIN_PORT>

If the origin TTFB is high, the issue is not with the proxy.
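
The two measurements can be turned into a single number. The sketch below wraps curl's `%{time_starttransfer}` timer and subtracts the origin TTFB from the through-shield TTFB. The result shows how much time is spent outside the origin (network hops plus proxy processing). The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: how much TTFB is added outside the origin.

# Print time-to-first-byte in seconds for a URL.
ttfb() {
  curl -o /dev/null -s -m 15 -w '%{time_starttransfer}\n' "$1"
}

# $1: TTFB through the shield, $2: TTFB direct to origin (seconds).
# Prints the difference rounded to whole milliseconds.
proxy_overhead_ms() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%d\n", (a - b) * 1000 + 0.5 }'
}
```

For example, `proxy_overhead_ms "$(ttfb https://example.com)" "$(ttfb http://<ORIGIN_IP>:<ORIGIN_PORT>)"` run on the shield VPS. If the overhead is small but the total is high, focus on the origin.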

Consider VPS Location

The physical distance between user, shield, and origin affects latency:

Good:  User (EU) -> Shield (EU) -> Origin (EU)       ~5ms added
OK:    User (EU) -> Shield (EU) -> Origin (US)        ~100ms added
Bad:   User (EU) -> Shield (US) -> Origin (EU)        ~200ms added (unnecessary round trip)

Place your shield VPS in the same region as the majority of your users, or as close to the origin as possible. See VPS Providers for providers with multiple regions and Multiple Servers for multi-region setups.

WireGuard Overhead

If you are using WireGuard to encrypt traffic between the shield and origin, expect approximately 3-5% overhead due to encryption and encapsulation. This is generally negligible for most workloads.

Check VPS Resources

Ensure your shield VPS has enough resources:

# CPU usage
top -bn1 | head -10

# Memory usage
free -h

# Network throughput
iftop -t -s 5 2>/dev/null || echo "Install iftop: apt install iftop"

# Check if the VPS is swapping (bad for performance)
swapon --show
vmstat 1 5

If the VPS is resource-constrained, consider upgrading the VPS tier or distributing traffic across multiple servers.

Check Kernel Tuning

If you installed with LUMOS_NO_TUNE=1, kernel tuning was skipped. This can cause performance issues under load:

# Check if BBR is active
sysctl net.ipv4.tcp_congestion_control
# Should output: net.ipv4.tcp_congestion_control = bbr

# Check connection tracking limits
sysctl net.netfilter.nf_conntrack_max 2>/dev/null

You can re-run the edge setup script to apply tuning:

curl -fsSL https://get.lumosgate.com/edge-setup.sh | bash

See Supported OS -- Edge Setup for details.
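
If sysctl does not show bbr, it helps to know whether BBR is merely inactive or not available at all. The sketch below reads the active and available algorithms from /proc and classifies the result. The helper names are hypothetical. "not-available" usually means the tcp_bbr module is not loaded yet.

```bash
#!/usr/bin/env bash
# Sketch: classify BBR state from the active and available congestion
# control algorithms (the same values sysctl prints).

# $1: current algorithm, $2: space-separated available algorithms.
bbr_status() {
  local current=$1 available=$2
  if [ "$current" = "bbr" ]; then
    echo "active"
  elif [[ " $available " == *" bbr "* ]]; then
    echo "available-not-active"
  else
    echo "not-available"
  fi
}

check_bbr() {
  local s
  s=$(bbr_status "$(cat /proc/sys/net/ipv4/tcp_congestion_control)" \
                 "$(cat /proc/sys/net/ipv4/tcp_available_congestion_control)")
  echo "BBR: $s"
  [ "$s" = "active" ]
}
```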


9. Bot Protection Blocking Real Users

The bot protection system uses a JavaScript challenge with HMAC cookie verification. Some legitimate users or clients may fail this challenge.
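
To illustrate the general idea of HMAC cookie verification, here is a minimal sketch. After a client solves the challenge, the server issues a cookie of the form "<expiry>.<signature>". On later requests it recomputes the signature and compares. The cookie layout, binding the signature to the client IP, SHA-256, and SECRET are all assumptions; Lumos Gate's real format is not documented here. The plain string comparison is not constant-time, which a real implementation should use.

```bash
#!/usr/bin/env bash
# Sketch of an HMAC-signed challenge cookie (format and key are
# hypothetical, not Lumos Gate's). Requires openssl.

SECRET=${SECRET:-change-me}   # hypothetical server-side key

# Hex HMAC-SHA256 of $1 under SECRET.
hmac_hex() {
  printf '%s' "$1" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}'
}

# $1: client IP, $2: expiry (epoch seconds). Prints the cookie value.
issue_cookie() {
  echo "$2.$(hmac_hex "$1|$2")"
}

# $1: client IP, $2: cookie value, $3: optional "now" (epoch seconds).
verify_cookie() {
  local ip=$1 cookie=$2 now=${3:-$(date +%s)}
  local expiry=${cookie%%.*} sig=${cookie#*.}
  [[ $expiry =~ ^[0-9]+$ ]] || return 1
  (( expiry > now )) || return 1                  # expired
  [ "$sig" = "$(hmac_hex "$ip|$expiry")" ]        # signature must match
}
```

Because the server only needs the secret to verify, it does not have to store per-client state. This is also why clients that never run the JavaScript, and so never receive the cookie, are blocked.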

Who Gets Blocked

  • Users with JavaScript disabled in their browser
  • Very old browsers that do not support modern JS
  • API clients making direct HTTP requests (no browser environment)
  • Automated monitoring tools (Pingdom, UptimeRobot, etc.)
  • Payment processor webhooks (Stripe, PayPal, etc.)
  • Search engine crawlers (though major crawlers are usually whitelisted by user agent)

Solutions

  1. Whitelist known IPs -- Add the IP addresses of your monitoring services, API clients, and webhook sources to the IP whitelist. Whitelisted IPs bypass all WAF and bot protection checks.

  2. Disable bot protection per domain -- If a domain serves primarily API traffic, disable bot protection for that domain. You can still keep WAF rules active.

  3. Separate API and web domains -- Use api.example.com for API traffic (no bot protection) and example.com for web traffic (with bot protection).

Verifying Bot Protection Is the Issue

Check the WAF events log in the dashboard. If the block reason is "Bot challenge failed", bot protection is the cause. The blocked request entry will show the IP and request path.

You can also test from the command line:

# This will fail bot protection (no JS engine)
curl -v https://example.com/

# Check if you get a 403 or a JS challenge page

10. Account Frozen

Your dashboard shows a frozen account banner and you cannot make configuration changes.

Why It Happens

The account is frozen when the automatic billing deduction fails due to insufficient credit balance. The system attempted to deduct your plan's monthly price and your balance was too low.

Your Sites Are Still Online

Existing proxy configurations continue to work. HAProxy on your shield servers keeps running with the last known good configuration. Your sites remain online and accessible. No stop signal is sent to your agents.

How to Unfreeze

  1. Navigate to Dashboard -> Settings -> Billing (you can still access this while frozen)
  2. Click Deposit
  3. Select an amount and complete the USDT payment
  4. Once the payment confirms on-chain, your balance updates
  5. If the new balance >= your plan's monthly price, the account unfreezes automatically
  6. All mutation operations are re-enabled within seconds

Note: Auto-unfreeze happens as soon as the blockchain transaction confirms your deposit. No manual action is needed beyond sending the payment.
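
The unfreeze rule is a simple comparison, and the minimum deposit follows from it. The sketch below does the decimal arithmetic with awk. Amounts are assumed to be in the same currency. Transfer fees, which may reduce the amount credited, are not modelled. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the unfreeze rule: balance must be at least the monthly price.

# Return 0 if balance $1 >= monthly price $2.
balance_covers_plan() {
  awk -v bal="$1" -v price="$2" 'BEGIN { exit !(bal + 0 >= price + 0) }'
}

# Print the minimum deposit needed to reach the price (0.00 if none).
min_deposit_to_unfreeze() {
  awk -v bal="$1" -v price="$2" \
    'BEGIN { d = price - bal; if (d < 0) d = 0; printf "%.2f\n", d }'
}
```

For example, with a balance of 3.50 and a plan price of 9.99, `min_deposit_to_unfreeze 3.50 9.99` prints 6.49.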

Cannot Deposit While Frozen?

If the deposit button does not appear or the billing tab is not loading, try:

  1. Clear your browser cache and reload the dashboard
  2. Try a different browser
  3. Check browser console for JavaScript errors (F12 -> Console)

The deposit endpoint is accessible even while frozen, so it should work. If you still cannot deposit, contact support.

Emergency Domain Changes While Frozen

You can still change origin IP addresses on existing domains while frozen. This is intentionally allowed for emergency situations (for example, if an origin server goes down and you need to redirect traffic). Navigate to the domain detail page and update the origin servers.

See Credits -- Account Freezing and Account -- Frozen Accounts for complete details.


11. HAProxy Won't Start

HAProxy fails to start, blocking all proxy traffic.

Port Conflict

The most common cause is another service occupying ports 80 or 443:

# Find what is using port 80
ss -tlnp 'sport = :80'

# Find what is using port 443
ss -tlnp 'sport = :443'

Common conflicting services:

| Service | How to Stop |
|---|---|
| Apache2 | systemctl stop apache2 && systemctl disable apache2 |
| Nginx | systemctl stop nginx && systemctl disable nginx |
| Caddy | systemctl stop caddy && systemctl disable caddy |
| Another HAProxy | pkill haproxy, then restart via systemd |
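
To match the ss output against the table, you can extract just the owning process names. `ss -tlnp` shows them in a field such as `users:(("nginx",pid=812,fd=6))`. The sketch below parses that field. Run it as root so ss can see other users' processes. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list which processes hold ports 80/443.

# Reads ss output on stdin; prints "name pid" once per owning process.
port_owners() {
  grep -o '"[^"]*",pid=[0-9]*' \
    | sed -E 's/^"([^"]*)",pid=([0-9]+)$/\1 \2/' \
    | sort -u
}

show_port_owners() {
  ss -tlnp '( sport = :80 or sport = :443 )' | port_owners
}
```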

Config Syntax Error

# Validate config
haproxy -c -f /etc/haproxy/haproxy.cfg

# The error output will show the exact line and issue

If the config is corrupted, the agent's automatic rollback should have restored the previous working config. If it did not, you can check if a backup exists:

# Look for backup configs
ls -la /etc/haproxy/haproxy.cfg*

Missing HAProxy Binary

which haproxy
haproxy -v

If HAProxy is not installed, install it:

apt update && apt install -y haproxy
systemctl enable haproxy
systemctl restart lumos-agent

Permissions Issue

# Check HAProxy config file permissions
ls -la /etc/haproxy/haproxy.cfg

# Should be readable by haproxy user/group
# Fix if needed
chmod 644 /etc/haproxy/haproxy.cfg
chown root:root /etc/haproxy/haproxy.cfg

12. Config Push Not Working

You make changes in the dashboard (add domain, change WAF rules, etc.) but the changes do not reach the agent.

Verify the Config Push Chain

The config push chain is: Dashboard API -> WebSocket Server -> Agent WebSocket -> HAProxy reload

A failure at any point breaks the chain.

Check Agent Connection

First, verify the agent is connected (appears online in dashboard). If offline, see section 1.

Force a Config Sync

Restart the agent to force a full config sync on reconnect:

systemctl restart lumos-agent

The agent requests the full configuration from the WS server upon every reconnect, so a restart effectively forces a fresh config sync.

Check Agent Logs for Config Updates

journalctl -u lumos-agent | grep -i "config\|push\|update\|received"

If you see "config received" but no HAProxy reload, the issue is in HAProxy config generation or reload. See section 6.


13. Cannot Delete a Domain

You try to delete a domain but get an error.

Account Frozen

If your account is frozen, all mutation operations (including deletion) are blocked. Deposit credits to unfreeze first.

API Error

Check the browser console (F12 -> Network tab) for the specific error response from the API. Common errors:

| HTTP Status | Meaning | Solution |
|---|---|---|
| 403 | Account frozen | Deposit credits to unfreeze |
| 404 | Domain not found | Refresh the page; it may already be deleted |
| 500 | Server error | Try again; check server logs if it persists |

Domain Still in Use

If the domain has active traffic or pending SSL provisioning, the deletion should still work. There is no "in use" block. If deletion fails, try again after a few seconds.


14. Agent Installation Fails

The installation script exits with an error.

Check OS Requirements

The agent requires Debian 12+ or Ubuntu 24.04+:

cat /etc/os-release

If you are running a different distribution, it is not currently supported. See Supported OS.
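
The requirement can be checked mechanically. /etc/os-release is shell-compatible KEY=value text, so the sketch below sources it in a subshell and compares VERSION_ID with `sort -V` (GNU coreutils). The file path is a parameter so the check can run against a copy. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check os-release against Debian 12+ / Ubuntu 24.04+.

# Return 0 if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# $1: os-release file (default /etc/os-release).
os_supported() {
  local file=${1:-/etc/os-release} id ver
  id=$(. "$file"; echo "$ID")
  ver=$(. "$file"; echo "$VERSION_ID")
  case "$id" in
    debian) version_ge "$ver" 12 ;;
    ubuntu) version_ge "$ver" 24.04 ;;
    *)      return 1 ;;
  esac
}
```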

Check Root Access

The installer must run as root:

whoami
# Should output: root

If not root, use sudo:

curl -fsSL https://get.lumosgate.com/install | LUMOS_TOKEN=YOUR_TOKEN sudo -E bash

Check curl

The installer requires curl:

curl --version

If not installed:

apt update && apt install -y curl

Existing HAProxy Detected

If HAProxy is already installed, the installer shows the existing configuration statistics (number of frontends, backends, lines) and asks for confirmation. To skip the interactive prompt:

curl -fsSL https://get.lumosgate.com/install | LUMOS_TOKEN=YOUR_TOKEN LUMOS_FORCE=1 bash

The LUMOS_FORCE=1 flag bypasses the confirmation prompt. The existing HAProxy configuration is still backed up before any changes are made. After installation, you can import existing sites via Detected Sites.

Network Errors

If the installer cannot download the agent binary:

# Test connectivity to the download server
curl -v https://get.lumosgate.com/

# Check DNS resolution
dig get.lumosgate.com +short

Disk Full

df -h /

If less than 100 MB is free, clear space before installing.

Package Lock (apt)

If another apt process is running:

# Check for running apt/dpkg processes
pgrep -a 'apt|dpkg'

# Show which process holds the dpkg lock (fuser is in the psmisc package)
fuser -v /var/lib/dpkg/lock-frontend

# Wait for it to finish. If it is truly stuck, stop it by PID; the lock is
# released when the process exits, so there is no need to delete lock files
kill <PID>
dpkg --configure -a

15. Agent Update

How to update the Lumos Gate agent to the latest version.

Automatic Updates

The agent does not auto-update. You must manually update when a new version is available.

Update Procedure

Re-run the installation script with the LUMOS_FORCE=1 flag. This downloads the latest binary and restarts the service while preserving your configuration:

curl -fsSL https://get.lumosgate.com/install | LUMOS_TOKEN=YOUR_TOKEN LUMOS_FORCE=1 bash

Note: The LUMOS_FORCE=1 flag is required when the agent is already installed. It skips the existing HAProxy confirmation prompt. Your encrypted agent configuration and HAProxy config are preserved.

Verify the Update

# Check the agent is running
systemctl status lumos-agent

# Check agent logs for the new version
journalctl -u lumos-agent -n 20 --no-pager

16. Connection Drops / Agent Keeps Reconnecting

The agent disconnects and reconnects frequently.

Check VPS Network Stability

# Test network stability with continuous ping
ping -c 100 lumosgate.com

# Check for packet loss
ping -c 50 -q lumosgate.com

If you see packet loss above 1-2%, the VPS network may be unstable. Contact your VPS provider.
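
The loss figure can be pulled out of ping's summary line and compared with the 2% ceiling above. The sketch below assumes the iputils ping format, for example "50 packets transmitted, 49 received, 2% packet loss, time 49071ms". The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: extract and judge packet loss from ping's summary line.

# $1: summary line. Prints the loss percentage as ping reported it.
loss_percent() {
  sed -nE 's/.* ([0-9.]+)% packet loss.*/\1/p' <<<"$1"
}

# $1: summary line, $2: max acceptable loss (default 2). Returns 0 if OK,
# 1 if too high, 2 if the line could not be parsed.
loss_ok() {
  local loss
  loss=$(loss_percent "$1")
  [ -n "$loss" ] || return 2
  awk -v l="$loss" -v max="${2:-2}" 'BEGIN { exit !(l + 0 <= max + 0) }'
}
```

For example: `loss_ok "$(ping -c 50 -q lumosgate.com | grep 'packet loss')" && echo stable`.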

Check Agent Reconnection Logs

journalctl -u lumos-agent | grep -i "connect\|disconnect\|reconnect\|backoff"

The agent has built-in automatic reconnection with exponential backoff. Occasional disconnections are normal (network blips, WS server restarts during deployments). Frequent disconnections (more than a few per hour) indicate a persistent network issue.

Aggressive NAT/Firewall Timeout

Some networks drop idle TCP connections. The agent sends periodic heartbeats, but if the timeout is very aggressive (under 60 seconds), connections may still drop. This is common on some budget VPS providers.


17. DNS Failover Not Working

DNS failover is configured but does not trigger when the primary server goes down.

Check Plan

DNS failover requires the Pro or Enterprise plan and at least 2 servers.

Check Health Check Status

The WebSocket server triggers health checks every 5 minutes. Check if health checks are running:

  1. Navigate to Dashboard -> Servers and check server status indicators
  2. Check notifications for server_down alerts

Check DNS Provider Configuration

DNS failover requires a configured DNS provider (Cloudflare). Verify in Dashboard -> Domains -> [domain] -> DNS that the DNS provider is connected.

Timing

Failover is not instant. The health check runs every 5 minutes, so in the worst case it takes up to 5 minutes to detect a failure, plus DNS propagation time (typically 1-5 minutes with low TTL).


18. Detected Sites Not Showing

After installing the agent on a server with existing HAProxy configuration, the Detected Sites page shows no sites.

Agent Must Send Backup Config

The agent sends the existing HAProxy configuration to the WS server upon first connection. The dashboard parses this to find existing sites.

  1. Ensure the agent has connected at least once
  2. Check agent logs for backup config upload: journalctl -u lumos-agent | grep -i "backup\|existing\|config"
  3. If the agent was installed with LUMOS_FORCE=1 on a fresh system (no existing HAProxy config), there are no sites to detect
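
To see whether there is anything to detect, count the proxy sections in the existing config, similar to the statistics the installer shows. The sketch below counts frontend, backend, and listen sections plus total lines. If all three section counts are zero, there are no sites for the dashboard to find. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count sections in an HAProxy config.
# $1: config path (default /etc/haproxy/haproxy.cfg).

haproxy_cfg_stats() {
  awk '$1 == "frontend" { f++ }
       $1 == "backend"  { b++ }
       $1 == "listen"   { l++ }
       END { printf "frontends=%d backends=%d listen=%d lines=%d\n", f, b, l, NR }' \
    "${1:-/etc/haproxy/haproxy.cfg}"
}
```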

Already Managed Domains

Sites that you have already added as domains in Lumos Gate are marked as "already managed" and will appear differently in the detected sites list.


Logs and Diagnostics

Agent Logs

The primary diagnostic tool. Most issues are diagnosable from agent logs:

# Recent logs (last 50 lines)
journalctl -u lumos-agent -n 50 --no-pager

# Follow logs in real-time
journalctl -u lumos-agent -f

# Logs from a specific time range
journalctl -u lumos-agent --since "1 hour ago"

# Filter for errors only
journalctl -u lumos-agent -p err --no-pager

HAProxy Logs

# HAProxy service logs
journalctl -u haproxy -n 50 --no-pager

# HAProxy access logs (if configured to syslog)
tail -100 /var/log/haproxy.log 2>/dev/null

HAProxy Config Validation

# Validate current config
haproxy -c -f /etc/haproxy/haproxy.cfg

# Show current config
cat /etc/haproxy/haproxy.cfg

System Diagnostics

# Full system overview
systemctl status lumos-agent
systemctl status haproxy
ss -tlnp '( sport = :80 or sport = :443 )'
free -h
df -h /
uname -r
cat /etc/os-release

Dashboard Notifications

Error events from the agent are reported to the dashboard via the notification system. Check Dashboard -> Notifications for:

  • server_down -- Agent disconnected
  • server_error -- HAProxy crash, config update failed, reload failed
  • ssl_expiring -- Certificate expiring within 7 days

Collecting Diagnostics for Support

When reporting an issue to support (Pro/Enterprise plans), include the output of these commands:

echo "=== Agent Status ==="
systemctl status lumos-agent

echo "=== Agent Logs (last 100 lines) ==="
journalctl -u lumos-agent -n 100 --no-pager

echo "=== HAProxy Status ==="
systemctl status haproxy

echo "=== HAProxy Config Validation ==="
haproxy -c -f /etc/haproxy/haproxy.cfg

echo "=== HAProxy Version ==="
haproxy -v

echo "=== OS Info ==="
cat /etc/os-release

echo "=== Kernel ==="
uname -r

echo "=== Memory ==="
free -h

echo "=== Disk ==="
df -h /

echo "=== Ports ==="
ss -tlnp '( sport = :80 or sport = :443 )'

echo "=== Architecture ==="
uname -m
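
To attach everything in one go, the sketch below runs the same commands and writes them into a single file, with errors captured as well. A failing command does not stop the rest. The helper names and the default file name are arbitrary choices for illustration. Run it as root so ss and journalctl can see everything.

```bash
#!/usr/bin/env bash
# Sketch: collect the diagnostics above into one file for support.

# Print a section header, then run the command with stderr merged in.
diag_section() {
  echo "=== $1 ==="
  shift
  "$@" 2>&1 || true
  echo
}

# $1: optional output path. Prints the path of the file written.
collect_diagnostics() {
  local out=${1:-lumos-diagnostics-$(date +%Y%m%d-%H%M%S).txt}
  {
    diag_section "Agent Status" systemctl status lumos-agent --no-pager
    diag_section "Agent Logs (last 100 lines)" journalctl -u lumos-agent -n 100 --no-pager
    diag_section "HAProxy Status" systemctl status haproxy --no-pager
    diag_section "HAProxy Config Validation" haproxy -c -f /etc/haproxy/haproxy.cfg
    diag_section "HAProxy Version" haproxy -v
    diag_section "OS Info" cat /etc/os-release
    diag_section "Kernel" uname -r
    diag_section "Memory" free -h
    diag_section "Disk" df -h /
    diag_section "Ports" ss -tlnp '( sport = :80 or sport = :443 )'
    diag_section "Architecture" uname -m
  } > "$out" 2>&1
  echo "$out"
}
```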

19. Agent Binary Not Found (404)

The agent installer downloads the binary from get.lumosgate.com. If the download returns a 404 error, the binary is not available for your platform.

Causes

  • CDN not configured -- The get.lumosgate.com CDN endpoint has not been set up yet, or the binary has not been published for the current release.
  • Unsupported architecture -- Agent binaries are built for linux-amd64 and linux-arm64 only. Other architectures (e.g., armv7, i386) are not supported.

Check Your Architecture

uname -m
# Expected: x86_64 (amd64) or aarch64 (arm64)

Workaround: Build from Source

If the CDN binary is not available, you can build the agent from source on any machine with Go installed:

cd agent
GOOS=linux GOARCH=amd64 go build -o lumos-agent ./cmd/lumos-agent/

For ARM servers:

GOOS=linux GOARCH=arm64 go build -o lumos-agent ./cmd/lumos-agent/

Then transfer the binary to your VPS and place it at /usr/local/bin/lumos-agent.
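
To pick the right build, map the VPS's `uname -m` output to a GOARCH value. The sketch below accepts only the two supported architectures and then runs the same `go build` command as above. Run it from the agent source directory. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: map `uname -m` to GOARCH and cross-compile the agent.

goarch_for() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# $1: output of `uname -m` from the target VPS.
build_agent_for() {
  local arch
  arch=$(goarch_for "$1") || return 1
  GOOS=linux GOARCH="$arch" go build -o lumos-agent ./cmd/lumos-agent/
}
```

For example, run `build_agent_for "$(ssh root@<SHIELD_VPS_IP> uname -m)"` and copy the binary to the VPS. If the agent is already running, stop the service first, then install the binary with `install -m 0755 lumos-agent /usr/local/bin/lumos-agent`.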

See Supported OS for the full list of supported architectures and operating systems.


Quick Reference

| Symptom | Most Likely Cause | First Step |
|---|---|---|
| Agent offline | Service stopped or token invalid | systemctl status lumos-agent |
| SSL stuck provisioning | DNS not pointing to shield or port 80 blocked | dig example.com +short |
| Domain not working | DNS not propagated or config push failed | dig example.com @8.8.8.8 +short |
| WAF blocking users | Rate limit too low or false positive | Check WAF events in dashboard |
| HAProxy not reloading | Config syntax error | haproxy -c -f /etc/haproxy/haproxy.cfg |
| High latency | Origin slow or VPS too far from users | Test origin directly from shield |
| Account frozen | Insufficient credit balance | Deposit via Settings -> Billing |
| Bot protection blocking | API client or old browser | Whitelist the IP address |
| Installation fails | Wrong OS or not root | cat /etc/os-release && whoami |
| Config changes not applying | Agent offline or server issue | Restart agent to force sync |
| Port 80/443 in use | Apache/Nginx still running | ss -tlnp '( sport = :80 or sport = :443 )' |
