High Availability CAPTCHA Solving: Failover and Redundancy

When your revenue depends on automated data collection, a CAPTCHA solving outage means lost data and broken SLAs. High availability (HA) design keeps your pipeline running through partial failures such as worker crashes, network issues, or transient API errors.

HA Components

[Health Checker] ──── monitors ────→ [Worker Pool A]
       ↓                                    ↓ (primary)
[Circuit Breaker]                    [CaptchaAI API]
       ↓                                    ↑ (fallback)
[Failover Router] ── redirects ──→ [Worker Pool B]
       ↓
[Dead Letter Queue] ← unrecoverable failures

Layer 1: Worker Redundancy

Run more workers than you need. When one fails, the remaining workers absorb the load.
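
As a rough sizing sketch, the number of workers "you need" can be estimated from the task arrival rate and the average solve time, with spares added on top. The function name and the load figures below are illustrative assumptions, not measurements.

```python
import math


def workers_needed(tasks_per_minute, avg_solve_seconds, spares=1):
    """Estimate worker count: busy workers plus spare capacity.

    Each worker handles one task at a time, so the number of workers kept
    busy is the arrival rate (tasks/s) times the average solve time (s).
    """
    busy = tasks_per_minute * avg_solve_seconds / 60
    return math.ceil(busy) + spares


# Hypothetical load: 12 tasks per minute, 30 s average solve time
baseline = workers_needed(12, 30, spares=0)    # workers kept busy
with_spare = workers_needed(12, 30, spares=1)  # N+1
print(baseline, with_spare)
```

With one spare, a single worker failure leaves enough capacity to keep up with the assumed load. Add more spares if workers tend to fail together.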

Python — Supervised Worker Pool

import os
import time
import threading
import queue
import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]
task_queue = queue.Queue(maxsize=200)
results = {}


class SupervisedWorkerPool:
    def __init__(self, worker_count, min_workers=2):
        self.worker_count = worker_count
        self.min_workers = min_workers
        self.workers = {}
        self.lock = threading.Lock()

    def start(self):
        """Launch workers and supervisor."""
        for i in range(self.worker_count):
            self._launch_worker(i)

        # Supervisor thread monitors worker health
        supervisor = threading.Thread(target=self._supervise, daemon=True)
        supervisor.start()

    def _launch_worker(self, worker_id):
        t = threading.Thread(
            target=self._worker_loop,
            args=(worker_id,),
            daemon=True
        )
        t.start()
        with self.lock:
            self.workers[worker_id] = {
                "thread": t,
                "alive": True,
                "last_heartbeat": time.time(),
                "tasks_completed": 0
            }

    def _worker_loop(self, worker_id):
        # One HTTP session per worker so connections are reused
        session = requests.Session()
        while True:
            try:
                task = task_queue.get(timeout=30)
            except queue.Empty:
                # Heartbeat even when idle
                with self.lock:
                    self.workers[worker_id]["last_heartbeat"] = time.time()
                continue

            try:
                result = solve_captcha(session, task)
                results[task["task_id"]] = result
                with self.lock:
                    self.workers[worker_id]["tasks_completed"] += 1
            except Exception as e:
                print(f"Worker {worker_id} error: {e}")
            finally:
                # Always mark the task done (otherwise task_queue.join()
                # would block forever) and refresh the heartbeat
                task_queue.task_done()
                with self.lock:
                    self.workers[worker_id]["last_heartbeat"] = time.time()

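    def _supervise(self):
        """Restart dead or stalled workers."""
        # Should exceed the longest expected solve (solve_captcha sleeps
        # up to 60 x 5 s = 300 s while polling); otherwise a busy worker
        # looks stalled
        stall_after = 360
        while True:
            time.sleep(15)
            to_restart = []
            with self.lock:
                now = time.time()
                for wid, info in self.workers.items():
                    if not info["thread"].is_alive():
                        print(f"Worker {wid} died, restarting")
                        to_restart.append(wid)
                    elif now - info["last_heartbeat"] > stall_after:
                        # Python threads cannot be killed: the stalled
                        # thread keeps running next to its replacement
                        print(f"Worker {wid} stalled, replacing")
                        to_restart.append(wid)
            # Relaunch outside the lock: _launch_worker acquires self.lock,
            # and threading.Lock is not reentrant
            for wid in to_restart:
                self._launch_worker(wid)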

    @property
    def status(self):
        with self.lock:
            alive = sum(1 for w in self.workers.values()
                       if w["thread"].is_alive())
            return {
                "alive": alive,
                "total": len(self.workers),
                "healthy": alive >= self.min_workers
            }


def solve_captcha(session, task):
    # Submit the task; the timeout keeps a hung connection from
    # blocking the worker indefinitely
    resp = session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": task.get("method", "userrecaptcha"),
        "googlekey": task["sitekey"],
        "pageurl": task["pageurl"],
        "json": 1
    }, timeout=30)
    data = resp.json()

    if data.get("status") != 1:
        return {"error": data.get("request")}

    # Poll for the result: up to 60 attempts, 5 seconds apart
    captcha_id = data["request"]
    for _ in range(60):
        time.sleep(5)
        result = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": captcha_id, "json": 1
        }, timeout=30).json()
        if result.get("status") == 1:
            return {"solution": result["request"]}
        # Any answer other than "not ready yet" is a terminal error
        if result.get("request") != "CAPCHA_NOT_READY":
            return {"error": result.get("request")}
    return {"error": "TIMEOUT"}


# Start pool with 8 workers, minimum 3 healthy
pool = SupervisedWorkerPool(worker_count=8, min_workers=3)
pool.start()
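
The pool above reads from a bounded queue (maxsize=200), so producers need a policy for what happens when it is full. The following is a minimal sketch of a submit helper that refuses instead of blocking forever. The names `submit_task` and `demo_queue` are hypothetical, and the demo uses its own small queue so that it runs on its own.

```python
import queue


def submit_task(q, task, wait_seconds=0.1):
    """Try to enqueue a task; return False instead of blocking forever.

    A False return tells the caller to retry later, shed the task, or
    route it elsewhere. `submit_task` is a hypothetical helper name.
    """
    try:
        q.put(task, timeout=wait_seconds)
        return True
    except queue.Full:
        return False


# Small queue so the "full" case is easy to see
demo_queue = queue.Queue(maxsize=2)
accepted = [
    submit_task(demo_queue, {"task_id": i, "sitekey": "SITEKEY",
                             "pageurl": "https://example.com"})
    for i in range(3)
]
print(accepted)  # the third submit is rejected because the queue is full
```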

Layer 2: Circuit Breaker

Detect when CaptchaAI is having issues and stop sending requests to avoid wasting balance on timeouts:

JavaScript

class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 60000; // 1 minute
    this.failures = 0;
    this.lastFailure = 0;
    this.state = "closed"; // closed, open, half-open
    this.successesInHalfOpen = 0;
  }

  async execute(fn) {
    if (this.state === "open") {
      if (Date.now() - this.lastFailure > this.resetTimeout) {
        this.state = "half-open";
        this.successesInHalfOpen = 0;
      } else {
        throw new Error("Circuit breaker is OPEN — requests blocked");
      }
    }

    try {
      const result = await fn();
      this._onSuccess();
      return result;
    } catch (err) {
      this._onFailure();
      throw err;
    }
  }

  _onSuccess() {
    if (this.state === "half-open") {
      this.successesInHalfOpen++;
      if (this.successesInHalfOpen >= 3) {
        this.state = "closed";
        this.failures = 0;
        console.log("Circuit breaker CLOSED — service recovered");
      }
    } else {
      this.failures = 0;
    }
  }

  _onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.failureThreshold) {
      this.state = "open";
      console.log(
        `Circuit breaker OPEN — ${this.failures} consecutive failures`
      );
    }
  }
}

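// Usage. solveCaptcha(sitekey, pageurl) is assumed to be defined elsewhere;
// it must throw (reject) on failure, because the breaker only counts thrown errors.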
const breaker = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 60000,
});

async function solveCaptchaWithBreaker(sitekey, pageurl) {
  return breaker.execute(() => solveCaptcha(sitekey, pageurl));
}
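
For Python workers, the same state machine could look roughly like the sketch below. It is a port under assumptions rather than a reference implementation: the `clock` parameter and the `CircuitOpenError` name are additions made here so the closed, open, and half-open transitions can be exercised without waiting for real time to pass.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is blocked because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60.0,
                 half_open_successes=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_successes = half_open_successes
        self.clock = clock  # injectable so tests can fake time
        self.failures = 0
        self.last_failure = 0.0
        self.state = "closed"  # closed, open, half-open
        self.successes_in_half_open = 0

    def execute(self, fn):
        if self.state == "open":
            if self.clock() - self.last_failure > self.reset_timeout:
                # Timeout elapsed: let trial requests through
                self.state = "half-open"
                self.successes_in_half_open = 0
            else:
                raise CircuitOpenError("circuit breaker is open")
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        if self.state == "half-open":
            self.successes_in_half_open += 1
            if self.successes_in_half_open >= self.half_open_successes:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0

    def _on_failure(self):
        self.failures += 1
        self.last_failure = self.clock()
        if self.failures >= self.failure_threshold:
            self.state = "open"


# Demo with a fake clock (a one-element list the lambda reads from)
fake_now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10,
                         clock=lambda: fake_now[0])


def failing():
    raise RuntimeError("simulated API error")


for _ in range(2):
    try:
        breaker.execute(failing)
    except RuntimeError:
        pass
state_after_failures = breaker.state   # two failures reach the threshold

fake_now[0] = 11.0                     # move past the reset timeout
breaker.execute(lambda: "ok")
state_after_probe = breaker.state      # first trial request succeeded
for _ in range(2):
    breaker.execute(lambda: "ok")
state_after_recovery = breaker.state   # three successes close the circuit
print(state_after_failures, state_after_probe, state_after_recovery)
```

As in the JavaScript version, the wrapped function has to raise on failure. A solver that returns an error dictionary instead would need a thin wrapper that converts it into an exception.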

Layer 3: Health Check Endpoint

Expose health status for load balancers and monitoring:

Python (Flask)

from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/health")
def health_check():
    pool_status = pool.status
    queue_depth = task_queue.qsize()

    health = {
        "status": "healthy" if pool_status["healthy"] else "degraded",
        "workers_alive": pool_status["alive"],
        "workers_total": pool_status["total"],
        "queue_depth": queue_depth,
        "queue_capacity": task_queue.maxsize
    }

    code = 200 if health["status"] == "healthy" else 503
    return jsonify(health), code


@app.route("/health/ready")
def readiness_check():
    """Readiness probe — is this instance ready to receive tasks?"""
    if pool.status["alive"] > 0 and task_queue.qsize() < task_queue.maxsize:
        return "ready", 200
    return "not ready", 503
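
A health check based only on thread liveness can pass while every solve fails. One way to cover that gap, sketched here under assumptions, is to track a rolling window of recent solve outcomes and fold the success rate into the health decision. `SolveStats`, `health_payload`, and the 0.8 threshold are hypothetical names and values; the function returns a plain (body, status code) pair that a Flask handler could pass to `jsonify`.

```python
import threading
from collections import deque


class SolveStats:
    """Rolling window of recent solve outcomes (True = solved)."""

    def __init__(self, window=50):
        self._outcomes = deque(maxlen=window)
        self._lock = threading.Lock()  # workers record from several threads

    def record(self, ok):
        with self._lock:
            self._outcomes.append(bool(ok))

    def success_rate(self):
        with self._lock:
            # With no data yet, report 1.0 so a fresh instance
            # is not marked unhealthy
            if not self._outcomes:
                return 1.0
            return sum(self._outcomes) / len(self._outcomes)


def health_payload(stats, workers_alive, min_workers, min_success_rate=0.8):
    """Build the (body, status_code) pair a /health handler could return."""
    rate = stats.success_rate()
    healthy = workers_alive >= min_workers and rate >= min_success_rate
    body = {
        "status": "healthy" if healthy else "degraded",
        "workers_alive": workers_alive,
        "recent_success_rate": round(rate, 2),
    }
    return body, (200 if healthy else 503)


# Demo: plenty of workers alive, but only 6 of the last 10 solves succeeded
stats = SolveStats(window=10)
for ok in [True] * 6 + [False] * 4:
    stats.record(ok)
body, code = health_payload(stats, workers_alive=8, min_workers=3)
print(body, code)
```

Workers would call `stats.record(...)` after each solve attempt, so the endpoint reflects what the pipeline is actually delivering.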

Layer 4: Graceful Degradation

When error rates climb or workers die, shed low-priority work instead of failing completely:

Python

class GracefulDegradation:
    def __init__(self):
        self.mode = "normal"  # normal, degraded, emergency

    def set_mode(self, error_rate, queue_depth, workers_alive):
        if workers_alive == 0 or error_rate > 0.5:
            self.mode = "emergency"
        elif error_rate > 0.2 or queue_depth > 150:
            self.mode = "degraded"
        else:
            self.mode = "normal"

    def should_accept_task(self, priority):
        if self.mode == "normal":
            return True
        if self.mode == "degraded":
            return priority in ("high", "critical")
        return priority == "critical"  # Emergency: critical only

    @property
    def status(self):
        # "accepting" is keyed by task priority, matching should_accept_task
        return {
            "mode": self.mode,
            "accepting": {
                "normal": self.mode == "normal",
                "high": self.mode in ("normal", "degraded"),
                "critical": True
            }
        }

HA Checklist

| Component                | Implemented? | Notes                                |
|--------------------------|--------------|--------------------------------------|
| Multiple workers (N+1)   | [ ]          | At least 1 spare worker              |
| Worker health monitoring | [ ]          | Supervisor thread or process manager |
| Automatic worker restart | [ ]          | On crash or stall                    |
| Circuit breaker          | [ ]          | Stop requests during API issues      |
| Health check endpoint    | [ ]          | For load balancers                   |
| Graceful degradation     | [ ]          | Priority-based task acceptance       |
| Dead-letter queue        | [ ]          | For unrecoverable failures           |
| Fallback polling         | [ ]          | When callbacks fail                  |
| Alerting                 | [ ]          | PagerDuty, Slack, email              |
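
The dead-letter queue in the checklist has no code in this guide, so here is a minimal sketch of one possible shape: retry with exponential backoff, then park the task once attempts run out. The in-memory `dead_letters` queue, the `solve_with_retries` name, and the retry limits are all assumptions. A production setup would more likely use a durable store.

```python
import queue
import time

dead_letters = queue.Queue()  # hypothetical in-memory dead-letter queue


def solve_with_retries(task, solve_fn, max_attempts=3, base_delay=1.0,
                       sleep=time.sleep):
    """Retry solve_fn with exponential backoff; dead-letter on exhaustion.

    solve_fn(task) is expected to raise on failure. `sleep` is injectable
    so tests can skip the real delays.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return solve_fn(task)
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
    # All attempts failed: park the task with context for later inspection
    dead_letters.put({
        "task": task,
        "error": str(last_error),
        "attempts": max_attempts,
    })
    return None


# Demo: a solver that always fails; delays are recorded instead of slept
delays = []


def always_fails(task):
    raise RuntimeError("simulated failure")


outcome = solve_with_retries({"task_id": "t-1"}, always_fails,
                             sleep=delays.append)
entry = dead_letters.get_nowait()
print(outcome, delays, entry["error"])
```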

Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| All workers crash simultaneously | Shared dependency failure (DNS, network) | Add retry with backoff; check infrastructure health |
| Circuit breaker stays open | Reset timeout too long, or issue persists | Reduce reset timeout; investigate root cause |
| Health check passes but tasks fail | Health check is too simple | Check actual solve success in health endpoint |
| Failover flapping | Unstable network causing rapid healthy/unhealthy switches | Add hysteresis (require N consecutive failures before failover) |
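
The hysteresis fix for failover flapping can be sketched as a small state holder that only switches after a run of consecutive observations. The class name and thresholds are hypothetical; the idea is that one bad (or one good) health check never flips the route on its own.

```python
class FailoverSwitch:
    """Switch to backup after N straight failures, back after M successes."""

    def __init__(self, fail_after=3, recover_after=5):
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.active = "primary"
        self._streak = 0  # consecutive checks arguing for a switch

    def observe(self, primary_healthy):
        if self.active == "primary":
            # Count consecutive failures; any success resets the streak
            self._streak = 0 if primary_healthy else self._streak + 1
            if self._streak >= self.fail_after:
                self.active, self._streak = "backup", 0
        else:
            # Count consecutive successes; any failure resets the streak
            self._streak = self._streak + 1 if primary_healthy else 0
            if self._streak >= self.recover_after:
                self.active, self._streak = "primary", 0
        return self.active


# Demo: isolated blips do not cause a switch, sustained runs do
switch = FailoverSwitch(fail_after=3, recover_after=2)
checks = [False, True, False, False, False, True, False, True, True]
history = [switch.observe(ok) for ok in checks]
print(history)
```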

FAQ

What's the minimum HA setup?

Two workers with a supervisor process, a circuit breaker, and basic health monitoring. This handles single-worker failures and API hiccups.

Should I have a secondary CAPTCHA provider as failover?

For critical systems, yes. If CaptchaAI is unreachable, route to a backup provider. CaptchaAI's API is compatible with common formats, making dual-provider setup straightforward.
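
A dual-provider setup could be wired roughly as follows. This is a sketch under assumptions: each provider is represented by a plain function that returns a solution or raises, and the provider names and stand-in functions are placeholders rather than real client libraries.

```python
def solve_with_failover(task, providers):
    """Try each provider in order; return the first successful result.

    `providers` is a list of (name, solve_fn) pairs. Each solve_fn takes a
    task and either returns a solution string or raises on failure.
    """
    errors = {}
    for name, solve_fn in providers:
        try:
            return {"provider": name, "solution": solve_fn(task)}
        except Exception as exc:
            errors[name] = str(exc)  # remember why this provider failed
    raise RuntimeError(f"all providers failed: {errors}")


# Stand-ins for real provider clients
def primary_down(task):
    raise ConnectionError("primary unreachable")


def backup_ok(task):
    return "token-from-backup"


result = solve_with_failover(
    {"sitekey": "SITEKEY", "pageurl": "https://example.com"},
    [("captchaai", primary_down), ("backup", backup_ok)],
)
print(result)
```

Combining this with a circuit breaker per provider avoids paying the primary's timeout on every task while it is down.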

How do I test HA without causing real outages?

Kill individual worker processes during load tests. Simulate network failures with tc netem (Linux) or add artificial delays. Use chaos engineering tools for automated failure injection.
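
For failure injection at the application level, one lightweight option is to wrap the solve function so that a configurable fraction of calls raise. The wrapper below is a hypothetical test helper, not part of any chaos engineering tool. The seeded random generator keeps the failure pattern reproducible between runs.

```python
import random


def with_faults(fn, failure_rate=0.3, rng=None):
    """Wrap fn so a fraction of calls raise, simulating an unreliable API."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return fn(*args, **kwargs)

    return wrapper


def fake_solve(task):
    return "token"


# Seeded RNG makes the failure pattern reproducible between test runs
flaky_solve = with_faults(fake_solve, failure_rate=0.3, rng=random.Random(42))

solved = failed = 0
for _ in range(1000):
    try:
        flaky_solve({"task_id": 1})
        solved += 1
    except ConnectionError:
        failed += 1
print(solved, failed)
```

Pointing the worker pool, circuit breaker, and retry logic at `flaky_solve` during a load test shows whether they react as intended before a real outage does.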

Next Steps

Build resilient CAPTCHA solving — get your CaptchaAI API key and implement HA from the start.
