When your scraping infrastructure sends thousands of CAPTCHA solve requests, a single worker process becomes a bottleneck. A load balancer distributes requests across multiple workers, improving throughput, enabling failover, and letting you scale horizontally.
Architecture Overview
[Scraper 1] ──┐                     ┌── [Worker 1] ──→ CaptchaAI API
[Scraper 2] ──┼── [Load Balancer] ──┼── [Worker 2] ──→ CaptchaAI API
[Scraper 3] ──┘                     └── [Worker 3] ──→ CaptchaAI API
NGINX Configuration
Round-Robin (Default)
upstream captcha_workers {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

server {
    listen 80;
    server_name captcha.internal;

    location /solve {
        proxy_pass http://captcha_workers;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_connect_timeout 10s;
        proxy_read_timeout 300s;  # CAPTCHA solving can take minutes
    }

    location /health {
        proxy_pass http://captcha_workers;
        proxy_connect_timeout 5s;
        proxy_read_timeout 5s;
    }
}
Least Connections (Better for CAPTCHA Solving)
upstream captcha_workers {
    least_conn;  # Route to the worker with the fewest active connections

    # Passive health checks: after 3 failed requests, skip the
    # worker for 30 seconds before trying it again
    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 weight=2 max_fails=3 fail_timeout=30s;  # Higher-capacity worker
}
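NGINX's least_conn also takes server weights into account: a weight=2 server is allowed roughly twice the concurrent connections before it stops being preferred. A minimal sketch of that selection rule (the worker dicts and field names here are illustrative, not part of any NGINX API):

```python
def pick_worker(workers):
    """Weighted least-connections: choose the worker with the
    lowest active-connections-to-weight ratio."""
    return min(workers, key=lambda w: w["active"] / w["weight"])

workers = [
    {"url": "10.0.1.10:8080", "active": 4, "weight": 1},  # ratio 4.0
    {"url": "10.0.1.11:8080", "active": 5, "weight": 1},  # ratio 5.0
    {"url": "10.0.1.12:8080", "active": 6, "weight": 2},  # ratio 3.0, the lowest
]
```

Even though the weight=2 worker has the most active connections in absolute terms, its per-weight load is lowest, so it receives the next request.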
With Backup Workers
upstream captcha_workers {
    least_conn;
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080 backup;  # Used only when the primary servers are down
}
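The backup flag changes the candidate set, not the balancing algorithm: backups receive traffic only when every primary is unavailable. A toy sketch of that semantics (the `is_up` predicate is a stand-in for NGINX's internal failure tracking):

```python
def eligible_servers(primaries, backups, is_up):
    """NGINX-style backup semantics: balance across live primaries;
    fall back to live backups only when no primary is up."""
    live = [s for s in primaries if is_up(s)]
    if live:
        return live
    return [s for s in backups if is_up(s)]
```
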
Worker API Server
Python (Flask)
import os
import time
import threading

import requests
from flask import Flask, request, jsonify

API_KEY = os.environ["CAPTCHAAI_API_KEY"]

app = Flask(__name__)

# Track active tasks for load reporting
active_tasks = 0
tasks_lock = threading.Lock()
max_concurrent = int(os.environ.get("MAX_CONCURRENT", "20"))

@app.route("/solve", methods=["POST"])
def solve():
    global active_tasks
    with tasks_lock:
        if active_tasks >= max_concurrent:
            return jsonify({"error": "WORKER_AT_CAPACITY"}), 503
        active_tasks += 1
    try:
        data = request.get_json(silent=True) or {}
        result = solve_captcha(data)
        return jsonify(result)
    except requests.RequestException as err:
        return jsonify({"error": str(err)}), 500
    finally:
        with tasks_lock:
            active_tasks -= 1

@app.route("/health")
def health():
    with tasks_lock:
        load = active_tasks / max_concurrent
        return jsonify({
            "status": "healthy" if load < 0.9 else "overloaded",
            "active_tasks": active_tasks,
            "max_concurrent": max_concurrent,
            "load_pct": round(load * 100, 1)
        }), 200 if load < 0.9 else 503

def solve_captcha(data):
    session = requests.Session()
    payload = {
        "key": API_KEY,
        "method": data.get("method", "userrecaptcha"),
        "googlekey": data.get("sitekey"),
        "pageurl": data.get("pageurl"),
        "json": 1
    }
    if data.get("proxy"):
        payload["proxy"] = data["proxy"]
        payload["proxytype"] = data.get("proxytype", "HTTP")
    resp = session.post("https://ocr.captchaai.com/in.php", data=payload, timeout=30)
    result = resp.json()
    if result.get("status") != 1:
        return {"error": result.get("request")}
    captcha_id = result["request"]
    for _ in range(60):  # 60 polls x 5s = 300s, matching proxy_read_timeout
        time.sleep(5)
        poll = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": captcha_id, "json": 1
        }, timeout=30).json()
        if poll.get("status") == 1:
            return {"solution": poll["request"], "captcha_id": captcha_id}
        if poll.get("request") != "CAPCHA_NOT_READY":
            return {"error": poll.get("request")}
    return {"error": "TIMEOUT"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080, threaded=True)
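The manual counter-and-lock admission check above can also be expressed with a `threading.BoundedSemaphore`, which makes the reject-when-full behavior harder to get wrong. A minimal sketch under the same assumptions, where the `solver` callable stands in for `solve_captcha`:

```python
import threading

MAX_CONCURRENT = 20
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def try_solve(task, solver):
    # Non-blocking acquire: shed load instead of queueing, so the
    # load balancer can immediately retry on another worker.
    if not slots.acquire(blocking=False):
        return {"error": "WORKER_AT_CAPACITY"}, 503
    try:
        return solver(task), 200
    finally:
        slots.release()
```

Using BoundedSemaphore (rather than Semaphore) also catches mismatched releases by raising ValueError if the slot count ever exceeds the initial limit.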
JavaScript (Express)
const express = require("express");
const axios = require("axios");

const API_KEY = process.env.CAPTCHAAI_API_KEY;
const MAX_CONCURRENT = parseInt(process.env.MAX_CONCURRENT || "20", 10);
const PORT = parseInt(process.env.PORT || "8080", 10);

let activeTasks = 0;

const app = express();
app.use(express.json());

app.post("/solve", async (req, res) => {
  if (activeTasks >= MAX_CONCURRENT) {
    return res.status(503).json({ error: "WORKER_AT_CAPACITY" });
  }
  activeTasks++;
  try {
    const result = await solveCaptcha(req.body);
    res.json(result);
  } catch (err) {
    res.status(500).json({ error: err.message });
  } finally {
    activeTasks--;
  }
});

app.get("/health", (req, res) => {
  const load = activeTasks / MAX_CONCURRENT;
  const status = load < 0.9 ? "healthy" : "overloaded";
  res
    .status(load < 0.9 ? 200 : 503)
    .json({ status, activeTasks, maxConcurrent: MAX_CONCURRENT, loadPct: Math.round(load * 100) });
});

async function solveCaptcha(data) {
  const submitResp = await axios.post("https://ocr.captchaai.com/in.php", null, {
    params: {
      key: API_KEY,
      method: data.method || "userrecaptcha",
      googlekey: data.sitekey,
      pageurl: data.pageurl,
      json: 1,
    },
    timeout: 30000,
  });
  if (submitResp.data.status !== 1) {
    return { error: submitResp.data.request };
  }
  const captchaId = submitResp.data.request;
  // 60 polls x 5s = 300s, matching the NGINX proxy_read_timeout
  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const pollResp = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: captchaId, json: 1 },
      timeout: 30000,
    });
    if (pollResp.data.status === 1) {
      return { solution: pollResp.data.request, captchaId };
    }
    if (pollResp.data.request !== "CAPCHA_NOT_READY") {
      return { error: pollResp.data.request };
    }
  }
  return { error: "TIMEOUT" };
}

app.listen(PORT, () => console.log(`Worker listening on port ${PORT}`));
Routing Strategies Compared
| Strategy | How It Works | Best For |
|---|---|---|
| Round-Robin | Sequential rotation | Equal-capacity workers |
| Least Connections | Route to least loaded | CAPTCHA solving (variable task duration) |
| Weighted | Proportional to weight | Mixed-capacity workers |
| IP Hash | Same client → same worker | Session affinity needed |
| Random | Random selection | Simple, evenly distributed load |
Recommendation: Use least connections for CAPTCHA solving. Task durations vary (5s–120s), so round-robin creates uneven load.
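The recommendation is easy to check with a toy simulation: with task durations drawn uniformly from 5 to 120 seconds, greedy least-loaded assignment keeps per-worker totals within one task duration of each other, while round-robin's totals typically drift further apart. A self-contained sketch (this model compares total assigned work and ignores arrival times, so "least loaded" here approximates least connections):

```python
import random

def simulate(strategy, n_workers=3, n_tasks=300, seed=7):
    """Assign tasks with random durations; return the total seconds
    of work assigned to each worker."""
    rng = random.Random(seed)
    load = [0.0] * n_workers
    for i in range(n_tasks):
        duration = rng.uniform(5, 120)  # CAPTCHA solves take 5-120s
        if strategy == "round_robin":
            w = i % n_workers
        else:  # "least_loaded": greedy pick of the least-busy worker
            w = min(range(n_workers), key=load.__getitem__)
        load[w] += duration
    return load

rr = simulate("round_robin")
lc = simulate("least_loaded")
print("round-robin spread (s):", round(max(rr) - min(rr), 1))
print("least-loaded spread (s):", round(max(lc) - min(lc), 1))
```

The greedy strategy has a worst-case guarantee: since each task goes to the currently least-busy worker, the gap between the busiest and idlest worker can never exceed one task's duration.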
Client-Side Load Balancing
When you can't use an external load balancer, implement routing in the client:
import requests

class ClientLoadBalancer:
    def __init__(self, workers):
        self.workers = [
            {"url": url, "healthy": True, "active": 0}
            for url in workers
        ]

    def get_worker(self):
        healthy = [w for w in self.workers if w["healthy"]]
        if not healthy:
            raise Exception("No healthy workers")
        return min(healthy, key=lambda w: w["active"])

    def solve(self, task):
        worker = self.get_worker()
        worker["active"] += 1
        try:
            resp = requests.post(
                f"{worker['url']}/solve",
                json=task,
                timeout=300
            )
            if resp.status_code == 503:
                worker["healthy"] = False
                return self.solve(task)  # Retry on another worker
            return resp.json()
        except requests.RequestException:
            worker["healthy"] = False
            return self.solve(task)
        finally:
            worker["active"] -= 1

lb = ClientLoadBalancer([
    "http://10.0.1.10:8080",
    "http://10.0.1.11:8080",
    "http://10.0.1.12:8080"
])
result = lb.solve({"sitekey": "6Le-wvkS...", "pageurl": "https://example.com"})
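One limitation of the class above is that a worker marked unhealthy never comes back, even after a transient failure. A common fix is a cooldown: skip a failed worker for a fixed interval, then let the next request probe it again. A minimal sketch of the selection logic (the injectable `clock` parameter is for testability; wiring this into `solve()` is left out):

```python
import time

class RecoveringBalancer:
    """Least-active worker selection with a per-worker cooldown
    after failure, instead of permanent removal."""

    def __init__(self, workers, cooldown=30.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.workers = [
            {"url": url, "active": 0, "down_until": 0.0}
            for url in workers
        ]

    def mark_unhealthy(self, worker):
        # Skip this worker until the cooldown expires, then retry it
        worker["down_until"] = self.clock() + self.cooldown

    def get_worker(self):
        now = self.clock()
        available = [w for w in self.workers if w["down_until"] <= now]
        if not available:
            raise RuntimeError("No healthy workers")
        return min(available, key=lambda w: w["active"])
```
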
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| 502 Bad Gateway | Worker crashed or not started | Check worker logs; verify port binding |
| Uneven load distribution | Round-robin with variable task durations | Switch to least connections |
| Health check false positive | Check passes but worker is at capacity | Include load percentage in health response |
| Connection timeout | proxy_read_timeout too short | Set to 300s+ for CAPTCHA solving |
FAQ
Do I need a load balancer for 2-3 workers?
Client-side load balancing works fine for small setups. Use a dedicated load balancer (NGINX, HAProxy) when you have 5+ workers or need features like SSL termination and health checks.
Should I use sticky sessions?
No. CAPTCHA solve requests are stateless — any worker can handle any task. Sticky sessions would create uneven load distribution.
How do I handle workers in different regions?
Use a global load balancer (AWS Global Accelerator, Cloudflare Load Balancing) that routes to the nearest healthy region. Each region runs its own local load balancer for the workers in that region.
Related Articles
- CaptchaAI Callback Error Handling Patterns
- Node.js Puppeteer CaptchaAI Advanced Patterns
- CAPTCHA Solving Architecture Patterns for High Volume
Next Steps
Scale your CAPTCHA solving throughput — get your CaptchaAI API key and deploy behind a load balancer.