Vertical scaling (bigger server) hits a ceiling. Horizontal scaling (more workers) lets your CAPTCHA solving capacity grow linearly with demand. The key is knowing when to scale and automating the process.
## When to Scale Horizontally
| Signal | Threshold | Action |
|---|---|---|
| Queue depth growing | > 50 pending tasks | Add workers |
| Average solve latency | > 45 seconds | Add workers (API isn't the bottleneck) |
| Worker CPU usage | > 70% sustained | Add workers |
| Error rate | > 5% with ERROR_NO_SLOT_AVAILABLE | Reduce concurrency per worker (too many concurrent tasks) |
| Queue draining slowly | < 80% of throughput target | Add workers or increase concurrency |
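The table above can be folded into a single decision helper. A minimal sketch, with illustrative threshold values and a metrics dict whose keys (`no_slot_errors`, `throughput_ratio`, etc.) are assumptions, not part of any API:

```python
def scaling_action(metrics: dict) -> str:
    """Map the signals from the table to a scaling action."""
    # ERROR_NO_SLOT_AVAILABLE means too many concurrent tasks: back off first
    if metrics.get("error_rate", 0.0) > 0.05 and metrics.get("no_slot_errors", 0) > 0:
        return "reduce_concurrency"
    if metrics.get("queue_depth", 0) > 50:
        return "add_workers"
    if metrics.get("avg_solve_time", 0) > 45:
        return "add_workers"
    if metrics.get("cpu_percent", 0) > 70:
        return "add_workers"
    if metrics.get("throughput_ratio", 1.0) < 0.8:
        return "add_workers"
    return "hold"
```

Checking the error signal before the queue signal matters: adding workers while the API is rejecting slots only makes the error rate worse.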
## Scaling Architecture

```
[Queue Monitor] ──watches──→ [Task Queue]
      │                          ↕
      │ scale signal         [Worker 1]
      ↓                      [Worker 2]
[Auto Scaler] ──adds──→      [Worker 3]
      │                      [Worker N...]
      ↓
[Cost Manager] ──caps──→ max workers
```
## Python — Auto-Scaling Controller

```python
import math
import os
import threading
import time

API_KEY = os.environ["CAPTCHAAI_API_KEY"]  # passed through to worker processes


class ScalingMetrics:
    """Collect the metrics that drive scaling decisions."""

    def __init__(self):
        self.queue_depth = 0
        self.active_workers = 0
        self.tasks_per_minute = 0
        self.avg_solve_time = 30  # seconds
        self.error_rate = 0.0
        self.lock = threading.Lock()

    def update(self, queue_depth, active_workers, tasks_per_minute,
               avg_solve_time, error_rate):
        with self.lock:
            self.queue_depth = queue_depth
            self.active_workers = active_workers
            self.tasks_per_minute = tasks_per_minute
            self.avg_solve_time = avg_solve_time
            self.error_rate = error_rate

    @property
    def snapshot(self):
        with self.lock:
            return {
                "queue_depth": self.queue_depth,
                "active_workers": self.active_workers,
                "tasks_per_minute": self.tasks_per_minute,
                "avg_solve_time": self.avg_solve_time,
                "error_rate": self.error_rate,
            }


class HorizontalAutoScaler:
    def __init__(self, min_workers=2, max_workers=20,
                 tasks_per_worker=10, cooldown=120):
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.tasks_per_worker = tasks_per_worker
        self.cooldown = cooldown
        self.current_workers = min_workers
        self.last_scale_time = 0
        self.metrics = ScalingMetrics()

    def calculate_desired_workers(self):
        snapshot = self.metrics.snapshot

        # Method 1: queue-based scaling
        queue_based = math.ceil(
            snapshot["queue_depth"] / self.tasks_per_worker
        )

        # Method 2: throughput-based scaling
        if snapshot["tasks_per_minute"] > 0 and snapshot["queue_depth"] > 0:
            drain_time = snapshot["queue_depth"] / snapshot["tasks_per_minute"]
            if drain_time > 5:  # more than 5 minutes to drain
                throughput_based = self.current_workers + 2
            else:
                throughput_based = self.current_workers
        else:
            throughput_based = self.current_workers

        # Method 3: error-rate scaling (back off when errors are high)
        if snapshot["error_rate"] > 0.1:
            error_based = max(self.min_workers, self.current_workers - 1)
        else:
            error_based = self.current_workers

        # Take the max of queue- and throughput-based, capped by the error signal
        desired = max(queue_based, throughput_based)
        if snapshot["error_rate"] > 0.1:
            desired = min(desired, error_based)

        # Clamp to bounds
        return max(self.min_workers, min(self.max_workers, desired))

    def should_scale(self, desired):
        if desired == self.current_workers:
            return False
        if time.time() - self.last_scale_time < self.cooldown:
            return False
        return True

    def scale(self, desired):
        if not self.should_scale(desired):
            return
        delta = desired - self.current_workers
        direction = "up" if delta > 0 else "down"
        print(f"Scaling {direction}: {self.current_workers} → {desired} ({delta:+d})")
        if delta > 0:
            self._add_workers(delta)
        else:
            self._remove_workers(-delta)
        self.current_workers = desired
        self.last_scale_time = time.time()

    def _add_workers(self, count):
        """Launch new worker containers."""
        for i in range(count):
            worker_id = f"captcha-worker-{self.current_workers + i}"
            # In production: use the Docker API, K8s API, or your cloud SDK
            print(f"  Launching {worker_id}")

    def _remove_workers(self, count):
        """Drain and stop workers, newest first."""
        for i in range(count):
            worker_id = f"captcha-worker-{self.current_workers - 1 - i}"
            print(f"  Draining and removing {worker_id}")

    def run_loop(self, interval=30):
        """Main auto-scaling loop."""
        print(f"Auto-scaler started: min={self.min_workers}, "
              f"max={self.max_workers}")
        while True:
            desired = self.calculate_desired_workers()
            self.scale(desired)
            snapshot = self.metrics.snapshot
            print(f"  Workers: {self.current_workers}, "
                  f"Queue: {snapshot['queue_depth']}, "
                  f"TPM: {snapshot['tasks_per_minute']}, "
                  f"Errors: {snapshot['error_rate']:.1%}")
            time.sleep(interval)


# Start the auto-scaler
scaler = HorizontalAutoScaler(
    min_workers=2,
    max_workers=20,
    tasks_per_worker=10,
    cooldown=120,  # 2-minute cooldown between scaling actions
)

# Run in the background
scaling_thread = threading.Thread(target=scaler.run_loop, daemon=True)
scaling_thread.start()
```
## JavaScript — Docker-Based Horizontal Scaling

```javascript
const { exec } = require("child_process");
const { promisify } = require("util");

const execAsync = promisify(exec);

class DockerHorizontalScaler {
  constructor(options = {}) {
    this.serviceName = options.serviceName || "captcha-worker";
    this.minReplicas = options.minReplicas || 2;
    this.maxReplicas = options.maxReplicas || 15;
    this.currentReplicas = this.minReplicas;
    this.scaleUpThreshold = options.scaleUpThreshold || 50;
    this.scaleDownThreshold = options.scaleDownThreshold || 10;
    this.cooldownMs = options.cooldownMs || 120000;
    this.lastScaleTime = 0;
  }

  async evaluate(metrics) {
    const now = Date.now();
    if (now - this.lastScaleTime < this.cooldownMs) {
      return { action: "cooldown", current: this.currentReplicas };
    }

    let desired = this.currentReplicas;

    // Scale up: queue growing
    if (metrics.queueDepth > this.scaleUpThreshold) {
      const needed = Math.ceil(metrics.queueDepth / 10);
      desired = Math.min(this.maxReplicas, Math.max(desired, needed));
    }

    // Scale down: queue mostly empty
    if (
      metrics.queueDepth < this.scaleDownThreshold &&
      this.currentReplicas > this.minReplicas
    ) {
      desired = Math.max(this.minReplicas, this.currentReplicas - 1);
    }

    if (desired !== this.currentReplicas) {
      const from = this.currentReplicas; // capture before scaleTo updates it
      await this.scaleTo(desired);
      return { action: "scaled", from, to: this.currentReplicas };
    }
    return { action: "no_change", current: this.currentReplicas };
  }

  async scaleTo(replicas) {
    const clamped = Math.max(
      this.minReplicas,
      Math.min(this.maxReplicas, replicas)
    );
    console.log(`Scaling ${this.serviceName}: ${this.currentReplicas} → ${clamped}`);
    try {
      // Docker Compose scaling
      await execAsync(
        `docker compose up -d --scale ${this.serviceName}=${clamped} --no-recreate`
      );
      this.currentReplicas = clamped;
      this.lastScaleTime = Date.now();
    } catch (err) {
      console.error(`Scale failed: ${err.message}`);
    }
  }

  status() {
    return {
      service: this.serviceName,
      current: this.currentReplicas,
      min: this.minReplicas,
      max: this.maxReplicas,
      lastScale: new Date(this.lastScaleTime).toISOString(),
    };
  }
}

// Monitor loop
const scaler = new DockerHorizontalScaler({
  serviceName: "captcha-worker",
  minReplicas: 2,
  maxReplicas: 15,
  cooldownMs: 120000,
});

async function monitorAndScale() {
  // In production, fetch these from your queue/monitoring system
  const metrics = {
    queueDepth: 75, // example value
    errorRate: 0.02,
    avgSolveTime: 25,
  };
  const result = await scaler.evaluate(metrics);
  console.log("Scale decision:", result);
  console.log("Status:", scaler.status());
}

setInterval(monitorAndScale, 30000);
```
## Cost-Aware Scaling

```python
class CostAwareScaler(HorizontalAutoScaler):
    def __init__(self, hourly_cost_per_worker=0.05, budget_per_hour=2.0,
                 **kwargs):
        super().__init__(**kwargs)
        self.hourly_cost = hourly_cost_per_worker
        self.budget = budget_per_hour

    def calculate_desired_workers(self):
        desired = super().calculate_desired_workers()

        # Cap by budget, but never drop below min_workers
        max_affordable = int(self.budget / self.hourly_cost)
        if desired > max_affordable:
            print(f"  Budget cap: wanted {desired}, "
                  f"can afford {max_affordable}")
            desired = max(self.min_workers, max_affordable)
        return desired
```
## Scaling Checklist
| Area | Consider |
|---|---|
| Queue | Persistent queue (Redis, SQS) — not in-memory |
| Workers | Stateless — any worker handles any task |
| Health checks | Load balancer knows which workers are healthy |
| Drain | Workers finish in-flight tasks before shutdown |
| Monitoring | Queue depth, latency, error rate visible |
| Cost | Budget caps prevent runaway scaling |
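The drain row deserves emphasis: a worker killed mid-solve wastes a paid task. A minimal sketch of graceful shutdown using a stop flag and SIGTERM; `queue.get_next` and `solve_one` are hypothetical stand-ins for your queue client and task handler:

```python
import signal
import threading

stop_requested = threading.Event()

def handle_sigterm(signum, frame):
    # Stop accepting new tasks; in-flight tasks run to completion.
    stop_requested.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def worker_loop(queue):
    while not stop_requested.is_set():
        task = queue.get_next(timeout=5)  # hypothetical queue client
        if task:
            solve_one(task)  # finishes even if SIGTERM arrives mid-solve
    print("Drained: no new tasks accepted, exiting cleanly")
```

Pair this with a generous stop grace period (e.g. Docker's `stop_grace_period` or Kubernetes' `terminationGracePeriodSeconds`) so the orchestrator waits for the drain instead of sending SIGKILL.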
## Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Scaling oscillation | Threshold too close to current load | Add hysteresis: scale-up at 50, scale-down at 10 |
| New workers not helping | CaptchaAI API is the bottleneck | Check API rate limits; optimize concurrency per worker |
| Workers idle after scale-up | Queue drained before new workers ready | Reduce cooldown; scale in smaller increments |
| Cost spike | No max_workers cap | Always set max_workers; add budget limits |
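The hysteresis fix in the first row simply keeps the scale-up and scale-down thresholds far apart, so load hovering near one boundary never triggers both. A minimal sketch using the illustrative 50/10 thresholds from the table:

```python
SCALE_UP_AT = 50    # add a worker above this queue depth
SCALE_DOWN_AT = 10  # remove a worker below this; the wide gap prevents flapping

def next_worker_count(queue_depth, current, min_workers=2, max_workers=20):
    if queue_depth > SCALE_UP_AT:
        return min(max_workers, current + 1)
    if queue_depth < SCALE_DOWN_AT:
        return max(min_workers, current - 1)
    return current  # inside the dead band: hold steady
```

Any queue depth between 10 and 50 leaves the worker count unchanged, which is exactly what stops the oscillation.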
## FAQ
**When should I scale vertically instead?**
Scale vertically (bigger server) when the bottleneck is per-process (CPU-bound image preprocessing, memory for browser instances). Scale horizontally when the bottleneck is throughput (more tasks than one process can handle).
**How many workers do I need?**
Rough formula: workers = peak_tasks_per_minute × avg_solve_time_seconds / 60 / tasks_per_worker. For 100 tasks/minute at 30s solve time with 10 tasks per worker: 100 × 30 / 60 / 10 = 5 workers.
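The same arithmetic as a helper, so you can plug in your own numbers (the function and parameter names are illustrative):

```python
import math

def workers_needed(peak_tasks_per_minute, avg_solve_time_seconds, tasks_per_worker):
    # A solve occupies a worker slot for avg_solve_time_seconds, so convert
    # throughput into concurrently busy slots, then divide by slots per worker.
    busy_slots = peak_tasks_per_minute * avg_solve_time_seconds / 60
    return math.ceil(busy_slots / tasks_per_worker)

print(workers_needed(100, 30, 10))  # the worked example above: 5 workers
```

`ceil` rounds up so a fractional result still gets covered; size for peak load and let the auto-scaler trim the rest.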
**Should I use Kubernetes for auto-scaling?**
Kubernetes HPA (Horizontal Pod Autoscaler) works well for CAPTCHA workers. Use custom metrics (queue depth) rather than CPU-based scaling for more responsive auto-scaling.
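As a sketch, an HPA manifest scaling on an external queue-depth metric might look like the following. The metric and deployment names are placeholders, and exposing a queue-depth metric to Kubernetes requires a metrics adapter (for example KEDA or a custom metrics API server):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: captcha-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: captcha-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: captcha_queue_depth   # placeholder: exposed via a metrics adapter
        target:
          type: AverageValue
          averageValue: "10"          # target ~10 pending tasks per worker
```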
## Next Steps
Scale your CAPTCHA solving to any throughput — get your CaptchaAI API key and implement horizontal auto-scaling.