"Our CAPTCHA solving works most of the time" isn't a reliability target. SLIs (Service Level Indicators) and SLOs (Service Level Objectives) give you measurable thresholds, error budgets, and actionable alerts for your CAPTCHA pipeline.
## Definitions
| Term | Meaning | CAPTCHA Example |
|---|---|---|
| SLI | A metric that measures service quality | Solve success rate: 94.2% |
| SLO | A target value for an SLI | Solve success rate ≥ 92% over 30 days |
| Error Budget | Allowed failures before SLO breach | 8% failure budget = 800 failures per 10,000 tasks |
| Burn Rate | How fast you're consuming error budget | 2x burn rate = budget exhausted in 15 days |
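The error-budget and burn-rate figures in the table follow from simple arithmetic. A quick sketch, using the same window, task count, and SLO as the examples above:

```python
window_days = 30
total_tasks = 10_000
slo = 0.92  # success-rate SLO from the table

# round(), not int(): 1 - 0.92 is 0.0799999... in floating point,
# and int() would truncate 799.999... down to 799.
error_budget = round(total_tasks * (1 - slo))
burn_rate = 2.0  # consuming budget twice as fast as planned
days_to_exhaustion = window_days / burn_rate

print(error_budget, days_to_exhaustion)  # 800 15.0
```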
## Recommended SLIs for CAPTCHA Solving

### SLI 1: Solve Success Rate

Success Rate = Successful Solves / Total Solve Attempts
| CAPTCHA Type | Typical Rate | SLO Target |
|---|---|---|
| reCAPTCHA v2 | 95–99% | ≥ 92% |
| reCAPTCHA v3 | 90–97% | ≥ 88% |
| Cloudflare Turnstile | 95–99% | ≥ 92% |
| hCaptcha | 90–97% | ≥ 88% |
| Image/OCR | 85–95% | ≥ 82% |
### SLI 2: Solve Latency

Latency = Time from task submission to solution received
| Percentile | Target | Alert Threshold |
|---|---|---|
| p50 | < 25s | — |
| p95 | < 90s | > 120s |
| p99 | < 180s | > 300s |
### SLI 3: Pipeline Availability

Availability = Time pipeline is accepting and solving tasks / Total time

Target: ≥ 99.5% (allows 3.6 hours downtime per month)
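The 3.6-hour figure falls out of the availability target directly. A small helper, assuming a 30-day month:

```python
def allowed_downtime_hours(availability: float, days: int = 30) -> float:
    """Hours of downtime the availability target permits over the window."""
    return (1 - availability) * days * 24

print(round(allowed_downtime_hours(0.995), 1))  # 3.6
```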
## Python — SLI/SLO Tracker

```python
import os
import time
from collections import deque
from dataclasses import dataclass, field

API_KEY = os.environ["CAPTCHAAI_API_KEY"]


@dataclass
class SLITracker:
    """Track CAPTCHA solving SLIs over a sliding window."""
    window_seconds: int = 86400 * 30  # 30 days default
    events: deque = field(default_factory=deque)

    def record_success(self, latency_seconds):
        self.events.append({
            "time": time.time(),
            "success": True,
            "latency": latency_seconds
        })
        self._prune()

    def record_failure(self, error_code):
        self.events.append({
            "time": time.time(),
            "success": False,
            "error": error_code
        })
        self._prune()

    def _prune(self):
        cutoff = time.time() - self.window_seconds
        while self.events and self.events[0]["time"] < cutoff:
            self.events.popleft()

    @property
    def success_rate(self):
        if not self.events:
            return 1.0
        successes = sum(1 for e in self.events if e["success"])
        return successes / len(self.events)

    @property
    def latency_percentiles(self):
        latencies = sorted(
            e["latency"] for e in self.events if e.get("latency")
        )
        if not latencies:
            return {"p50": 0, "p95": 0, "p99": 0}

        def percentile(data, p):
            idx = int(len(data) * p / 100)
            return data[min(idx, len(data) - 1)]

        return {
            "p50": round(percentile(latencies, 50), 2),
            "p95": round(percentile(latencies, 95), 2),
            "p99": round(percentile(latencies, 99), 2),
        }

    @property
    def error_breakdown(self):
        errors = {}
        for e in self.events:
            if not e["success"]:
                code = e.get("error", "unknown")
                errors[code] = errors.get(code, 0) + 1
        return errors


class SLOChecker:
    """Check SLIs against SLO targets."""

    def __init__(self, tracker):
        self.tracker = tracker
        self.slos = {
            "success_rate": 0.92,   # ≥ 92%
            "latency_p95": 90.0,    # < 90 seconds
            "latency_p99": 180.0,   # < 180 seconds
        }

    @property
    def error_budget_total(self):
        """Total allowed failures in the window."""
        total = len(self.tracker.events)
        # round(), not int(): 10_000 * (1 - 0.92) is 799.999... in
        # floating point, and int() would truncate it to 799.
        return round(total * (1 - self.slos["success_rate"]))

    @property
    def error_budget_remaining(self):
        """How many more failures before SLO breach."""
        failures = sum(1 for e in self.tracker.events if not e["success"])
        return max(0, self.error_budget_total - failures)

    @property
    def error_budget_pct(self):
        """Percentage of error budget remaining."""
        total = self.error_budget_total
        if total == 0:
            return 100.0
        return round(self.error_budget_remaining / total * 100, 1)

    @property
    def burn_rate(self):
        """How fast error budget is being consumed.

        1.0 = on track, 2.0 = will exhaust in half the window.
        """
        total = len(self.tracker.events)
        if total == 0:
            return 0.0
        failures = sum(1 for e in self.tracker.events if not e["success"])
        expected_failures = total * (1 - self.slos["success_rate"])
        if expected_failures == 0:
            return 0.0
        return round(failures / expected_failures, 2)

    def check_all(self):
        """Check all SLOs and return status."""
        rate = self.tracker.success_rate
        latencies = self.tracker.latency_percentiles
        return {
            "success_rate": {
                "current": round(rate, 4),
                "target": self.slos["success_rate"],
                "met": rate >= self.slos["success_rate"]
            },
            "latency_p95": {
                "current": latencies["p95"],
                "target": self.slos["latency_p95"],
                "met": latencies["p95"] <= self.slos["latency_p95"]
            },
            "latency_p99": {
                "current": latencies["p99"],
                "target": self.slos["latency_p99"],
                "met": latencies["p99"] <= self.slos["latency_p99"]
            },
            "error_budget": {
                "remaining_pct": self.error_budget_pct,
                "remaining_count": self.error_budget_remaining,
                "burn_rate": self.burn_rate,
            },
            "overall": (
                rate >= self.slos["success_rate"]
                and latencies["p95"] <= self.slos["latency_p95"]
                and latencies["p99"] <= self.slos["latency_p99"]
            ),
        }


# Usage
tracker = SLITracker(window_seconds=86400 * 30)
slo = SLOChecker(tracker)

# After each solve:
# tracker.record_success(latency_seconds=24.5)
# tracker.record_failure("ERROR_CAPTCHA_UNSOLVABLE")

# Check SLOs:
# print(slo.check_all())
```
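To feed the tracker, wrap your solve call so every attempt is recorded. This is a hypothetical sketch: `solve` stands in for whatever client function you actually call, and the `code` attribute on exceptions is an assumption, not a real CaptchaAI API.

```python
import time

def tracked_solve(tracker, solve, *args, **kwargs):
    """Run `solve`, recording latency on success and an error code on failure.

    `tracker` is any object with record_success/record_failure
    (e.g. the SLITracker above). `solve` is a placeholder callable.
    """
    start = time.monotonic()
    try:
        token = solve(*args, **kwargs)
    except Exception as exc:
        # Use an error-code attribute if the client exposes one (assumption);
        # fall back to the exception class name otherwise.
        tracker.record_failure(getattr(exc, "code", type(exc).__name__))
        raise
    tracker.record_success(latency_seconds=time.monotonic() - start)
    return token
```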
## JavaScript — SLO Dashboard

```javascript
class SLODashboard {
  constructor(windowMs = 30 * 24 * 60 * 60 * 1000) {
    this.windowMs = windowMs;
    this.events = [];
    this.slos = {
      successRate: 0.92,
      latencyP95: 90,
      latencyP99: 180,
    };
  }

  recordSuccess(latencySeconds) {
    this.events.push({ time: Date.now(), success: true, latency: latencySeconds });
    this._prune();
  }

  recordFailure(errorCode) {
    this.events.push({ time: Date.now(), success: false, error: errorCode });
    this._prune();
  }

  _prune() {
    const cutoff = Date.now() - this.windowMs;
    this.events = this.events.filter((e) => e.time > cutoff);
  }

  get successRate() {
    if (this.events.length === 0) return 1;
    const successes = this.events.filter((e) => e.success).length;
    return successes / this.events.length;
  }

  get errorBudget() {
    const total = this.events.length;
    // Math.round, not Math.floor: total * (1 - 0.92) is 799.999... in
    // floating point when total is 10000, and floor would give 799.
    const allowedFailures = Math.round(total * (1 - this.slos.successRate));
    const actualFailures = this.events.filter((e) => !e.success).length;
    const remaining = Math.max(0, allowedFailures - actualFailures);
    return {
      total: allowedFailures,
      consumed: actualFailures,
      remaining,
      remainingPct: allowedFailures > 0
        ? ((remaining / allowedFailures) * 100).toFixed(1)
        : "100.0",
      burnRate: allowedFailures > 0
        ? (actualFailures / allowedFailures).toFixed(2)
        : "0.00",
    };
  }

  get report() {
    const latencies = this.events
      .filter((e) => e.success && e.latency)
      .map((e) => e.latency)
      .sort((a, b) => a - b);
    const p95 = latencies.length > 0
      ? latencies[Math.floor(latencies.length * 0.95)]
      : 0;
    return {
      sliSuccessRate: (this.successRate * 100).toFixed(2) + "%",
      sloSuccessRate: (this.slos.successRate * 100).toFixed(0) + "%",
      sloMet: this.successRate >= this.slos.successRate,
      latencyP95: p95.toFixed(1) + "s",
      errorBudget: this.errorBudget,
      totalEvents: this.events.length,
    };
  }
}

const dashboard = new SLODashboard();
// dashboard.recordSuccess(24.5);
// console.log(dashboard.report);
```
## Burn Rate Alert Thresholds
| Burn Rate | Meaning | Alert |
|---|---|---|
| 1.0 | On track — budget lasts the full window | None |
| 2.0 | Budget exhausted in half the window | Warning |
| 6.0 | Budget exhausted in 5 days | Page on-call |
| 14.0 | Budget exhausted in ~2 days | Critical — immediate action |
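The tiers above map directly onto an alerting rule. A minimal sketch, with thresholds copied from the table; the tier names are placeholders for whatever your paging system uses:

```python
def alert_for_burn_rate(burn_rate: float) -> str:
    """Return the alert tier for a measured burn rate (30-day window)."""
    if burn_rate >= 14.0:
        return "critical"
    if burn_rate >= 6.0:
        return "page"
    if burn_rate >= 2.0:
        return "warning"
    return "none"

print(alert_for_burn_rate(2.5))  # warning
```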
## Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| SLO always breached | Target too aggressive | Start with current performance − 3 percentage points as the SLO |
| Error budget always full | SLO too loose | Tighten the SLO to drive improvements |
| Burn rate spikes | Burst of failures | Check whether it is transient (retry storm) or systemic |
| Budget consumed by one error type | Single root cause | Fix that error type; see the error breakdown |
## FAQ

### What SLO should I start with?

Measure your current success rate over 7 days. Subtract 3 percentage points — that's your starting SLO. Tighten it as you improve reliability.

### Who owns the CAPTCHA SLO?

The team that operates the CAPTCHA solving pipeline. If scraping and CAPTCHA solving are separate teams, the CAPTCHA team owns solve rate SLOs while the scraping team owns end-to-end SLOs.

### Should I set different SLOs per CAPTCHA type?

Yes. Image/OCR CAPTCHAs have fundamentally different success rates than reCAPTCHA v2. Setting per-type SLOs prevents one type from masking another's issues.
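One way to implement that, sketched with the per-type targets from the SLO table earlier in this article (in practice you would run one `SLITracker` per type and compare each observed rate to its own target):

```python
PER_TYPE_SLO = {
    "recaptcha_v2": 0.92,
    "recaptcha_v3": 0.88,
    "turnstile": 0.92,
    "hcaptcha": 0.88,
    "image_ocr": 0.82,
}

def per_type_status(observed_rates: dict) -> dict:
    """Map each CAPTCHA type to whether its observed rate meets its SLO."""
    return {
        ctype: rate >= PER_TYPE_SLO[ctype]
        for ctype, rate in observed_rates.items()
    }

print(per_type_status({"hcaptcha": 0.91, "image_ocr": 0.80}))
# {'hcaptcha': True, 'image_ocr': False}
```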
## Related Articles
- Captcha Solve Rate Regression Diagnosis
- Captcha Solve Success Rate Dropping Diagnosis
- Proxy Quality Affects Captcha Solve Rate
## Next Steps
Set measurable reliability targets — get your CaptchaAI API key and define SLOs for your pipeline.