Webhook Endpoint Monitoring for CAPTCHA Solve Callbacks

Your CaptchaAI callback endpoint is a critical dependency — if it goes down, solved CAPTCHAs don't reach your application. Built-in monitoring catches problems before they cascade.

What to Monitor

Metric | Why It Matters | Healthy Range
Endpoint uptime | Callbacks fail during downtime | > 99.5%
Response latency | Slow responses may time out | < 500 ms
Error rate (4xx/5xx) | Indicates handler bugs | < 1%
Callback delivery rate | Ratio of callbacks received to tasks submitted | > 95%
Time between callbacks | Detects sudden stops | < 5× average interval
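The last row's heuristic can be implemented by tracking a rolling average of the gaps between callbacks and flagging when the current gap exceeds 5× that average. A minimal sketch (the class name, window size, and injectable `now` parameter are illustrative, not part of any CaptchaAI API):

```python
import time
from collections import deque


class CallbackGapMonitor:
    """Flags a stall when the time since the last callback exceeds
    a multiple of the rolling average inter-callback interval."""

    def __init__(self, window=100, multiplier=5.0):
        self.gaps = deque(maxlen=window)  # recent inter-callback gaps (s)
        self.last_at = None
        self.multiplier = multiplier

    def record(self, now=None):
        """Call on every callback received."""
        now = now if now is not None else time.time()
        if self.last_at is not None:
            self.gaps.append(now - self.last_at)
        self.last_at = now

    def is_stalled(self, now=None):
        """True when the current gap exceeds multiplier x average gap."""
        if self.last_at is None or not self.gaps:
            return False  # not enough data to judge yet
        now = now if now is not None else time.time()
        avg_gap = sum(self.gaps) / len(self.gaps)
        return (now - self.last_at) > self.multiplier * avg_gap
```

Call `record()` from the callback handler and poll `is_stalled()` from the health endpoint.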

Self-Monitoring Middleware

Add monitoring directly to your callback handler.

Python (Flask)

import time
import threading
from collections import deque
from flask import Flask, request, jsonify

app = Flask(__name__)

# Rolling window metrics (last 1000 callbacks)
metrics = {
    "total_received": 0,
    "total_errors": 0,
    "latencies": deque(maxlen=1000),
    "last_callback_at": 0,
    "error_counts": {}
}
metrics_lock = threading.Lock()


@app.route("/callback")
def captcha_callback():
    start = time.time()

    task_id = request.args.get("id")
    solution = request.args.get("code")

    try:
        # Process the callback; store_result is your application's
        # persistence logic
        store_result(task_id, solution)
    except Exception as e:
        # Record the failure, but still ACK to CaptchaAI
        error_type = type(e).__name__
        with metrics_lock:
            metrics["total_errors"] += 1
            metrics["error_counts"][error_type] = \
                metrics["error_counts"].get(error_type, 0) + 1

    # Record metrics
    latency_ms = (time.time() - start) * 1000
    with metrics_lock:
        metrics["total_received"] += 1
        metrics["latencies"].append(latency_ms)
        metrics["last_callback_at"] = time.time()

    return "OK", 200  # Always ACK with 200, even on handler errors


@app.route("/health/callbacks")
def callback_health():
    """Health endpoint for monitoring."""
    # Snapshot everything under the lock for a consistent view
    with metrics_lock:
        latencies = list(metrics["latencies"])
        last_at = metrics["last_callback_at"]
        total_received = metrics["total_received"]
        total_errors = metrics["total_errors"]
        error_counts = dict(metrics["error_counts"])

    now = time.time()
    avg_latency = sum(latencies) / len(latencies) if latencies else 0
    p95_latency = sorted(latencies)[int(len(latencies) * 0.95)] if latencies else 0
    seconds_since_last = now - last_at if last_at > 0 else -1

    health = {
        "status": "healthy" if seconds_since_last < 300 else "stale",
        "total_received": total_received,
        "total_errors": total_errors,
        "error_rate": total_errors / max(total_received, 1),
        "avg_latency_ms": round(avg_latency, 2),
        "p95_latency_ms": round(p95_latency, 2),
        "seconds_since_last_callback": round(seconds_since_last, 1),
        "error_breakdown": error_counts
    }

    status_code = 200 if health["status"] == "healthy" else 503
    return jsonify(health), status_code

JavaScript (Express)

const express = require("express");
const app = express();

const metrics = {
  totalReceived: 0,
  totalErrors: 0,
  latencies: [],
  lastCallbackAt: 0,
  errorCounts: {},
};

const MAX_LATENCIES = 1000;

app.get("/callback", (req, res) => {
  const start = Date.now();
  const taskId = req.query.id;
  const solution = req.query.code;

  try {
    storeResult(taskId, solution); // your application's persistence logic
  } catch (err) {
    metrics.totalErrors++;
    const errType = err.constructor.name;
    metrics.errorCounts[errType] = (metrics.errorCounts[errType] || 0) + 1;
  }

  const latencyMs = Date.now() - start;
  metrics.totalReceived++;
  metrics.latencies.push(latencyMs);
  if (metrics.latencies.length > MAX_LATENCIES) metrics.latencies.shift();
  metrics.lastCallbackAt = Date.now();

  res.sendStatus(200);
});

app.get("/health/callbacks", (req, res) => {
  const latencies = [...metrics.latencies].sort((a, b) => a - b);
  const avgLatency =
    latencies.length > 0
      ? latencies.reduce((a, b) => a + b, 0) / latencies.length
      : 0;
  const p95Latency =
    latencies.length > 0
      ? latencies[Math.floor(latencies.length * 0.95)]
      : 0;
  const secondsSinceLast =
    metrics.lastCallbackAt > 0
      ? (Date.now() - metrics.lastCallbackAt) / 1000
      : -1;

  const health = {
    status: secondsSinceLast < 300 ? "healthy" : "stale",
    totalReceived: metrics.totalReceived,
    totalErrors: metrics.totalErrors,
    errorRate: metrics.totalErrors / Math.max(metrics.totalReceived, 1),
    avgLatencyMs: Math.round(avgLatency * 100) / 100,
    p95LatencyMs: Math.round(p95Latency * 100) / 100,
    secondsSinceLastCallback: Math.round(secondsSinceLast * 10) / 10,
    errorBreakdown: metrics.errorCounts,
  };

  res.status(health.status === "healthy" ? 200 : 503).json(health);
});

app.listen(3000);

Delivery Rate Tracking

Compare tasks submitted with callbacks received to measure delivery success:

Python

import time

submitted_tasks = {}     # task_id -> submitted_at
delivered_tasks = set()  # grows without bound; prune periodically in production
delivery_timeout = 300   # 5 minutes


def on_submit(task_id):
    """Call after submitting to CaptchaAI with pingback."""
    submitted_tasks[task_id] = time.time()


def on_callback(task_id):
    """Call when callback is received."""
    delivered_tasks.add(task_id)
    submitted_tasks.pop(task_id, None)


def get_delivery_stats():
    """Calculate delivery metrics."""
    now = time.time()

    # Expired tasks = submitted > 5 min ago, never received callback
    expired = [
        tid for tid, ts in submitted_tasks.items()
        if now - ts > delivery_timeout
    ]

    total = len(delivered_tasks) + len(expired)
    rate = len(delivered_tasks) / max(total, 1)

    return {
        "delivered": len(delivered_tasks),
        "missed": len(expired),
        "pending": len(submitted_tasks) - len(expired),
        "delivery_rate": round(rate, 4),
        "missed_task_ids": expired[:10]  # Sample for debugging
    }
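The same pattern can be packaged as a self-contained class when more than one module needs it; a sketch (the class name and the injectable `now` parameter, included for testability, are illustrative):

```python
import time


class DeliveryTracker:
    """Tracks submitted task IDs against received callbacks."""

    def __init__(self, timeout=300):
        self.timeout = timeout
        self.submitted = {}   # task_id -> submitted_at
        self.delivered = set()

    def on_submit(self, task_id, now=None):
        """Call after submitting to CaptchaAI with a pingback URL."""
        self.submitted[task_id] = now if now is not None else time.time()

    def on_callback(self, task_id):
        """Call when the callback is received."""
        self.delivered.add(task_id)
        self.submitted.pop(task_id, None)

    def stats(self, now=None):
        now = now if now is not None else time.time()
        # Expired = submitted longer ago than the timeout, no callback
        expired = [t for t, ts in self.submitted.items()
                   if now - ts > self.timeout]
        total = len(self.delivered) + len(expired)
        return {
            "delivered": len(self.delivered),
            "missed": len(expired),
            "pending": len(self.submitted) - len(expired),
            "delivery_rate": round(len(self.delivered) / max(total, 1), 4),
        }
```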

Alert Conditions

Set up alerts for these conditions:

Alert | Trigger | Severity
Stale endpoint | No callback received in 5+ minutes | Warning
High error rate | > 5% error rate over 100 requests | Critical
Slow responses | p95 latency > 1000 ms | Warning
Low delivery rate | < 90% delivery rate | Critical
Endpoint down | Health check returns 503 or times out | Critical
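These thresholds can live in one data structure so the checker and the table above cannot drift apart; a sketch (the rule list and `evaluate_alerts` helper are illustrative):

```python
ALERT_RULES = [
    # (health field, predicate, severity, message template)
    ("seconds_since_last_callback", lambda v: v > 300, "WARNING",
     "No callbacks for {v:.0f}s"),
    ("error_rate", lambda v: v > 0.05, "CRITICAL",
     "High error rate: {v:.1%}"),
    ("p95_latency_ms", lambda v: v > 1000, "WARNING",
     "Slow callbacks: p95={v}ms"),
    ("delivery_rate", lambda v: v < 0.90, "CRITICAL",
     "Low delivery rate: {v:.1%}"),
]


def evaluate_alerts(health):
    """Return a (severity, message) pair for every rule the health dict trips."""
    alerts = []
    for key, trips, severity, template in ALERT_RULES:
        value = health.get(key)
        if value is not None and trips(value):
            alerts.append((severity, template.format(v=value)))
    return alerts
```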

Simple Alert Script

import requests
import time


def check_callback_health(health_url, alert_callback):
    """Periodic health checker."""
    while True:
        try:
            resp = requests.get(health_url, timeout=5)
            health = resp.json()

            if resp.status_code != 200:
                alert_callback("CRITICAL", f"Callback endpoint unhealthy: {health.get('status', 'unknown')}")

            if health.get("error_rate", 0) > 0.05:
                alert_callback("CRITICAL", f"High error rate: {health['error_rate']:.1%}")

            if health.get("p95_latency_ms", 0) > 1000:
                alert_callback("WARNING", f"Slow callbacks: p95={health['p95_latency_ms']}ms")

            if health.get("seconds_since_last_callback", -1) > 300:
                alert_callback("WARNING", f"No callbacks for {health['seconds_since_last_callback']:.0f}s")

        except requests.RequestException as e:
            alert_callback("CRITICAL", f"Health check failed: {e}")

        time.sleep(60)  # Check every minute
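The `alert_callback` passed to the checker is whatever delivers the notification. One practical detail: an ongoing incident trips the same condition on every cycle, so a deduplicating wrapper avoids a page per minute. A sketch (the class name and cooldown value are illustrative):

```python
import time


class DedupAlerter:
    """Forwards an alert only if the same message has not fired
    within the cooldown window."""

    def __init__(self, send, cooldown=900):
        self.send = send        # e.g. post to Slack, email, PagerDuty
        self.cooldown = cooldown
        self.last_sent = {}     # message -> timestamp of last send

    def __call__(self, severity, message, now=None):
        now = now if now is not None else time.time()
        if now - self.last_sent.get(message, float("-inf")) >= self.cooldown:
            self.last_sent[message] = now
            self.send(severity, message)
```

For example, `DedupAlerter(lambda sev, msg: print(f"[{sev}] {msg}"))` can be passed directly as `alert_callback`.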

External Monitoring Integration

For production systems, pair self-monitoring with external uptime checks:

Tool | Integration
UptimeRobot | Monitor the /health/callbacks endpoint
Pingdom | HTTP check with response body validation
AWS CloudWatch | Synthetic canary on the health endpoint
Self-hosted | Cron job calling a health check script

Troubleshooting

Issue | Cause | Fix
Health endpoint shows "stale" with no callbacks | No tasks submitted recently, or callbacks not reaching the server | Confirm tasks are submitted with a pingback URL; verify firewall rules
High latency on callback handler | Slow database writes in the handler | Process asynchronously: ACK the callback, then queue it for background processing
Delivery rate dropping | Server restarts clearing in-memory task tracking | Persist submitted task IDs in Redis or a database
Error rate spikes | Downstream service (e.g., database) failing | Check the error breakdown; fix the underlying service
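The "process asynchronously" fix can be sketched with a stdlib queue and one worker thread; `store_result` here is a placeholder for your real persistence step:

```python
import queue
import threading

results = {}  # placeholder store; replace with your database


def store_result(task_id, solution):
    # Placeholder for the slow persistence step (database write, etc.)
    results[task_id] = solution


callback_queue = queue.Queue(maxsize=10000)


def handle_callback_fast(task_id, solution):
    """Called from the HTTP handler: enqueue and return immediately,
    so the callback is ACKed without waiting on the database."""
    callback_queue.put((task_id, solution))


def _worker():
    """Background worker: drains the queue and does the slow work."""
    while True:
        task_id, solution = callback_queue.get()
        try:
            store_result(task_id, solution)
        finally:
            callback_queue.task_done()


threading.Thread(target=_worker, daemon=True).start()
```

The HTTP handler then calls `handle_callback_fast(...)` instead of writing to the database inline.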

FAQ

Should I use a separate service for monitoring?

For small setups, self-monitoring middleware is sufficient. For production systems with SLAs, add external monitoring (UptimeRobot, Pingdom) that checks from outside your infrastructure.

How long should I keep metrics in memory?

A rolling window of the last 1,000 events is usually enough for real-time dashboards. For historical analysis, export metrics to Prometheus, Datadog, or a time-series database.

What if my callback endpoint is behind a load balancer?

Each instance tracks its own metrics. Aggregate across instances in your monitoring platform, or expose a shared metrics store (Redis) that all instances write to.
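Without a monitoring platform, aggregation can be as simple as fetching each instance's /health/callbacks JSON and merging the counters. A sketch over already-fetched health dicts (the `aggregate_health` helper is illustrative; field names match the health endpoint above):

```python
def aggregate_health(instance_healths):
    """Combine /health/callbacks responses from several instances
    behind a load balancer into one fleet-wide view."""
    total = sum(h["total_received"] for h in instance_healths)
    errors = sum(h["total_errors"] for h in instance_healths)
    # -1 means "no callback yet" on that instance; ignore those
    ages = [h["seconds_since_last_callback"] for h in instance_healths
            if h["seconds_since_last_callback"] >= 0]
    freshest = min(ages) if ages else -1
    return {
        "instances": len(instance_healths),
        "total_received": total,
        "error_rate": errors / max(total, 1),
        "seconds_since_last_callback": freshest,
        "status": "healthy" if 0 <= freshest < 300 else "stale",
    }
```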

Next Steps

Monitor your callback endpoints — get your CaptchaAI API key and add health checks from day one.
