Building Custom CaptchaAI Alerts with PagerDuty

A CAPTCHA solving outage at 3 AM can cost hours of missed data if nobody notices until morning. Integrating PagerDuty gets the right person notified immediately, with enough context to diagnose and fix the issue without digging through logs.

Alert Strategy

| Severity | Condition | PagerDuty Action |
|----------|-----------|------------------|
| Critical | Balance < $2 | Page on-call engineer |
| Critical | All workers down | Page on-call engineer |
| High | Error rate > 20% for 5 min | Create urgent incident |
| Warning | Balance < $10 | Create low-urgency incident |
| Warning | Queue depth > 100 for 10 min | Create low-urgency incident |
| Info | Solve latency p95 > 120s | Add to existing incident or log |
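
The balance rows of this table can be encoded as data so the thresholds live in one place. The following is a minimal sketch, not part of any CaptchaAI or PagerDuty library: `AlertRule`, `BALANCE_RULES`, and `classify_balance` are hypothetical names chosen for illustration. The severity strings are the values the Events API v2 accepts (critical, error, warning, info).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    severity: str      # PagerDuty severity: critical, error, warning, info
    threshold: float   # alert when the balance drops below this value (USD)
    dedup_key: str     # stable key so repeated triggers share one incident


# Ordered from most to least severe; the first matching rule wins.
BALANCE_RULES = [
    AlertRule("critical", 2.0, "captcha-balance-critical"),
    AlertRule("warning", 10.0, "captcha-balance-warning"),
]


def classify_balance(balance: float) -> Optional[AlertRule]:
    """Return the most severe rule the balance violates, or None if healthy."""
    for rule in BALANCE_RULES:
        if balance < rule.threshold:
            return rule
    return None
```

A monitoring loop could then call `classify_balance(balance)` and pass the returned rule's `severity` and `dedup_key` to whatever function sends the event.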

Python — PagerDuty Events API v2

import os
import time
import requests
from datetime import datetime, timezone

API_KEY = os.environ["CAPTCHAAI_API_KEY"]
PAGERDUTY_ROUTING_KEY = os.environ["PAGERDUTY_ROUTING_KEY"]

# Shared session for CaptchaAI API calls (reuses the HTTP connection)
session = requests.Session()


class CaptchaPagerDuty:
    EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

    def __init__(self, routing_key):
        self.routing_key = routing_key

    def trigger(self, summary, severity="error", source="captcha-pipeline",
                details=None, dedup_key=None):
        """Trigger an incident, or add to the open one with the same dedup_key."""
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated
        timestamp = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
        payload = {
            "routing_key": self.routing_key,
            "event_action": "trigger",
            "payload": {
                "summary": summary,
                "severity": severity,  # critical, error, warning, info
                "source": source,
                "timestamp": timestamp,
                "custom_details": details or {}
            }
        }

        if dedup_key:
            payload["dedup_key"] = dedup_key

        resp = requests.post(self.EVENTS_URL, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()

    def resolve(self, dedup_key):
        """Resolve an existing incident."""
        payload = {
            "routing_key": self.routing_key,
            "event_action": "resolve",
            "dedup_key": dedup_key
        }
        resp = requests.post(self.EVENTS_URL, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()

    def acknowledge(self, dedup_key):
        """Acknowledge an existing incident."""
        payload = {
            "routing_key": self.routing_key,
            "event_action": "acknowledge",
            "dedup_key": dedup_key
        }
        resp = requests.post(self.EVENTS_URL, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()


pagerduty = CaptchaPagerDuty(PAGERDUTY_ROUTING_KEY)


class CaptchaMonitor:
    def __init__(self):
        self.error_window = []  # (timestamp, is_error)
        self.window_size = 300  # 5 minutes in seconds

    def record_solve(self, success):
        now = time.time()
        self.error_window.append((now, not success))
        # Prune old entries
        self.error_window = [
            (t, e) for t, e in self.error_window
            if now - t < self.window_size
        ]

    @property
    def error_rate(self):
        if not self.error_window:
            return 0.0
        errors = sum(1 for _, e in self.error_window if e)
        return errors / len(self.error_window)

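    def check_balance(self):
        """Return the account balance in USD, or None if the check fails."""
        try:
            resp = session.get("https://ocr.captchaai.com/res.php", params={
                "key": API_KEY, "action": "getbalance", "json": 1
            }, timeout=10)
            data = resp.json()
        except (requests.RequestException, ValueError):
            return None  # Network error or non-JSON response
        if data.get("status") != 1:
            return None
        return float(data["request"])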

    def run_checks(self):
        """Run all monitoring checks and trigger alerts."""
        # Check balance
        balance = self.check_balance()
        if balance is not None:
            if balance < 2:
                pagerduty.trigger(
                    summary=f"CaptchaAI balance critically low: ${balance:.2f}",
                    severity="critical",
                    dedup_key="captcha-balance-critical",
                    details={"balance": balance, "threshold": 2}
                )
            elif balance < 10:
                pagerduty.trigger(
                    summary=f"CaptchaAI balance low: ${balance:.2f}",
                    severity="warning",
                    dedup_key="captcha-balance-warning",
                    details={"balance": balance, "threshold": 10}
                )
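            else:
                # Balance recovered: resolve both alerts. Each call is wrapped
                # separately so one failed request does not skip the other.
                for key in ("captcha-balance-critical",
                            "captcha-balance-warning"):
                    try:
                        pagerduty.resolve(key)
                    except requests.RequestException:
                        pass  # Best effort; retried on the next check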

        # Check error rate
        rate = self.error_rate
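        # Require a minimum sample so one early failure does not page anyone
        if rate > 0.20 and len(self.error_window) > 10: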
            total = len(self.error_window)
            errors = sum(1 for _, e in self.error_window if e)
            pagerduty.trigger(
                summary=f"CaptchaAI error rate {rate:.0%} "
                        f"({errors}/{total} in 5 min)",
                severity="error",
                dedup_key="captcha-error-rate-high",
                details={
                    "error_rate": round(rate, 3),
                    "total_tasks": total,
                    "failed_tasks": errors,
                    "window_seconds": self.window_size
                }
            )
        elif rate < 0.05 and len(self.error_window) > 10:
            try:
                pagerduty.resolve("captcha-error-rate-high")
            except Exception:
                pass


monitor = CaptchaMonitor()

# After each solve:
# monitor.record_solve(success=True)

# Run checks every 60 seconds:
# while True:
#     monitor.run_checks()
#     time.sleep(60)
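
The monitor above covers balance and error rate. The queue-depth and p95 latency rows of the alert table could be handled along the following lines. This is a sketch under assumptions: `SustainedThreshold` and `p95` are hypothetical helpers, and how you read queue depth and solve latencies depends on your own pipeline.

```python
import time


class SustainedThreshold:
    """Report a breach only after a value stays above a limit for hold_seconds."""

    def __init__(self, limit, hold_seconds):
        self.limit = limit
        self.hold_seconds = hold_seconds
        self.breach_started = None  # time of the first sample above the limit

    def update(self, value, now=None):
        now = time.time() if now is None else now
        if value <= self.limit:
            self.breach_started = None  # back under the limit: reset the clock
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.hold_seconds


def p95(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latencies (seconds)."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]
```

For the "queue depth > 100 for 10 min" rule, you would create `SustainedThreshold(limit=100, hold_seconds=600)` once, call `update(depth)` on each check, and trigger a warning only when it returns `True`.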

JavaScript — PagerDuty Integration

const axios = require("axios");

const API_KEY = process.env.CAPTCHAAI_API_KEY;
const PD_ROUTING_KEY = process.env.PAGERDUTY_ROUTING_KEY;
const PD_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue";

class PagerDutyAlerter {
  constructor(routingKey) {
    this.routingKey = routingKey;
  }

  async trigger(summary, severity = "error", details = {}, dedupKey = null) {
    const payload = {
      routing_key: this.routingKey,
      event_action: "trigger",
      payload: {
        summary,
        severity,
        source: "captcha-pipeline",
        timestamp: new Date().toISOString(),
        custom_details: details,
      },
    };
    if (dedupKey) payload.dedup_key = dedupKey;

    const resp = await axios.post(PD_EVENTS_URL, payload, { timeout: 10000 });
    return resp.data;
  }

  async resolve(dedupKey) {
    await axios.post(PD_EVENTS_URL, {
      routing_key: this.routingKey,
      event_action: "resolve",
      dedup_key: dedupKey,
    }, { timeout: 10000 });
  }
}

const alerter = new PagerDutyAlerter(PD_ROUTING_KEY);

class CaptchaHealthMonitor {
  constructor(windowMs = 300000) {
    this.results = [];
    this.windowMs = windowMs;
  }

  record(success) {
    this.results.push({ time: Date.now(), success });
    const cutoff = Date.now() - this.windowMs;
    this.results = this.results.filter((r) => r.time > cutoff);
  }

  get errorRate() {
    if (this.results.length === 0) return 0;
    const errors = this.results.filter((r) => !r.success).length;
    return errors / this.results.length;
  }

  async checkAndAlert() {
    // Balance check
    try {
      const resp = await axios.get("https://ocr.captchaai.com/res.php", {
        params: { key: API_KEY, action: "getbalance", json: 1 },
        timeout: 10000,
      });
      if (resp.data.status === 1) {
        const balance = parseFloat(resp.data.request);
        if (balance < 2) {
          await alerter.trigger(
            `CaptchaAI balance critically low: $${balance.toFixed(2)}`,
            "critical",
            { balance },
            "captcha-balance-critical"
          );
        } else if (balance < 10) {
          await alerter.trigger(
            `CaptchaAI balance low: $${balance.toFixed(2)}`,
            "warning",
            { balance },
            "captcha-balance-warning"
          );
        } else {
          await alerter.resolve("captcha-balance-critical").catch(() => {});
          await alerter.resolve("captcha-balance-warning").catch(() => {});
        }
      }
    } catch (err) {
      console.error("Balance check failed:", err.message);
    }

    // Error rate check
    const rate = this.errorRate;
    if (rate > 0.2 && this.results.length > 10) {
      await alerter.trigger(
        `CaptchaAI error rate: ${(rate * 100).toFixed(1)}%`,
        "error",
        { errorRate: rate, totalTasks: this.results.length },
        "captcha-error-rate"
      );
    } else if (rate < 0.05 && this.results.length > 10) {
      await alerter.resolve("captcha-error-rate").catch(() => {});
    }
  }
}

const monitor = new CaptchaHealthMonitor();

// Run checks every 60 seconds
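setInterval(() => {
  // Catch rejections so a failed PagerDuty call does not crash the process
  monitor.checkAndAlert().catch((err) =>
    console.error("Health check failed:", err.message)
  );
}, 60000);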

module.exports = { monitor, alerter };

PagerDuty Setup Checklist

| Step | Action |
|------|--------|
| 1 | Create a service in PagerDuty for "CaptchaAI Pipeline" |
| 2 | Add Events API v2 integration to the service |
| 3 | Copy the routing key to the PAGERDUTY_ROUTING_KEY env var |
| 4 | Set up escalation policy (on-call → team lead → manager) |
| 5 | Configure notification rules (push, SMS, phone) |
| 6 | Add maintenance windows for planned downtime |
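
Once step 3 is done, one way to confirm the routing key points at the right service is to send an info-level test event and resolve it straight away. The sketch below assumes hypothetical helper names (`build_test_events`, `send_event`, `run_smoke_test`) and uses only the standard library, so it can run before `requests` is installed. The event fields are the same ones used in the Python class earlier in this guide.

```python
import json
import time
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_test_events(routing_key, dedup_key="captcha-setup-smoke-test"):
    """Build a trigger/resolve pair that share one dedup key."""
    trigger = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": "CaptchaAI pipeline: PagerDuty setup test (safe to ignore)",
            "severity": "info",
            "source": "captcha-pipeline",
        },
    }
    resolve = {
        "routing_key": routing_key,
        "event_action": "resolve",
        "dedup_key": dedup_key,
    }
    return trigger, resolve


def send_event(event):
    """POST one event; urlopen raises HTTPError on a 4xx/5xx response."""
    req = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_smoke_test(routing_key, pause_seconds=5):
    """Trigger the test incident, wait briefly, then resolve it."""
    trigger, resolve = build_test_events(routing_key)
    send_event(trigger)
    time.sleep(pause_seconds)  # leave time to see the incident in the UI
    send_event(resolve)
```

Calling `run_smoke_test(os.environ["PAGERDUTY_ROUTING_KEY"])` should make a short-lived incident appear on the service. If it raises an HTTP error, re-check the key against the service's Events API v2 integration.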

Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| Alert not triggering | Wrong routing key | Verify the key matches the service's Events API v2 integration |
| Duplicate incidents | Missing dedup_key | Always set a consistent dedup key per alert type |
| Alert flood | No cooldown between triggers | PagerDuty groups events that share a dedup key into one incident, so set one on every trigger |
| Auto-resolve not working | Dedup key mismatch | Ensure resolve uses the exact same dedup key as trigger |
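
Most of these rows come down to dedup key discipline. One way to keep trigger and resolve in sync is to derive keys from a single function instead of typing string literals at each call site. This is a sketch with a hypothetical helper, `dedup_key_for`; the `env` component is an assumption for setups that run more than one environment.

```python
import re


def dedup_key_for(alert_type, env="prod"):
    """Build a stable dedup key such as 'captcha-prod-balance-critical'."""
    # Lowercase, then collapse anything that is not a letter or digit to '-'
    slug = re.sub(r"[^a-z0-9]+", "-", alert_type.lower()).strip("-")
    if not slug:
        raise ValueError("alert_type must contain letters or digits")
    return f"captcha-{env}-{slug}"
```

Because the key is normalized, `dedup_key_for("Balance_Critical")` and `dedup_key_for("balance-critical")` return the same string, which removes one common cause of resolve events that never match.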

FAQ

How do I avoid alert fatigue?

Use deduplication keys to group related alerts into a single incident. Set warning alerts as low-urgency (no page). Reserve critical/high-urgency for balance < $2 or all workers down.
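
Dedup keys keep repeated triggers inside one incident, but a 60-second check loop still sends an event every minute while a condition persists. A client-side cooldown is one way to cut that traffic. The sketch below uses a hypothetical `TriggerCooldown` class; the 15-minute default is an arbitrary assumption, not a PagerDuty recommendation.

```python
import time


class TriggerCooldown:
    """Allow one trigger per dedup key every cooldown_seconds."""

    def __init__(self, cooldown_seconds=900):
        self.cooldown_seconds = cooldown_seconds
        self.last_sent = {}  # dedup_key -> time of the last allowed trigger

    def should_send(self, dedup_key, now=None):
        now = time.time() if now is None else now
        last = self.last_sent.get(dedup_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down; skip this trigger
        self.last_sent[dedup_key] = now
        return True

    def clear(self, dedup_key):
        """Forget a key after its incident resolves so the next breach alerts at once."""
        self.last_sent.pop(dedup_key, None)
```

In a monitor you would wrap each trigger call in `if cooldown.should_send(key):` and call `cooldown.clear(key)` right after sending the matching resolve.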

Can I integrate PagerDuty with Datadog/New Relic instead?

Yes. Both Datadog and New Relic have native PagerDuty integrations. Use those if you already send metrics to an observability platform. Direct API integration (this guide) is best when you want custom control.

What's the difference between trigger, acknowledge, and resolve?

Trigger opens a new incident, or adds to the open incident that has the same dedup key. Acknowledge stops notifications but keeps the incident open (someone is working on it). Resolve closes the incident completely.
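
In terms of the request body, the three actions differ mainly in `event_action`, and only trigger needs a payload that describes the problem. A minimal sketch, assuming a hypothetical `build_event` helper that mirrors the fields used in the code earlier in this guide:

```python
def build_event(action, routing_key, dedup_key, summary=None,
                severity="error", source="captcha-pipeline"):
    """Build an Events API v2 body for trigger, acknowledge, or resolve."""
    if action not in ("trigger", "acknowledge", "resolve"):
        raise ValueError(f"unknown event_action: {action}")
    event = {
        "routing_key": routing_key,
        "event_action": action,
        "dedup_key": dedup_key,
    }
    if action == "trigger":
        if not summary:
            raise ValueError("trigger events need a summary")
        # Acknowledge and resolve identify the incident by dedup key alone
        event["payload"] = {
            "summary": summary,
            "severity": severity,
            "source": source,
        }
    return event
```

All three events for one incident must carry the same `dedup_key`; that key is what links an acknowledge or resolve back to the original trigger.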

Next Steps

Get alerted the moment your CAPTCHA pipeline has issues — start with a CaptchaAI API key and connect PagerDuty.
