Memory and CPU Optimization for CAPTCHA Solving Workers

CAPTCHA solving workers are I/O-bound — they spend most of their time waiting for API responses. But poor resource management can still cause memory leaks, high CPU usage, and process crashes. This guide covers practical optimizations for CaptchaAI workers.

Common Resource Bottlenecks

| Bottleneck | Cause | Impact |
|---|---|---|
| Memory growth | Unbounded response buffering | OOM kills, swap thrashing |
| High CPU | Busy-wait polling loops | Wastes compute, blocks other tasks |
| Connection leaks | Unclosed HTTP sessions | File descriptor exhaustion |
| Large payloads | Base64 image bodies in memory | 2–5 MB per image CAPTCHA |

Python: Lean Worker Patterns

Use Connection Pooling with Limits

# lean_worker.py
import os
import asyncio
import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

async def create_lean_session():
    """Create a memory-efficient aiohttp session."""
    connector = aiohttp.TCPConnector(
        limit=20,            # Max connections
        limit_per_host=20,   # All go to same host
        keepalive_timeout=30,
        enable_cleanup_closed=True,
    )
    return aiohttp.ClientSession(
        connector=connector,
        timeout=aiohttp.ClientTimeout(total=30),
    )

async def solve_captcha(session, sitekey, pageurl):
    """Solve with minimal memory footprint."""
    # Submit
    async with session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": "1",
    }) as resp:
        # Read and release response immediately
        result = await resp.json(content_type=None)

    if result.get("status") != 1:
        return None

    task_id = result["request"]
    del result  # Free memory

    # Poll with sleep (not busy-wait)
    await asyncio.sleep(15)
    for _ in range(25):
        async with session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": "1",
        }) as resp:
            poll_result = await resp.json(content_type=None)

        if poll_result.get("status") == 1:
            token = poll_result["request"]
            del poll_result
            return token

        if poll_result.get("request") != "CAPCHA_NOT_READY":
            return None

        del poll_result
        await asyncio.sleep(5)  # Async sleep — zero CPU

    return None

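async def main():
    session = await create_lean_session()
    try:
        tasks = [
            solve_captcha(session, "SITEKEY", "https://example.com")
            for _ in range(50)
        ]
        # return_exceptions=True keeps one failed request from aborting the batch
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Count only real tokens; None and exception objects are failures
        solved = sum(1 for r in results if isinstance(r, str))
        print(f"Solved: {solved}/{len(tasks)}")
    finally:
        await session.close()

if __name__ == "__main__":
    asyncio.run(main())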
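
The worker above starts all 50 solves at once and relies on the connector limit to throttle them. A minimal sketch of capping the number of in-flight solves with asyncio.Semaphore follows. The names fake_solve and run_bounded are hypothetical stand-ins (fake_solve replaces solve_captcha so the sketch runs without network access), and the limit of 10 is an arbitrary assumption.

```python
import asyncio

async def fake_solve(i, stats):
    """Hypothetical stand-in for solve_captcha(); tracks in-flight calls."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulates waiting on the API
    stats["active"] -= 1
    return f"token-{i}"

async def run_bounded(count, limit):
    """Run `count` solves with at most `limit` in flight at any time."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}

    async def guarded(i):
        async with sem:  # waits here while `limit` solves are already running
            return await fake_solve(i, stats)

    results = await asyncio.gather(*(guarded(i) for i in range(count)))
    return results, stats["peak"]

results, peak = asyncio.run(run_bounded(count=50, limit=10))
print(f"Solved {len(results)} with at most {peak} in flight")
```

In a real worker, the body of guarded would call solve_captcha(session, sitekey, pageurl) instead, so memory held by pending requests scales with the limit rather than with the batch size.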

Stream Large Image CAPTCHAs

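For Image/OCR CAPTCHAs, the base64 body has to sit in memory while the request is sent, so true streaming is not possible with this submission method. The goal is to read the file off the event loop, keep only the encoded copy per task, and release it as soon as the response arrives:

import asyncio
import base64
import os

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

def encode_image(image_path):
    """Return the file as base64 text; the raw bytes are freed on return."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

async def submit_image_streaming(session, image_path):
    """Submit an image CAPTCHA, holding only the encoded copy while it is sent."""
    # Blocking file I/O runs in a worker thread (asyncio.to_thread, Python 3.9+)
    image_data = await asyncio.to_thread(encode_image, image_path)

    # Submit and immediately release the base64 string
    async with session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "base64",
        "body": image_data,
        "json": "1",
    }) as resp:
        result = await resp.json(content_type=None)

    del image_data  # Free the base64 string immediately
    return result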
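
Base64 encoding expands a payload by one third, which matters when many image tasks are in flight. The sketch below estimates that cost. The helper names base64_size and peak_payload_mb are hypothetical, and the estimate assumes each in-flight task holds exactly one encoded body. Form encoding of the request can add further overhead that is not modeled here.

```python
import base64
import math

def base64_size(raw_bytes: int) -> int:
    """Length of the base64 text for `raw_bytes` of input (padding included)."""
    return 4 * math.ceil(raw_bytes / 3)

def peak_payload_mb(raw_mb: float, concurrent: int) -> float:
    """Rough upper bound: every in-flight task holds one encoded body."""
    raw_bytes = int(raw_mb * 1024 * 1024)
    return concurrent * base64_size(raw_bytes) / (1024 * 1024)

# Check the formula against the real encoder
assert base64_size(3_000_000) == len(base64.b64encode(b"\0" * 3_000_000))

# 20 concurrent 3 MB images
print(f"{peak_payload_mb(3, 20):.0f} MB")  # prints "80 MB"
```

With the 2–5 MB images mentioned earlier, image payloads alone can dominate worker memory well before the connection count becomes a concern.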

Monitor Memory Usage

import tracemalloc

tracemalloc.start()

# ... run your solver ...

current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()
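
A single reading at exit hides how usage evolves while the worker runs. As a sketch under stated assumptions, the coroutine below samples tracemalloc on a fixed interval alongside the solver. The name log_memory_periodically, the demo allocation loop, and the 10 ms interval are illustrative only; a real worker would use an interval of seconds or minutes. Note that tracemalloc reports Python-level allocations, not the process RSS.

```python
import asyncio
import tracemalloc

async def log_memory_periodically(interval, samples, stop_event):
    """Append (current, peak) byte counts every `interval` seconds until stopped."""
    while not stop_event.is_set():
        samples.append(tracemalloc.get_traced_memory())
        try:
            # Sleep for `interval`, but wake early if a stop is requested
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass

async def demo():
    tracemalloc.start()
    samples, stop = [], asyncio.Event()
    logger = asyncio.create_task(log_memory_periodically(0.01, samples, stop))
    buffers = []
    for _ in range(5):
        buffers.append(bytearray(200_000))  # simulated work that allocates
        await asyncio.sleep(0.02)
    stop.set()
    await logger
    tracemalloc.stop()
    return samples

samples = asyncio.run(demo())
for current, peak in samples[-3:]:
    print(f"current={current / 1024:.0f} KB peak={peak / 1024:.0f} KB")
```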

JavaScript: Resource-Efficient Patterns

Proper Agent Configuration

// lean_worker.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';

// Configure agent for minimal resource usage
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 20,        // Limit concurrent connections
  maxFreeSockets: 5,     // Keep 5 idle for reuse
  timeout: 30000,        // Close idle connections after 30s
});

const api = axios.create({
  baseURL: 'https://ocr.captchaai.com',
  httpsAgent: agent,
  timeout: 30000,
  maxContentLength: 50000, // Limit response size (50 KB)
  maxBodyLength: 5000000,  // Limit request body (5 MB for images)
});

async function solveCaptcha(sitekey, pageurl) {
  const submit = await api.get('/in.php', {
    params: {
      key: API_KEY, method: 'userrecaptcha',
      googlekey: sitekey, pageurl, json: '1',
    },
  });

  if (submit.data.status !== 1) return null;
  const taskId = submit.data.request;

  await new Promise(r => setTimeout(r, 15000));

  for (let i = 0; i < 25; i++) {
    const poll = await api.get('/res.php', {
      params: { key: API_KEY, action: 'get', id: taskId, json: '1' },
    });

    if (poll.data.status === 1) return poll.data.request;
    if (poll.data.request !== 'CAPCHA_NOT_READY') return null;

    await new Promise(r => setTimeout(r, 5000));
  }
  return null;
}

// Process with concurrency control
async function processWithLimit(tasks, concurrency) {
  const results = [];
  const active = new Set();

  for (const task of tasks) {
    // Map failures to null and always free the slot, so one rejected
    // request cannot abort the batch or leave a stale entry in `active`
    const p = solveCaptcha(task.sitekey, task.pageurl)
      .catch(() => null)
      .finally(() => active.delete(p));
    active.add(p);
    results.push(p);

    if (active.size >= concurrency) await Promise.race(active);
  }
  return Promise.all(results);
}

// Monitor memory
function logMemory() {
  const usage = process.memoryUsage();
  console.log(`RSS: ${(usage.rss / 1024 / 1024).toFixed(1)} MB`);
  console.log(`Heap: ${(usage.heapUsed / 1024 / 1024).toFixed(1)} MB`);
}

Resource Budgets

Target resource usage per concurrency level:

| Concurrent solves | Expected memory | Expected CPU | Connections |
|---|---|---|---|
| 10 | 30–50 MB | < 5% | 10 |
| 50 | 60–100 MB | < 10% | 20 |
| 100 | 100–200 MB | < 15% | 50 |
| 500 | 300–500 MB | < 25% | 100 |

If your worker exceeds these targets, look for:

  • Unbounded buffers (accumulating results without processing)
  • Connection leaks (sessions not closed on error)
  • Synchronous file I/O blocking the event loop
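
One way to avoid unbounded buffers is a small worker pool fed by a bounded queue, with each token written out as soon as it arrives. The sketch below is an illustration under assumptions: fake_solve stands in for solve_captcha, and solve_and_persist, the JSON Lines output, and the pool size of 5 are hypothetical choices. The writes are small buffered appends; for slow storage you might hand them to a thread instead.

```python
import asyncio
import json
import os
import tempfile

async def fake_solve(job):
    """Hypothetical stand-in for solve_captcha()."""
    await asyncio.sleep(0.001 * (job % 5))
    return f"token-{job}"

async def worker(jobs, out):
    """Pull jobs forever; persist each token immediately instead of keeping it."""
    while True:
        job = await jobs.get()
        try:
            token = await fake_solve(job)
            if token:
                out.write(json.dumps({"job": job, "token": token}) + "\n")
        finally:
            jobs.task_done()

async def solve_and_persist(job_ids, out_path, workers=5):
    # Bounded queue: the producer waits when the workers fall behind
    jobs = asyncio.Queue(maxsize=workers * 2)
    with open(out_path, "a", encoding="utf-8") as out:
        pool = [asyncio.create_task(worker(jobs, out)) for _ in range(workers)]
        for job in job_ids:
            await jobs.put(job)
        await jobs.join()  # wait until every queued job is processed
        for task in pool:
            task.cancel()
        await asyncio.gather(*pool, return_exceptions=True)

out_path = os.path.join(tempfile.mkdtemp(), "tokens.jsonl")
asyncio.run(solve_and_persist(range(20), out_path))
with open(out_path, encoding="utf-8") as f:
    line_count = sum(1 for _ in f)
print(f"Persisted {line_count} tokens")
```

Memory here is bounded by the queue size plus the number of workers, regardless of how many jobs pass through.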

Anti-Patterns to Avoid

| Anti-Pattern | Problem | Fix |
|---|---|---|
| while True polling without sleep | 100% CPU usage | Use asyncio.sleep() or setTimeout() |
| Storing all tokens in memory | Unbounded growth | Write to database or file as they arrive |
| Creating new HTTP client per request | Connection churn, memory waste | Reuse a single session/client |
| Loading all images at once | Memory spike | Process images one at a time or in small batches |
| Not closing sessions on shutdown | Connection leaks | Use try/finally or process signal handlers |
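
The last row can be sketched as follows: signal handlers set a stop flag, and a try/finally guarantees the session is closed however the loop exits. FakeSession, run_worker, and install_signal_handlers are hypothetical names; FakeSession stands in for aiohttp.ClientSession so the sketch runs without network access.

```python
import asyncio
import signal

class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def run_worker(session, stop_event):
    """Work until stop_event is set; always close the session on the way out."""
    try:
        while not stop_event.is_set():
            # ... solve CAPTCHAs here ...
            await asyncio.sleep(0.01)
    finally:
        await session.close()

def install_signal_handlers(loop, stop_event):
    """Ask the worker to stop on SIGINT/SIGTERM where the loop supports it."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, stop_event.set)
        except (NotImplementedError, RuntimeError):
            pass  # e.g. Windows event loops or a non-main thread

async def demo():
    session, stop = FakeSession(), asyncio.Event()
    install_signal_handlers(asyncio.get_running_loop(), stop)
    task = asyncio.create_task(run_worker(session, stop))
    await asyncio.sleep(0.03)
    stop.set()  # simulates receiving a shutdown signal
    await task
    return session.closed

session_closed = asyncio.run(demo())
print(f"Session closed on shutdown: {session_closed}")
```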

Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| Memory climbs over time | Result accumulation or connection leak | Process results immediately; close sessions on error |
| CPU spikes during polling | Busy-wait loop or JSON parsing overhead | Use async sleep; limit response parsing |
| Process killed by OS (OOM) | Memory exceeds system limit | Set maxSockets, process images in batches |
| File descriptor limit hit | Too many open connections | Set ulimit -n 65536 (Linux) or reduce pool size |
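
Before raising the limit, it can help to check whether the planned pool size fits the current one. A minimal sketch using the standard resource module (Unix only) follows; the helper name fd_budget and the reserve of 64 descriptors for logs, files, and other sockets are assumptions, not measured values.

```python
import resource  # Unix only

def fd_budget(pool_size, reserve=64):
    """Return (soft_limit, fits) for a planned connection pool size."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, True
    # Leave `reserve` descriptors for files, logs, and other sockets
    return soft, pool_size + reserve <= soft

soft, fits = fd_budget(pool_size=100)
print(f"soft fd limit={soft}, pool of 100 fits: {fits}")
```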

FAQ

Does CaptchaAI solving use local CPU for computation?

No. The actual CAPTCHA solving happens on CaptchaAI's servers. Your worker only performs HTTP requests and JSON parsing, which are lightweight operations.

Should I use processes or threads for parallelism?

Use async I/O (asyncio for Python, native Promises for Node.js). Threads add memory overhead without benefit for I/O-bound work. Add processes only if you need more than about 500 concurrent solves.

How do I detect a memory leak in my worker?

Track RSS and heap used over time. If either keeps growing linearly without reaching a plateau, you have a leak. Use tracemalloc (Python) or --inspect (Node.js) to identify the source.
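
For the Python side, comparing two tracemalloc snapshots points at the source lines whose allocations grew. The sketch below simulates a leak with a list that is never cleared; top_growth is a hypothetical helper name, and in a real worker the snapshots would be taken minutes apart rather than around a loop.

```python
import tracemalloc

def top_growth(before, after, limit=3):
    """Return the source lines whose allocations grew most between snapshots."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]

tracemalloc.start()
before = tracemalloc.take_snapshot()

leaked = []  # simulated leak: results kept forever
for i in range(1000):
    leaked.append(("token-%d" % i) * 50)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

growth = top_growth(before, after)
for stat in growth:
    print(stat)  # file, line number, and size difference
```

The top entry should point at the line that appends to leaked, which is the kind of signal to look for in a real worker.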

Next Steps

Build resource-efficient CAPTCHA solving workers — get your CaptchaAI API key.
