CAPTCHA solving workers are I/O-bound — they spend most of their time waiting for API responses. But poor resource management can still cause memory leaks, high CPU usage, and process crashes. This guide covers practical optimizations for CaptchaAI workers.
Common Resource Bottlenecks
| Bottleneck | Cause | Impact |
|---|---|---|
| Memory growth | Unbounded response buffering | OOM kills, swap thrashing |
| High CPU | Busy-wait polling loops | Wastes compute, blocks other tasks |
| Connection leaks | Unclosed HTTP sessions | File descriptor exhaustion |
| Large payloads | Base64 image bodies in memory | 2–5 MB per image CAPTCHA |
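Base64 encoding is why image payloads dominate memory: every 3 raw bytes become 4 ASCII characters, a ~33% overhead on top of the original image. A quick check (the 3 MB size is illustrative):

```python
import base64

# Base64 expands every 3 raw bytes into 4 ASCII characters (~33% overhead).
raw = bytes(3_000_000)      # a 3 MB image payload (illustrative)
encoded = base64.b64encode(raw)
print(len(encoded))         # → 4000000
```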
Python: Lean Worker Patterns
Use Connection Pooling with Limits
```python
# lean_worker.py
import os
import asyncio

import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")


async def create_lean_session():
    """Create a memory-efficient aiohttp session."""
    connector = aiohttp.TCPConnector(
        limit=20,                # Max connections
        limit_per_host=20,       # All go to the same host
        keepalive_timeout=30,
        enable_cleanup_closed=True,
    )
    return aiohttp.ClientSession(
        connector=connector,
        timeout=aiohttp.ClientTimeout(total=30),
    )


async def solve_captcha(session, sitekey, pageurl):
    """Solve with minimal memory footprint."""
    # Submit
    async with session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": "1",
    }) as resp:
        # Read and release the response immediately
        result = await resp.json(content_type=None)
    if result.get("status") != 1:
        return None
    task_id = result["request"]
    del result  # Free memory

    # Poll with sleep (not busy-wait)
    await asyncio.sleep(15)
    for _ in range(25):
        async with session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": "1",
        }) as resp:
            poll_result = await resp.json(content_type=None)
        if poll_result.get("status") == 1:
            token = poll_result["request"]
            del poll_result
            return token
        if poll_result.get("request") != "CAPCHA_NOT_READY":
            return None
        del poll_result
        await asyncio.sleep(5)  # Async sleep: zero CPU
    return None


async def main():
    session = await create_lean_session()
    try:
        tasks = [
            solve_captcha(session, "SITEKEY", "https://example.com")
            for _ in range(50)
        ]
        results = await asyncio.gather(*tasks)
        solved = sum(1 for r in results if r)
        print(f"Solved: {solved}/{len(tasks)}")
    finally:
        await session.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Minimize Memory for Large Image CAPTCHAs
For image/OCR CAPTCHAs, the base64 payload has to be sent in full, so the realistic goal is to hold it in memory for as short a time as possible and release it right after submission:

```python
import base64


async def submit_image_captcha(session, image_path):
    """Submit an image CAPTCHA, holding the payload only as long as needed."""
    # Read and encode the file once; the encoded copy is ~33% larger
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("ascii")
    # Submit, then release the base64 string before returning
    async with session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,  # defined earlier in lean_worker.py
        "method": "base64",
        "body": image_data,
        "json": "1",
    }) as resp:
        result = await resp.json(content_type=None)
    del image_data  # Free the base64 string immediately
    return result
```
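If holding both the raw bytes and the full encoded string at once is still too much, the base64 string can be built chunk by chunk so only one small raw chunk is alive at a time. This is an illustrative helper, not part of the CaptchaAI API; the chunk size must be a multiple of 3 so each chunk encodes independently of its neighbours:

```python
import base64


def b64encode_in_chunks(path, chunk_size=3 * 64 * 1024):
    """Base64-encode a file without keeping the whole raw file in memory.

    chunk_size must be a multiple of 3: base64 maps 3 raw bytes to
    4 characters, so 3-byte-aligned chunks concatenate cleanly.
    """
    parts = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            parts.append(base64.b64encode(chunk).decode("ascii"))
    return "".join(parts)
```

The peak working set becomes one raw chunk plus the growing list of encoded pieces, instead of the entire raw file plus its full encoding side by side.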
Monitor Memory Usage
```python
import tracemalloc

tracemalloc.start()
# ... run your solver ...
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()
```
JavaScript: Resource-Efficient Patterns
Proper Agent Configuration
```javascript
// lean_worker.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';

// Configure the agent for minimal resource usage
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 20,      // Limit concurrent connections
  maxFreeSockets: 5,   // Keep 5 idle sockets for reuse
  timeout: 30000,      // Socket idle timeout (30 s)
});

const api = axios.create({
  baseURL: 'https://ocr.captchaai.com',
  httpsAgent: agent,
  timeout: 30000,
  maxContentLength: 50000,   // Limit response size (50 KB)
  maxBodyLength: 5000000,    // Limit request body (5 MB for images)
});

async function solveCaptcha(sitekey, pageurl) {
  const submit = await api.get('/in.php', {
    params: {
      key: API_KEY, method: 'userrecaptcha',
      googlekey: sitekey, pageurl, json: '1',
    },
  });
  if (submit.data.status !== 1) return null;
  const taskId = submit.data.request;

  await new Promise(r => setTimeout(r, 15000));
  for (let i = 0; i < 25; i++) {
    const poll = await api.get('/res.php', {
      params: { key: API_KEY, action: 'get', id: taskId, json: '1' },
    });
    if (poll.data.status === 1) return poll.data.request;
    if (poll.data.request !== 'CAPCHA_NOT_READY') return null;
    await new Promise(r => setTimeout(r, 5000));
  }
  return null;
}

// Process with concurrency control
async function processWithLimit(tasks, concurrency) {
  const results = [];
  const active = new Set();
  for (const task of tasks) {
    const p = solveCaptcha(task.sitekey, task.pageurl).then(r => {
      active.delete(p);
      return r;
    });
    active.add(p);
    results.push(p);
    if (active.size >= concurrency) await Promise.race(active);
  }
  return Promise.all(results);
}

// Monitor memory
function logMemory() {
  const usage = process.memoryUsage();
  console.log(`RSS: ${(usage.rss / 1024 / 1024).toFixed(1)} MB`);
  console.log(`Heap: ${(usage.heapUsed / 1024 / 1024).toFixed(1)} MB`);
}
```
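The Python worker can cap concurrency the same way processWithLimit does, using an asyncio.Semaphore. This is a sketch; solve_one below is a dummy stand-in for the solve_captcha coroutine shown earlier:

```python
import asyncio


async def process_with_limit(jobs, worker, concurrency):
    """Run worker(job) for every job, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(job):
        async with sem:          # blocks until a slot frees up
            return await worker(job)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(j) for j in jobs))


async def demo():
    async def solve_one(job):    # dummy worker standing in for solve_captcha
        await asyncio.sleep(0.01)
        return job * 2

    return await process_with_limit(range(5), solve_one, concurrency=2)


print(asyncio.run(demo()))  # → [0, 2, 4, 6, 8]
```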
Resource Budgets
Target resource usage per concurrency level:
| Concurrent solves | Expected memory | Expected CPU | Connections |
|---|---|---|---|
| 10 | 30–50 MB | < 5% | 10 |
| 50 | 60–100 MB | < 10% | 20 |
| 100 | 100–200 MB | < 15% | 50 |
| 500 | 300–500 MB | < 25% | 100 |
If your worker exceeds these targets, look for:
- Unbounded buffers (accumulating results without processing)
- Connection leaks (sessions not closed on error)
- Synchronous file I/O blocking the event loop
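The first point, unbounded buffers, is avoided by handing each token to a sink (database, file, queue) as soon as it resolves instead of collecting everything in a list. A minimal sketch using asyncio.as_completed, with a dummy solver and an in-memory sink standing in for real I/O:

```python
import asyncio


async def drain_results(coros, sink):
    """Consume solve results as they finish so they never pile up in memory."""
    solved = 0
    for fut in asyncio.as_completed(coros):
        token = await fut
        if token:
            sink(token)      # write out immediately (DB, file, queue, ...)
            solved += 1
    return solved


async def demo():
    async def fake_solve(i):   # dummy stand-in for a real solve
        await asyncio.sleep(0.01)
        return f"token-{i}"

    seen = []
    count = await drain_results([fake_solve(i) for i in range(4)], seen.append)
    return count, sorted(seen)


print(asyncio.run(demo()))  # → (4, ['token-0', 'token-1', 'token-2', 'token-3'])
```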
Anti-Patterns to Avoid
| Anti-Pattern | Problem | Fix |
|---|---|---|
| while True polling without sleep | 100% CPU usage | Use asyncio.sleep() or setTimeout() |
| Storing all tokens in memory | Unbounded growth | Write to database or file as they arrive |
| Creating new HTTP client per request | Connection churn, memory waste | Reuse a single session/client |
| Loading all images at once | Memory spike | Process images one at a time or in small batches |
| Not closing sessions on shutdown | Connection leaks | Use try/finally or process signal handlers |
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Memory climbs over time | Result accumulation or connection leak | Process results immediately; close sessions on error |
| CPU spikes during polling | Busy-wait loop or JSON parsing overhead | Use async sleep; limit response parsing |
| Process killed by OS (OOM) | Memory exceeds system limit | Set maxSockets, process images in batches |
| File descriptor limit hit | Too many open connections | Set ulimit -n 65536 (Linux) or reduce pool size |
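On Linux and macOS the worker can check its own descriptor limit at startup and warn before the pool size outgrows it (the resource module is Unix-only):

```python
import resource

# Soft limit is what the process is actually held to; hard is the ceiling
# it could raise itself to without root.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"Open-file limit: soft={soft}, hard={hard}")
```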
FAQ
Does CaptchaAI solving use local CPU for computation?
No. The actual CAPTCHA solving happens on CaptchaAI's servers. Your worker only performs HTTP requests and JSON parsing, which are lightweight operations.
Should I use processes or threads for parallelism?
Use async I/O (asyncio for Python, native promises for Node.js). Threads add memory overhead without benefit for I/O-bound work. Reach for multiple processes only if you need to scale well beyond 500 concurrent solves on a single event loop.
How do I detect a memory leak in my worker?
Track RSS and heap used over time. If either grows linearly without plateau, you have a leak. Use tracemalloc (Python) or --inspect (Node.js) to identify the source.
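The tracemalloc snapshot API makes that comparison concrete: take a snapshot before and after a workload and diff them to see which allocation sites grew. The leaky buffer below is a deliberate simulation for the demo:

```python
import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# Simulate a leak: results appended but never drained
leaky_buffer = []
for i in range(10_000):
    leaky_buffer.append("token-" + str(i))

snapshot_after = tracemalloc.take_snapshot()
top = snapshot_after.compare_to(snapshot_before, "lineno")
for stat in top[:3]:
    print(stat)   # biggest growers; the leaky append shows up near the top
tracemalloc.stop()
```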
Next Steps
Build resource-efficient CAPTCHA solving workers — get your CaptchaAI API key.