CAPTCHA solving workers are I/O-bound — they spend most of their time waiting for API responses. But poor resource management can still cause memory leaks, high CPU usage, and process crashes. This guide covers practical optimizations for CaptchaAI workers.
Common Resource Bottlenecks
| Bottleneck | Cause | Impact |
|---|---|---|
| Memory growth | Unbounded response buffering | OOM kills, swap thrashing |
| High CPU | Busy-wait polling loops | Wastes compute, blocks other tasks |
| Connection leaks | Unclosed HTTP sessions | File descriptor exhaustion |
| Large payloads | Base64 image bodies in memory | 2–5 MB per image CAPTCHA |
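Base64 encoding is why image payloads dominate memory: every 3 raw bytes become 4 ASCII characters, a ~33% overhead on top of the original image. A quick check (the 3 MB size is illustrative):

```python
import base64

# Base64 expands every 3 raw bytes into 4 ASCII characters (~33% overhead).
raw = bytes(3_000_000)      # a 3 MB image payload (illustrative)
encoded = base64.b64encode(raw)
print(len(encoded))         # → 4000000
```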
Python: Lean Worker Patterns
Use Connection Pooling with Limits
```python
# lean_worker.py
import os
import asyncio

import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")


async def create_lean_session():
    """Create a memory-efficient aiohttp session."""
    connector = aiohttp.TCPConnector(
        limit=20,                # Max connections
        limit_per_host=20,       # All go to the same host
        keepalive_timeout=30,
        enable_cleanup_closed=True,
    )
    return aiohttp.ClientSession(
        connector=connector,
        timeout=aiohttp.ClientTimeout(total=30),
    )


async def solve_captcha(session, sitekey, pageurl):
    """Solve with minimal memory footprint."""
    # Submit
    async with session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": "1",
    }) as resp:
        # Read and release the response immediately
        result = await resp.json(content_type=None)
    if result.get("status") != 1:
        return None
    task_id = result["request"]
    del result  # Free memory

    # Poll with sleep (not busy-wait)
    await asyncio.sleep(15)
    for _ in range(25):
        async with session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": "1",
        }) as resp:
            poll_result = await resp.json(content_type=None)
        if poll_result.get("status") == 1:
            token = poll_result["request"]
            del poll_result
            return token
        if poll_result.get("request") != "CAPCHA_NOT_READY":
            return None
        del poll_result
        await asyncio.sleep(5)  # Async sleep: zero CPU
    return None


async def main():
    session = await create_lean_session()
    try:
        tasks = [
            solve_captcha(session, "SITEKEY", "https://example.com")
            for _ in range(50)
        ]
        results = await asyncio.gather(*tasks)
        solved = sum(1 for r in results if r)
        print(f"Solved: {solved}/{len(tasks)}")
    finally:
        await session.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Minimize Memory for Large Image CAPTCHAs
For image/OCR CAPTCHAs, the base64 payload has to be sent in full, so the realistic goal is to hold it in memory for as short a time as possible and release it right after submission:

```python
import base64


async def submit_image_captcha(session, image_path):
    """Submit an image CAPTCHA, holding the payload only as long as needed."""
    # Read and encode the file once; the encoded copy is ~33% larger
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("ascii")
    # Submit, then release the base64 string before returning
    async with session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,  # defined earlier in lean_worker.py
        "method": "base64",
        "body": image_data,
        "json": "1",
    }) as resp:
        result = await resp.json(content_type=None)
    del image_data  # Free the base64 string immediately
    return result
```
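If holding both the raw bytes and the full encoded string at once is still too much, the base64 string can be built chunk by chunk so only one small raw chunk is alive at a time. This is an illustrative helper, not part of the CaptchaAI API; the chunk size must be a multiple of 3 so each chunk encodes independently of its neighbours:

```python
import base64


def b64encode_in_chunks(path, chunk_size=3 * 64 * 1024):
    """Base64-encode a file without keeping the whole raw file in memory.

    chunk_size must be a multiple of 3: base64 maps 3 raw bytes to
    4 characters, so 3-byte-aligned chunks concatenate cleanly.
    """
    parts = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            parts.append(base64.b64encode(chunk).decode("ascii"))
    return "".join(parts)
```

The peak working set becomes one raw chunk plus the growing list of encoded pieces, instead of the entire raw file plus its full encoding side by side.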
Monitor Memory Usage
```python
import tracemalloc

tracemalloc.start()
# ... run your solver ...
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()
```
JavaScript: Resource-Efficient Patterns
Proper Agent Configuration
```javascript
// lean_worker.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';

// Configure the agent for minimal resource usage
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 20,      // Limit concurrent connections
  maxFreeSockets: 5,   // Keep 5 idle sockets for reuse
  timeout: 30000,      // Socket idle timeout (30 s)
});

const api = axios.create({
  baseURL: 'https://ocr.captchaai.com',
  httpsAgent: agent,
  timeout: 30000,
  maxContentLength: 50000,   // Limit response size (50 KB)
  maxBodyLength: 5000000,    // Limit request body (5 MB for images)
});

async function solveCaptcha(sitekey, pageurl) {
  const submit = await api.get('/in.php', {
    params: {
      key: API_KEY, method: 'userrecaptcha',
      googlekey: sitekey, pageurl, json: '1',
    },
  });
  if (submit.data.status !== 1) return null;
  const taskId = submit.data.request;

  await new Promise(r => setTimeout(r, 15000));
  for (let i = 0; i < 25; i++) {
    const poll = await api.get('/res.php', {
      params: { key: API_KEY, action: 'get', id: taskId, json: '1' },
    });
    if (poll.data.status === 1) return poll.data.request;
    if (poll.data.request !== 'CAPCHA_NOT_READY') return null;
    await new Promise(r => setTimeout(r, 5000));
  }
  return null;
}

// Process with concurrency control
async function processWithLimit(tasks, concurrency) {
  const results = [];
  const active = new Set();
  for (const task of tasks) {
    const p = solveCaptcha(task.sitekey, task.pageurl).then(r => {
      active.delete(p);
      return r;
    });
    active.add(p);
    results.push(p);
    if (active.size >= concurrency) await Promise.race(active);
  }
  return Promise.all(results);
}

// Monitor memory
function logMemory() {
  const usage = process.memoryUsage();
  console.log(`RSS: ${(usage.rss / 1024 / 1024).toFixed(1)} MB`);
  console.log(`Heap: ${(usage.heapUsed / 1024 / 1024).toFixed(1)} MB`);
}
```
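The Python worker can cap concurrency the same way processWithLimit does, using an asyncio.Semaphore. This is a sketch; solve_one below is a dummy stand-in for the solve_captcha coroutine shown earlier:

```python
import asyncio


async def process_with_limit(jobs, worker, concurrency):
    """Run worker(job) for every job, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(job):
        async with sem:          # blocks until a slot frees up
            return await worker(job)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(j) for j in jobs))


async def demo():
    async def solve_one(job):    # dummy worker standing in for solve_captcha
        await asyncio.sleep(0.01)
        return job * 2

    return await process_with_limit(range(5), solve_one, concurrency=2)


print(asyncio.run(demo()))  # → [0, 2, 4, 6, 8]
```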
Resource Budgets
Target resource usage per concurrency level:
| Concurrent solves | Expected memory | Expected CPU | Connections |
|---|---|---|---|
| 10 | 30–50 MB | < 5% | 10 |
| 50 | 60–100 MB | < 10% | 20 |
| 100 | 100–200 MB | < 15% | 50 |
| 500 | 300–500 MB | < 25% | 100 |
If your worker exceeds these targets, look for:
- Unbounded buffers (accumulating results without processing)
- Connection leaks (sessions not closed on error)
- Synchronous file I/O blocking the event loop
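The first point, unbounded buffers, is avoided by handing each token to a sink (database, file, queue) as soon as it resolves instead of collecting everything in a list. A minimal sketch using asyncio.as_completed, with a dummy solver and an in-memory sink standing in for real I/O:

```python
import asyncio


async def drain_results(coros, sink):
    """Consume solve results as they finish so they never pile up in memory."""
    solved = 0
    for fut in asyncio.as_completed(coros):
        token = await fut
        if token:
            sink(token)      # write out immediately (DB, file, queue, ...)
            solved += 1
    return solved


async def demo():
    async def fake_solve(i):   # dummy stand-in for a real solve
        await asyncio.sleep(0.01)
        return f"token-{i}"

    seen = []
    count = await drain_results([fake_solve(i) for i in range(4)], seen.append)
    return count, sorted(seen)


print(asyncio.run(demo()))  # → (4, ['token-0', 'token-1', 'token-2', 'token-3'])
```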
Anti-Patterns to Avoid
| Anti-Pattern | Problem | Fix |
|---|---|---|
| while True polling without sleep | 100% CPU usage | Use asyncio.sleep() or setTimeout() |
| Storing all tokens in memory | Unbounded growth | Write to database or file as they arrive |
| Creating new HTTP client per request | Connection churn, memory waste | Reuse a single session/client |
| Loading all images at once | Memory spike | Process images one at a time or in small batches |
| Not closing sessions on shutdown | Connection leaks | Use try/finally or process signal handlers |
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Memory climbs over time | Result accumulation or connection leak | Process results immediately; close sessions on error |
| CPU spikes during polling | Busy-wait loop or JSON parsing overhead | Use async sleep; limit response parsing |
| Process killed by OS (OOM) | Memory exceeds system limit | Set maxSockets, process images in batches |
| File descriptor limit hit | Too many open connections | Set ulimit -n 65536 (Linux) or reduce pool size |
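On Linux and macOS the worker can check its own descriptor limit at startup and warn before the pool size outgrows it (the resource module is Unix-only):

```python
import resource

# Soft limit is what the process is actually held to; hard is the ceiling
# it could raise itself to without root.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"Open-file limit: soft={soft}, hard={hard}")
```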
FAQ
Does CaptchaAI solving use local CPU for computation?
No. The actual CAPTCHA solving happens on CaptchaAI's servers. Your worker only performs HTTP requests and JSON parsing, which are lightweight operations.
Should I use processes or threads for parallelism?
Use async I/O (asyncio for Python, native promises for Node.js). Threads add memory overhead without benefit for I/O-bound work. Reach for multiple processes only if you need to scale well beyond 500 concurrent solves on a single event loop.
How do I detect a memory leak in my worker?
Track RSS and heap used over time. If either grows linearly without plateau, you have a leak. Use tracemalloc (Python) or --inspect (Node.js) to identify the source.
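The tracemalloc snapshot API makes that comparison concrete: take a snapshot before and after a workload and diff them to see which allocation sites grew. The leaky buffer below is a deliberate simulation for the demo:

```python
import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# Simulate a leak: results appended but never drained
leaky_buffer = []
for i in range(10_000):
    leaky_buffer.append("token-" + str(i))

snapshot_after = tracemalloc.take_snapshot()
top = snapshot_after.compare_to(snapshot_before, "lineno")
for stat in top[:3]:
    print(stat)   # biggest growers; the leaky append shows up near the top
tracemalloc.stop()
```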
Next Steps
Build resource-efficient CAPTCHA solving workers — get your CaptchaAI API key.