Tutorials

Python ThreadPoolExecutor for CAPTCHA Solving Parallelism

asyncio is powerful but requires rewriting your entire call chain as async. ThreadPoolExecutor gives you parallelism with standard synchronous code — drop it into existing projects without restructuring.

Why ThreadPoolExecutor for CAPTCHAs

CAPTCHA solving is I/O-bound (waiting for HTTP responses). Python threads release the GIL during I/O operations, making ThreadPoolExecutor efficient for this workload:

Approach             Complexity   Fits existing code       Parallelism for I/O
Sequential           None         Yes                      None
ThreadPoolExecutor   Low          Yes                      Good
asyncio              High         Requires async rewrite   Best
multiprocessing      Medium       Mostly                   Overkill for I/O
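To see the difference concretely, here's a minimal, self-contained timing sketch (no API calls; fake_solve just sleeps to simulate network wait):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_solve(task_id):
    """Simulated I/O-bound solve: the thread releases the GIL while sleeping."""
    time.sleep(0.2)
    return task_id

tasks = list(range(10))

# Sequential: roughly 10 * 0.2s
start = time.time()
seq_results = [fake_solve(t) for t in tasks]
seq_elapsed = time.time() - start

# Threaded: all ten sleeps overlap, so roughly 0.2s total
start = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    thr_results = list(executor.map(fake_solve, tasks))
thr_elapsed = time.time() - start

print(f"sequential: {seq_elapsed:.2f}s, threaded: {thr_elapsed:.2f}s")
```

Swap fake_solve for a real solve function and the same structure applies unchanged.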

Basic Implementation

import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]


def solve_captcha(sitekey, pageurl):
    """Synchronous CAPTCHA solve — submit and poll."""
    # Submit
    resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": 1
    }, timeout=30)  # a request without a timeout can hang a worker thread forever
    data = resp.json()

    if data.get("status") != 1:
        raise RuntimeError(data.get("request", "Submit failed"))

    captcha_id = data["request"]

    # Poll for result
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY,
            "action": "get",
            "id": captcha_id,
            "json": 1
        }, timeout=30).json()

        if result.get("status") == 1:
            return result["request"]
        if result.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(result.get("request", "Unknown error"))

    raise TimeoutError("Solve timeout after 300s")


# Batch solve with ThreadPoolExecutor
tasks = [
    {"sitekey": "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-", "pageurl": f"https://example.com/page/{i}"}
    for i in range(20)
]

start = time.time()

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = {
        executor.submit(solve_captcha, t["sitekey"], t["pageurl"]): t
        for t in tasks
    }

    solved = 0
    failed = 0

    for future in as_completed(futures):
        task = futures[future]
        try:
            solution = future.result()
            solved += 1
            print(f"[OK] {task['pageurl']}: {solution[:30]}...")
        except Exception as e:
            failed += 1
            print(f"[ERR] {task['pageurl']}: {e}")

elapsed = time.time() - start
print(f"\nDone: {solved} solved, {failed} failed in {elapsed:.1f}s")

Using Session for Connection Reuse

Creating a new TCP connection per request wastes time, and requests.Session is not guaranteed to be thread-safe, so give each worker thread its own session via threading.local():

import threading

# Thread-local storage for sessions
thread_local = threading.local()


def get_session():
    """Get or create a thread-local session."""
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
        # Configure connection pooling
        adapter = requests.adapters.HTTPAdapter(
            pool_connections=10,
            pool_maxsize=10,
            max_retries=2
        )
        thread_local.session.mount("https://", adapter)
    return thread_local.session


def solve_captcha_pooled(sitekey, pageurl):
    """Solve using thread-local connection pooling."""
    session = get_session()

    resp = session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": 1
    }, timeout=30)
    data = resp.json()

    if data.get("status") != 1:
        raise RuntimeError(data.get("request"))

    captcha_id = data["request"]

    for _ in range(60):
        time.sleep(5)
        result = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY,
            "action": "get",
            "id": captcha_id,
            "json": 1
        }, timeout=30).json()

        if result.get("status") == 1:
            return result["request"]
        if result.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(result.get("request"))

    raise TimeoutError("Solve timeout")

map() for Simple Batch Operations

When you want results back in input order with a simple success/failure summary. map() re-raises the first exception when you iterate its results, so wrap each task to capture errors instead:

def solve_task(task):
    """Wrapper that returns result dict."""
    try:
        solution = solve_captcha_pooled(task["sitekey"], task["pageurl"])
        return {"url": task["pageurl"], "solution": solution, "error": None}
    except Exception as e:
        return {"url": task["pageurl"], "solution": None, "error": str(e)}


with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(solve_task, tasks))

solved = [r for r in results if r["solution"]]
failed = [r for r in results if r["error"]]
print(f"Solved: {len(solved)}, Failed: {len(failed)}")
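A side benefit of map() over as_completed: results come back in input order even when tasks finish out of order. A quick self-contained demo (delayed_identity is an illustrative stand-in, not part of the solver):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def delayed_identity(n):
    """Later inputs finish first, yet map() still yields results in input order."""
    time.sleep((5 - n) * 0.05)
    return n

with ThreadPoolExecutor(max_workers=5) as executor:
    ordered = list(executor.map(delayed_identity, range(5)))

print(ordered)  # [0, 1, 2, 3, 4]
```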

Timeout Protection

Bound how long you wait on a batch so one stuck solve can't hang your pipeline:

from concurrent.futures import TimeoutError as FuturesTimeout

executor = ThreadPoolExecutor(max_workers=10)
futures = {
    executor.submit(solve_captcha_pooled, t["sitekey"], t["pageurl"]): t
    for t in tasks
}

try:
    for future in as_completed(futures, timeout=600):  # 10 min deadline for the whole batch
        task = futures[future]
        try:
            # Futures yielded by as_completed are already done, so result() returns immediately
            solution = future.result()
            print(f"[OK] {task['pageurl']}")
        except Exception as e:
            print(f"[ERR] {task['pageurl']}: {e}")
except FuturesTimeout:
    # as_completed itself raises once the deadline passes with futures still pending
    pending = sum(1 for f in futures if not f.done())
    print(f"[TIMEOUT] {pending} tasks unfinished at deadline")
finally:
    # Don't block waiting on stragglers; cancel anything not yet started (Python 3.9+)
    executor.shutdown(wait=False, cancel_futures=True)

Progress Callback

Track completion in real-time:

import threading

progress_lock = threading.Lock()
progress = {"done": 0, "total": 0}


def solve_with_progress(task):
    result = solve_task(task)
    with progress_lock:
        progress["done"] += 1
        pct = progress["done"] / progress["total"] * 100
        print(f'\r  Progress: {progress["done"]}/{progress["total"]} ({pct:.0f}%)', end="")
    return result


progress["total"] = len(tasks)

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(solve_with_progress, tasks))

print()  # Newline after progress

Choosing max_workers

Workers   Concurrent solves   Overhead   Best for
5         5                   Very low   Small batches, conservative use
10        10                  Low        General use
25        25                  Moderate   High-volume pipelines
50        50                  Higher     Maximum throughput

More workers means more concurrent API connections. Start at 10, increase while monitoring error rates.
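One way to pick a number empirically is to time the same batch at several pool sizes. A sketch, using a simulated solve so the sweep doesn't spend API credits (benchmark_workers and fake_solve are illustrative names, not part of the API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark_workers(solve_fn, tasks, worker_counts):
    """Run the same batch at each pool size; return {workers: elapsed_seconds}."""
    timings = {}
    for n in worker_counts:
        start = time.time()
        with ThreadPoolExecutor(max_workers=n) as executor:
            list(executor.map(solve_fn, tasks))
        timings[n] = time.time() - start
    return timings

def fake_solve(task):
    """Simulated solve: sleep stands in for the submit/poll round-trips."""
    time.sleep(0.1)
    return task

timings = benchmark_workers(fake_solve, list(range(20)), [5, 10, 20])
for n, secs in timings.items():
    print(f"{n:>2} workers: {secs:.2f}s")
```

Against the real API, run each size on a representative batch and watch error rates as well as wall time.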

ThreadPoolExecutor vs asyncio

# ThreadPoolExecutor — drop into existing sync code
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(solve_task, tasks))

# asyncio — requires async function chain
async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [solve_async(session, t) for t in task_list]
        results = await asyncio.gather(*tasks)

Use ThreadPoolExecutor when:

  • Your existing codebase is synchronous
  • You use libraries that don't support async (Selenium, some ORMs)
  • You want quick parallelism without restructuring

Use asyncio when:

  • Building from scratch
  • Maximum efficiency matters (fewer OS threads)
  • Already in an async framework (FastAPI, aiohttp)
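The two approaches also compose: if you're already inside an async framework but your solver is synchronous, asyncio.to_thread (Python 3.9+) runs it on a worker thread without a rewrite. A sketch with a stand-in for the sync solver:

```python
import asyncio
import time

def sync_solve(task_id):
    """Stand-in for a blocking solve function like solve_captcha() above."""
    time.sleep(0.2)
    return f"token-{task_id}"

async def main():
    # Each call runs on the default executor's thread pool; gather preserves input order
    return await asyncio.gather(*(asyncio.to_thread(sync_solve, i) for i in range(5)))

tokens = asyncio.run(main())
print(tokens)
```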

Troubleshooting

Issue                    Cause                                            Fix
All threads blocked      Every thread waits in time.sleep while polling   Expected — threads release the GIL during sleep
ConnectionError spikes   Too many concurrent connections                  Reduce max_workers; use connection pooling
Results out of order     as_completed yields in completion order          Use map() for ordered results, or track tasks with a dict
Memory growing           Large result objects held in futures             Process results inside the as_completed loop; don't store them all

FAQ

Does the GIL prevent real parallelism?

No — for I/O-bound work like HTTP requests and time.sleep, Python releases the GIL. Your threads run truly concurrently during network calls. The GIL only limits CPU-bound parallelism.
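You can verify this yourself by timing the same pool on I/O-bound versus CPU-bound work; only the I/O case gets a real speedup (a rough illustration, exact timings vary by machine):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    """I/O-bound: the GIL is released during sleep, so the four sleeps overlap."""
    time.sleep(0.2)

def cpu_task(_):
    """CPU-bound: threads contend for the GIL, so the work runs mostly serially."""
    return sum(i * i for i in range(2_000_000))

def timed(fn):
    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        list(executor.map(fn, range(4)))
    return time.time() - start

io_elapsed = timed(io_task)
cpu_elapsed = timed(cpu_task)
print(f"I/O-bound: {io_elapsed:.2f}s")  # close to a single 0.2s sleep
print(f"CPU-bound: {cpu_elapsed:.2f}s")
```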

How many CAPTCHAs can ThreadPoolExecutor handle per hour?

With 10 workers and 15-second average solve time: ~2,400 per hour. With 25 workers: ~6,000 per hour. The bottleneck is CaptchaAI solve time, not Python threading.
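The estimate is straightforward arithmetic; a tiny helper makes the assumption explicit (hourly_throughput is an illustrative name):

```python
def hourly_throughput(workers, avg_solve_seconds):
    """Upper bound: each worker completes 3600 / avg_solve_seconds solves per hour."""
    return int(workers * 3600 / avg_solve_seconds)

print(hourly_throughput(10, 15))  # 2400
print(hourly_throughput(25, 15))  # 6000
```

Real throughput lands below this bound because submit/poll overhead and retries add to the average solve time.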

Should I use ProcessPoolExecutor instead?

No. CAPTCHA solving is I/O-bound. ProcessPoolExecutor adds inter-process communication overhead with no benefit. Stick with threads.

Next Steps

Parallelize CAPTCHA solving — get your CaptchaAI API key and drop ThreadPoolExecutor into your pipeline.
