Profiling CAPTCHA Solving Bottlenecks in Python Applications

When your CAPTCHA solving script is slower than expected, you need to know where the time goes. Is it network latency? JSON parsing? Image encoding? This guide shows how to profile CaptchaAI integrations in Python to find and fix the actual bottleneck.

Time Budget for a Single Solve

A typical reCAPTCHA v2 solve breaks down like this:

Phase	Expected Time	What's Happening
Submit request	50–200ms	HTTP call to `in.php`
CaptchaAI processing	10–25s	Solving on CaptchaAI servers
Poll requests (3–5 calls)	150–500ms	HTTP calls to `res.php`
JSON parsing	< 1ms	Deserializing responses
Your code (between calls)	Variable	Business logic, DB writes
Total	~12–30s

If your total exceeds 45 seconds consistently, something in your pipeline is adding overhead.
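Part of the total comes from polling, not solving. Method 1 below uses a fixed schedule: a 15 s initial wait, then one poll every 5 s. Under that schedule, a finished token is only seen at the next poll, so the observed total rounds up to the poll grid. Here is a minimal sketch of that arithmetic. It ignores HTTP latency, and `expected_poll_schedule` is a hypothetical helper:

```python
import math


def expected_poll_schedule(solve_time: float, initial_wait: float = 15.0,
                           interval: float = 5.0) -> tuple[int, float]:
    """Return (poll_count, seconds_until_result_seen) for a fixed polling schedule.

    Assumes poll k (1-based) fires at initial_wait + (k - 1) * interval and
    ignores request latency, so the time is a lower bound on the observed total.
    """
    if solve_time <= initial_wait:
        # The answer is already waiting when the first poll fires
        return 1, initial_wait
    polls = math.ceil((solve_time - initial_wait) / interval) + 1
    return polls, initial_wait + (polls - 1) * interval


# A 22 s solve is first seen by the third poll, at 25 s
print(expected_poll_schedule(22.0))  # (3, 25.0)
```

In this example, 3 s of the total is pure polling overhead. A shorter interval shrinks that overhead at the cost of more `res.php` requests.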

Method 1: Manual Timing Instrumentation

Add timing to each phase of the solve:

# profiled_solver.py
import os
import time
import requests

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

def solve_with_timing(sitekey, pageurl):
    """Solve with detailed timing for each phase."""
    timings = {}
    session = requests.Session()

    # Phase 1: Submit
    t0 = time.perf_counter()
    resp = session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": "1",
    }, timeout=30)  # fail fast instead of hanging on a stalled connection
    timings["submit_request"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    result = resp.json()
    timings["submit_parse"] = time.perf_counter() - t0

    if result.get("status") != 1:
        return None, timings

    task_id = result["request"]

    # Phase 2: Wait
    t0 = time.perf_counter()
    time.sleep(15)
    timings["initial_wait"] = time.perf_counter() - t0

    # Phase 3: Poll
    poll_times = []
    poll_count = 0
    t_poll_start = time.perf_counter()

    for _ in range(25):
        t0 = time.perf_counter()
        poll = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": "1",
        }, timeout=30)  # fail fast instead of hanging on a stalled connection
        poll_result = poll.json()
        poll_time = time.perf_counter() - t0
        poll_times.append(poll_time)
        poll_count += 1

        if poll_result.get("status") == 1:
            break
        if poll_result.get("request") != "CAPCHA_NOT_READY":
            break
        time.sleep(5)

    timings["poll_total"] = time.perf_counter() - t_poll_start
    timings["poll_count"] = poll_count
    timings["poll_avg_request"] = sum(poll_times) / len(poll_times) if poll_times else 0
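    # Sum only the sequential phases; poll_avg_request is already inside poll_total
    timings["total"] = sum(
        timings[k] for k in ("submit_request", "submit_parse", "initial_wait", "poll_total")
    )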

    token = poll_result.get("request") if poll_result.get("status") == 1 else None
    return token, timings

# Run and display results (guarded so other scripts can import solve_with_timing)
if __name__ == "__main__":
    token, timings = solve_with_timing(
        "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
        "https://www.google.com/recaptcha/api2/demo"
    )

    print("\n=== Timing Breakdown ===")
    for key, value in timings.items():
        if isinstance(value, float):
            print(f"  {key}: {value*1000:.1f}ms")
        else:
            print(f"  {key}: {value}")

Sample output (exact numbers vary with network conditions):

=== Timing Breakdown ===
  submit_request: 145.3ms
  submit_parse: 0.2ms
  initial_wait: 15001.2ms
  poll_total: 10234.5ms
  poll_count: 3
  poll_avg_request: 67.8ms
  total: 25381.2ms
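Method 1 repeats the `t0 = time.perf_counter()` bookkeeping for every phase. That pattern can be factored into a small context manager, which keeps the solver readable as you add phases. Here is a minimal sketch. `phase_timer` is a hypothetical helper, not part of any library:

```python
import time
from contextlib import contextmanager


@contextmanager
def phase_timer(timings: dict, name: str):
    """Add the wall-clock duration of the with-block to timings[name] (seconds)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate, so a phase entered several times (e.g. each poll) sums up
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


timings = {}
with phase_timer(timings, "initial_wait"):
    time.sleep(0.1)  # stands in for time.sleep(15) in the real solver
with phase_timer(timings, "submit_parse"):
    sum(range(10_000))  # stands in for resp.json()

for key, value in timings.items():
    print(f"  {key}: {value * 1000:.1f}ms")
```

Inside `solve_with_timing`, each phase then becomes `with phase_timer(timings, "submit_request"): resp = session.get(...)`. Because of the `finally` clause, the duration is still recorded if the request raises.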

Method 2: cProfile for Call Stack Analysis

import cProfile
import pstats

from profiled_solver import solve_with_timing  # the Method 1 script

def run_solver():
    """Wrapper for profiling."""
    solve_with_timing(
        "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
        "https://www.google.com/recaptcha/api2/demo"
    )

# Profile the entire solve
profiler = cProfile.Profile()
profiler.enable()
run_solver()
profiler.disable()

# Show top 20 time-consuming functions
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")
stats.print_stats(20)

This reveals whether time is spent in:

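time.sleep (the fixed waits; cProfile measures wall-clock time, so these dominate the cumulative column, which is expected)
Socket and SSL reads such as recv_into and read (network I/O and TLS, expected for HTTPS)
json.loads (JSON parsing, should be < 1ms)
Your own functions (business logic, optimize here)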
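Reading the full `print_stats` table for every run gets tedious. This minimal sketch buckets profiled time into the categories above by matching substrings of the function names cProfile records. A few caveats apply:

- The bucket patterns are assumptions you will need to adapt to your own code.
- The `stats.stats` dictionary it reads is an undocumented, though long-standing, attribute of `pstats.Stats`.
- It profiles an offline stand-in workload, so it runs without an API key.

```python
import cProfile
import json
import pstats
import time
from collections import defaultdict

# Substrings matched against "filename:function"; the first matching bucket wins.
# These patterns are assumptions; adjust them to your own modules.
BUCKETS = {
    "sleep (fixed waits)": ("time.sleep",),
    "network / TLS": ("_socket.socket", "_ssl._SSLSocket"),
    "json parsing": ("json",),
}


def bucket_own_time(stats: pstats.Stats) -> dict:
    """Sum each function's own time (tottime) into the buckets above.

    Own time is used instead of cumulative time so that nested calls
    (json.loads -> decode -> raw_decode) are not counted twice.
    """
    totals = defaultdict(float)
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers)
    for (filename, _lineno, funcname), (_cc, _nc, tottime, _ct, _callers) in stats.stats.items():
        label = f"{filename}:{funcname}"
        for bucket, patterns in BUCKETS.items():
            if any(p in label for p in patterns):
                totals[bucket] += tottime
                break
        else:
            totals["everything else"] += tottime
    return dict(totals)


def fake_solve():
    """Offline stand-in for solve_with_timing: a wait plus some JSON parsing."""
    time.sleep(0.05)
    for _ in range(200):
        json.loads('{"status": 1, "request": "TOKEN"}')


profiler = cProfile.Profile()
profiler.enable()
fake_solve()
profiler.disable()

for bucket, seconds in bucket_own_time(pstats.Stats(profiler)).items():
    print(f"  {bucket}: {seconds * 1000:.1f}ms")
```

To use it on a real solve, replace `fake_solve()` with `run_solver()`.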

Method 3: Async Profiling for Concurrent Solvers

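For asyncio-based solvers, cProfile output is hard to interpret. Coroutines suspend at every `await`, and when many tasks interleave on one event loop, the per-function numbers no longer map cleanly onto individual solves. A timing decorator that records each call's wall-clock duration is simpler: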

import asyncio
import functools
import os
import statistics
import time
from collections import defaultdict

import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

# Call durations (seconds), keyed by function name
timing_data = defaultdict(list)

def timed_async(func):
    """Record the wall-clock duration of every call to an async function."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            # Recorded even when the call raises, so slow timeouts stay visible
            timing_data[func.__name__].append(time.perf_counter() - start)
    return wrapper

@timed_async
async def submit_captcha(session: aiohttp.ClientSession, sitekey, pageurl):
    """Submit a reCAPTCHA v2 task and return the parsed JSON response."""
    async with session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": sitekey, "pageurl": pageurl, "json": "1",
    }) as resp:
        return await resp.json(content_type=None)

@timed_async
async def poll_result(session: aiohttp.ClientSession, task_id):
    """Poll res.php once and return the parsed JSON response."""
    async with session.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY, "action": "get",
        "id": task_id, "json": "1",
    }) as resp:
        return await resp.json(content_type=None)

# After running, print statistics
def print_timing_stats():
    for func_name, times in timing_data.items():
        print(f"\n{func_name}:")
        print(f"  Calls: {len(times)}")
        print(f"  Median: {statistics.median(times)*1000:.1f}ms")
        print(f"  Max: {max(times)*1000:.1f}ms")
        print(f"  Total: {sum(times)*1000:.1f}ms")
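When many solves run concurrently, per-call durations overlap. The `Total` line in `print_timing_stats` can then exceed the elapsed wall-clock time, so compare per-call median and max rather than totals. The sketch below demonstrates the effect offline. It repeats the decorator so it runs on its own and uses a hypothetical `fake_poll` coroutine built on `asyncio.sleep` in place of the HTTP calls:

```python
import asyncio
import functools
import time
from collections import defaultdict

timing_data = defaultdict(list)


def timed_async(func):
    """Record each call's wall-clock duration under the function's name."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            timing_data[func.__name__].append(time.perf_counter() - start)
    return wrapper


@timed_async
async def fake_poll(task_id: int) -> str:
    """Hypothetical stand-in for poll_result: 0.1 s of simulated network wait."""
    await asyncio.sleep(0.1)
    return f"token-{task_id}"


async def main(n_tasks: int = 10) -> float:
    """Run n_tasks polls concurrently and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_poll(i) for i in range(n_tasks)))
    return time.perf_counter() - start


wall = asyncio.run(main())
summed = sum(timing_data["fake_poll"])
print(f"wall clock: {wall * 1000:.0f}ms, summed per-call time: {summed * 1000:.0f}ms")
```

With 10 overlapping calls, the summed time is roughly ten times the wall-clock time, even though nothing is slow.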

Common Bottlenecks and Fixes

Bottleneck	How to Detect	Fix
High `submit_request` time (> 500ms)	Manual timing shows slow submit	Check DNS, use keep-alive
High poll count (> 8 polls)	`poll_count` consistently high	Increase initial wait time
Slow JSON parsing	`submit_parse` > 10ms	Shouldn't happen; check response size
Time between polls well above the 5s sleep	Gap between one poll's end and the next poll's start	Remove blocking code from the poll loop
Image encoding bottleneck	Large `base64.b64encode` time	Pre-encode or stream images
Database writes blocking solver	cProfile shows DB function time	Make DB writes async or batch
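The time-between-polls row needs timestamps, not just durations. In Method 1, `poll_times` records how long each request took but not when it started. This minimal sketch makes three assumptions:

- You record `(start, end)` pairs from `time.perf_counter()` around each `res.php` request.
- The loop sleeps 5 s between polls, as in Method 1.
- `find_slow_gaps` and the 0.5 s tolerance are hypothetical choices.

```python
def find_slow_gaps(intervals, expected_sleep=5.0, tolerance=0.5):
    """Return (poll_index, gap_seconds) for gaps that exceed the expected sleep.

    intervals: list of (start, end) perf_counter() readings, one per poll,
    in the order the polls were made. The gap runs from the end of one poll
    to the start of the next, so it covers the sleep plus any code run in
    between (logging, DB writes, ...).
    """
    slow = []
    for i in range(1, len(intervals)):
        gap = intervals[i][0] - intervals[i - 1][1]
        if gap > expected_sleep + tolerance:
            slow.append((i, round(gap, 3)))
    return slow


# Synthetic timestamps: the second gap includes about 2 s of blocking work
polls = [(0.0, 0.07), (5.08, 5.15), (12.20, 12.26)]
print(find_slow_gaps(polls))  # [(2, 7.05)]
```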

Troubleshooting

Issue	Cause	Fix
Total time 2x expected	Business logic between API calls	Profile to find the slow function
First solve slow, rest fast	Connection setup (DNS + TLS)	Use `Session` with keep-alive
Memory growing during profiling	Profiler accumulating data	Use sampling profiler for long runs
Profiling changes timing	Profiler overhead	Use `time.perf_counter()` for production

FAQ

Does profiling affect solve accuracy?

No. Profiling only measures execution timing. It doesn't change the API calls or CAPTCHA solving behavior.

Should I profile in production?

Use lightweight timing (Method 1) in production. Avoid cProfile there: it instruments every function call, which adds CPU overhead to every solve. If you need call-stack detail, profile a small random sample of solves instead.
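One way to sample is to run cProfile on a small random fraction of solves and keep plain `perf_counter()` timing for the rest. Here is a minimal sketch. `maybe_profile` and the 1% default rate are assumptions. In production you would likely write the stats to a file (for example with `pstats.Stats.dump_stats`) rather than keep report text:

```python
import cProfile
import io
import pstats
import random


def maybe_profile(func, *args, sample_rate=0.01, rng=random.random, **kwargs):
    """Call func; with probability sample_rate, also return a cProfile report.

    Returns (result, report_text_or_None).
    """
    if rng() >= sample_rate:
        return func(*args, **kwargs), None  # the common, low-overhead path

    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


# Force sampling on and off to show both paths
print(maybe_profile(sum, [1, 2, 3], sample_rate=1.0)[1] is not None)  # True
print(maybe_profile(sum, [1, 2, 3], sample_rate=0.0)[1])               # None
```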

What's the minimum useful sample size for profiling?

Profile at least 10 solves to get meaningful statistics. Single-solve profiling is too noisy due to network variability.
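With 10 or more runs, summarize the distribution instead of averaging, because one slow network round trip can skew a mean. This minimal sketch uses only the standard library's `statistics` module. `summarize_totals` is a hypothetical helper, its input would be the `timings["total"]` values collected from Method 1, and the numbers below are illustrative, not measured:

```python
import statistics


def summarize_totals(totals):
    """Median, 90th percentile and max of per-solve totals (seconds)."""
    if len(totals) < 2:
        raise ValueError("need at least two solves for a percentile")
    deciles = statistics.quantiles(totals, n=10)  # 9 cut points
    return {
        "runs": len(totals),
        "median": statistics.median(totals),
        "p90": deciles[-1],
        "max": max(totals),
    }


# Illustrative totals in seconds, not real measurements
totals = [20.1, 22.4, 25.3, 25.4, 25.6, 30.2, 25.1, 24.9, 41.7, 25.2]
print(summarize_totals(totals))
```

A p90 far above the median points to intermittent problems, such as a slow DNS lookup or a stalled connection, rather than a steady overhead.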

Next Steps

Profile your CAPTCHA pipeline and eliminate bottlenecks — get your CaptchaAI API key.

Related guides:


Discord Webhook Alerts for CAPTCHA Pipeline Status

Why CAPTCHA Tokens Work in the API but Fail in the Browser

Python ThreadPoolExecutor for CAPTCHA Solving Parallelism

Migrate from CapSolver to CaptchaAI Step by Step

NATS Messaging + CaptchaAI: Lightweight CAPTCHA Task Distribution

Rate Limiting CAPTCHA Solving Workflows

Rate Limiting CAPTCHA Solving Workflows