When your CAPTCHA solving script is slower than expected, you need to know where the time goes. Is it network latency? JSON parsing? Image encoding? This guide shows how to profile CaptchaAI integrations in Python to find and fix the actual bottleneck.
Time Budget for a Single Solve
A typical reCAPTCHA v2 solve breaks down like this:
| Phase | Expected Time | What's Happening |
|---|---|---|
| Submit request | 50–200ms | HTTP call to in.php |
| CaptchaAI processing | 10–25s | Solving on CaptchaAI servers |
| Poll requests (3–5 calls) | 150–500ms | HTTP calls to res.php |
| JSON parsing | < 1ms | Deserializing responses |
| Your code (between calls) | Variable | Business logic, DB writes |
| Total | ~12–30s | Sum of all phases |
If your total exceeds 45 seconds consistently, something in your pipeline is adding overhead.
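One way to apply this budget is to compare measured phase durations against the 45-second threshold and see which phase takes the largest share. The sketch below is illustrative only: `budget_report`, the phase names, and the sample numbers are hypothetical and not part of any CaptchaAI library.

```python
# Hypothetical helper: compare measured phase times (in seconds) to the budget.
BUDGET_SECONDS = 45.0  # threshold taken from the guideline above


def budget_report(phases, budget=BUDGET_SECONDS):
    """Return (total, over_budget, shares) for a dict of phase -> seconds."""
    total = sum(phases.values())
    shares = {
        name: (secs / total if total else 0.0)
        for name, secs in phases.items()
    }
    return total, total > budget, shares


# Made-up sample measurements for a single slow solve
sample = {
    "submit": 0.15,
    "captchaai_processing": 20.0,
    "polling": 0.30,
    "own_code": 31.0,
}
total, over, shares = budget_report(sample)
print(f"total: {total:.2f}s, over budget: {over}")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name}: {share:.0%}")
```

Sorting by share puts the phase worth investigating first; in this invented sample that is the time spent in your own code, not the solver.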
Method 1: Manual Timing Instrumentation
Add timing to each phase of the solve:
# profiled_solver.py
import os
import time

import requests

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")


def solve_with_timing(sitekey, pageurl):
    """Solve with detailed timing for each phase."""
    timings = {}
    session = requests.Session()  # one session so connections are reused

    # Phase 1: Submit
    t0 = time.perf_counter()
    resp = session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": "1",
    })
    timings["submit_request"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    result = resp.json()
    timings["submit_parse"] = time.perf_counter() - t0

    if result.get("status") != 1:
        return None, timings
    task_id = result["request"]

    # Phase 2: Wait
    t0 = time.perf_counter()
    time.sleep(15)
    timings["initial_wait"] = time.perf_counter() - t0

    # Phase 3: Poll
    poll_times = []
    poll_count = 0
    t_poll_start = time.perf_counter()
    for _ in range(25):
        t0 = time.perf_counter()
        poll = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": "1",
        })
        poll_result = poll.json()
        poll_time = time.perf_counter() - t0
        poll_times.append(poll_time)
        poll_count += 1
        if poll_result.get("status") == 1:
            break  # solved
        if poll_result.get("request") != "CAPCHA_NOT_READY":
            break  # any other response is an error; stop polling
        time.sleep(5)
    timings["poll_total"] = time.perf_counter() - t_poll_start

    # Sum only the phase durations recorded so far. Summing every float in
    # the dict would also add poll_avg_request, which is already part of
    # poll_total.
    phase_total = sum(timings.values())
    timings["poll_count"] = poll_count
    timings["poll_avg_request"] = sum(poll_times) / len(poll_times) if poll_times else 0.0
    timings["total"] = phase_total

    token = poll_result.get("request") if poll_result.get("status") == 1 else None
    return token, timings


if __name__ == "__main__":
    # Run and display results
    token, timings = solve_with_timing(
        "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
        "https://www.google.com/recaptcha/api2/demo"
    )
    print("\n=== Timing Breakdown ===")
    for key, value in timings.items():
        if isinstance(value, float):
            print(f" {key}: {value*1000:.1f}ms")
        else:
            print(f" {key}: {value}")
Expected output:
=== Timing Breakdown ===
submit_request: 145.3ms
submit_parse: 0.2ms
initial_wait: 15001.2ms
poll_total: 10234.5ms
poll_count: 3
poll_avg_request: 67.8ms
total: 25381.2ms
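The repeated `t0 = time.perf_counter()` pattern can also be wrapped in a small context manager, which keeps the timing code out of the way of the solve logic. This is a minimal sketch; the name `phase` is a hypothetical helper, and `time.sleep` stands in for the real HTTP call or wait.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase(timings, name):
    """Store the wall-clock duration of the enclosed block in timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed solves are measured too
        timings[name] = time.perf_counter() - start


timings = {}
with phase(timings, "initial_wait"):
    time.sleep(0.05)  # stand-in for the real wait or HTTP call

print(f"initial_wait: {timings['initial_wait'] * 1000:.1f}ms")
```

Because the duration is written in a `finally` clause, a request that raises still leaves a timing entry behind, which helps when slow solves and failed solves are related.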
Method 2: cProfile for Call Stack Analysis
import cProfile
import pstats

# Assumes solve_with_timing from Method 1 is defined in (or imported into)
# this module.


def run_solver():
    """Wrapper for profiling."""
    solve_with_timing(
        "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
        "https://www.google.com/recaptcha/api2/demo"
    )


# Profile the entire solve
profiler = cProfile.Profile()
profiler.enable()
run_solver()
profiler.disable()

# Show top 20 time-consuming functions
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")
stats.print_stats(20)
This reveals whether time is spent in:
- socket.recv (network I/O; expected)
- json.loads (JSON parsing; should be < 1ms)
- ssl.read (TLS; expected for HTTPS)
- Your own functions (business logic; optimize here)
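To focus on the last category, `print_stats()` accepts a regular expression that restricts the report to matching entries. The sketch below profiles a stand-in function and captures the filtered report in a string; `busy_work` is a hypothetical placeholder for your own business logic.

```python
import cProfile
import io
import pstats


def busy_work():
    """Stand-in for your own business logic."""
    return sum(i * i for i in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# strip_dirs() shortens file paths; the string passed to print_stats() is a
# regular expression matched against the "file:line(function)" column.
stats.strip_dirs().sort_stats("tottime").print_stats("busy_work")
report = buffer.getvalue()
print(report)
```

Sorting by `tottime` instead of `cumulative` ranks functions by the time spent in their own body, which tends to surface CPU-bound code rather than functions that merely wait on the network.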
Method 3: Async Profiling for Concurrent Solvers
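For asyncio-based solvers, cProfile is less helpful: a coroutine hands control back to the event loop at every await, so the time spent waiting on the network is attributed to the event loop rather than to your coroutines. Use timing decorators instead: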
import asyncio
import functools
import os
import statistics
import time
from collections import defaultdict

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

# Elapsed times per function name, filled in by the decorator below
timing_data = defaultdict(list)


# Timing decorator for async functions
def timed_async(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = await func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        timing_data[func.__name__].append(elapsed)
        return result
    return wrapper


@timed_async
async def submit_captcha(session, sitekey, pageurl):
    """Submit with timing. session is an aiohttp.ClientSession."""
    async with session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": sitekey, "pageurl": pageurl, "json": "1",
    }) as resp:
        return await resp.json(content_type=None)


@timed_async
async def poll_result(session, task_id):
    """Poll with timing. session is an aiohttp.ClientSession."""
    async with session.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY, "action": "get",
        "id": task_id, "json": "1",
    }) as resp:
        return await resp.json(content_type=None)


# After running, print statistics
def print_timing_stats():
    for func_name, times in timing_data.items():
        print(f"\n{func_name}:")
        print(f" Calls: {len(times)}")
        print(f" Median: {statistics.median(times)*1000:.1f}ms")
        print(f" Max: {max(times)*1000:.1f}ms")
        print(f" Total: {sum(times)*1000:.1f}ms")
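To see what the decorator reports under concurrency without calling the API, the sketch below runs five simulated solves with `asyncio.gather`. The decorator is repeated so the example runs on its own; `fake_solve` and its 0.1-second delay are hypothetical stand-ins for a real submit-and-poll cycle.

```python
import asyncio
import functools
import time
from collections import defaultdict

timing_data = defaultdict(list)


def timed_async(func):
    """Same decorator pattern as above, repeated so this sketch runs alone."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = await func(*args, **kwargs)
        timing_data[func.__name__].append(time.perf_counter() - start)
        return result
    return wrapper


@timed_async
async def fake_solve(delay):
    """Hypothetical stand-in for one submit-and-poll cycle."""
    await asyncio.sleep(delay)
    return "token"


async def main():
    wall_start = time.perf_counter()
    tokens = await asyncio.gather(*(fake_solve(0.1) for _ in range(5)))
    return tokens, time.perf_counter() - wall_start


tokens, wall = asyncio.run(main())
summed = sum(timing_data["fake_solve"])
print(f"calls: {len(timing_data['fake_solve'])}")
print(f"wall clock: {wall * 1000:.0f}ms, sum of per-call times: {summed * 1000:.0f}ms")
```

The per-call times overlap, so their sum exceeds the wall-clock time. When reading the statistics, treat the "Total" figure as accumulated task time, not as elapsed time for the whole batch.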
Common Bottlenecks and Fixes
| Bottleneck | How to Detect | Fix |
|---|---|---|
| High submit_request time (> 500ms) | Manual timing shows slow submit | Check DNS, use keep-alive |
| High poll count (> 8 polls) | poll_count consistently high | Increase initial wait time |
| Slow JSON parsing | submit_parse > 10ms | Shouldn't happen; check response size |
| Time between polls > 5s | Gap between poll end and next poll start | Verify no blocking code between polls |
| Image encoding bottleneck | Large base64.b64encode time | Pre-encode or stream images |
| Database writes blocking solver | cProfile shows DB function time | Make DB writes async or batch |
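The "time between polls" row can be checked by recording the start and end of every poll and computing the gap before the next one. A minimal sketch follows; `poll_gaps` and `record_polls` are hypothetical helpers, and the short sleeps stand in for real HTTP calls and the 5-second poll interval.

```python
import time


def poll_gaps(poll_windows):
    """Given [(start, end), ...] per poll, return the gap before each later poll."""
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(poll_windows, poll_windows[1:])
    ]


def record_polls(do_poll, interval, count):
    """Call do_poll() `count` times, sleeping `interval` seconds in between."""
    windows = []
    for i in range(count):
        start = time.perf_counter()
        do_poll()
        windows.append((start, time.perf_counter()))
        if i < count - 1:
            time.sleep(interval)
    return windows


INTERVAL = 0.05  # scaled-down stand-in for the 5-second poll interval
windows = record_polls(lambda: time.sleep(0.01), interval=INTERVAL, count=3)
gaps = poll_gaps(windows)
# Gaps well above the sleep interval point to blocking work between polls
suspicious = [g for g in gaps if g > INTERVAL * 1.5]
print([f"{g * 1000:.0f}ms" for g in gaps], "suspicious:", len(suspicious))
```

The 1.5x tolerance is an arbitrary choice for this sketch; pick a margin that fits the jitter you see in your own environment.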
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Total time 2x expected | Business logic between API calls | Profile to find the slow function |
| First solve slow, rest fast | Connection setup (DNS + TLS) | Use Session with keep-alive |
| Memory growing during profiling | Profiler accumulating data | Use sampling profiler for long runs |
| Profiling changes timing | Profiler overhead | Use time.perf_counter() for production |
FAQ
Does profiling affect solve accuracy?
No. Profiling only measures execution timing. It doesn't change the API calls or CAPTCHA solving behavior.
Should I profile in production?
Use lightweight timing (Method 1) in production. Avoid cProfile in production as it adds CPU overhead. Sample periodically instead.
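One way to sample periodically is to run only every n-th call under cProfile and leave the rest unprofiled. The sketch below is an assumption about how that could look, not an established recipe; `profile_every` and `handle_solve` are hypothetical names.

```python
import cProfile
import functools
import itertools
import pstats


def profile_every(n, collected):
    """Decorator: run every n-th call under cProfile, the rest unprofiled."""
    def decorator(func):
        counter = itertools.count(1)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if next(counter) % n != 0:
                return func(*args, **kwargs)
            profiler = cProfile.Profile()
            try:
                return profiler.runcall(func, *args, **kwargs)
            finally:
                # Keep the stats so they can be inspected or merged later
                collected.append(pstats.Stats(profiler))
        return wrapper
    return decorator


profiles = []


@profile_every(5, profiles)
def handle_solve(i):
    """Hypothetical stand-in for one solve plus business logic."""
    return sum(range(10_000)) + i


results = [handle_solve(i) for i in range(10)]
print(f"{len(results)} calls, {len(profiles)} profiled")
```

With `n=5`, only the 5th and 10th calls pay the profiler overhead, so the other calls keep their normal timing.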
What's the minimum useful sample size for profiling?
Profile at least 10 solves to get meaningful statistics. Single-solve profiling is too noisy due to network variability.
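Aggregating across runs can be as simple as collecting the per-solve timing dictionaries and computing a median and maximum per phase. In this sketch `summarize_runs` is a hypothetical helper and the ten runs are made-up numbers, not measurements.

```python
import statistics


def summarize_runs(runs):
    """Aggregate a list of per-solve timing dicts into per-phase statistics."""
    summary = {}
    phases = sorted({name for run in runs for name in run})
    for name in phases:
        values = [run[name] for run in runs if name in run]
        summary[name] = {
            "n": len(values),
            "median": statistics.median(values),
            "max": max(values),
        }
    return summary


# Made-up timings (seconds) standing in for ten real solves
runs = [
    {"submit_request": 0.10 + 0.01 * i, "poll_total": 10.0 + i}
    for i in range(10)
]
for name, stats in summarize_runs(runs).items():
    print(f"{name}: n={stats['n']} median={stats['median']:.2f}s max={stats['max']:.2f}s")
```

The median is less sensitive than the mean to a single slow solve, while the maximum shows how bad the outliers get.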
Next Steps
Profile your CAPTCHA pipeline and eliminate bottlenecks — get your CaptchaAI API key.