Proxy rotation reduces CAPTCHA frequency by distributing requests across multiple IPs. Combined with CaptchaAI for solving the CAPTCHAs that still appear, it gives you a scraping pipeline that keeps working when anti-bot checks do trigger.
Why Proxy Rotation Reduces CAPTCHAs
Sites trigger CAPTCHAs based on per-IP request patterns:
| Factor | Single IP | Rotating Proxies |
|---|---|---|
| Requests per minute | Bursts of 10+ can trigger a CAPTCHA | Load spread across IPs, so each stays under the threshold |
| IP reputation | Degrades over time | Fresh IPs from pool |
| Session patterns | Suspicious patterns visible | Patterns spread across IPs |
| Geographic consistency | Single location | Natural geographic diversity |
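To make the per-IP effect concrete, here is a small sketch that spreads a fixed number of requests across a pool in round-robin order and counts how many each IP would send. The proxy names and the `per_ip_load` helper are hypothetical, and real sites weigh more signals than raw request counts.

```python
from collections import Counter
from itertools import cycle


def per_ip_load(total_requests, proxies):
    """Count how many requests each proxy would send under round-robin rotation."""
    rotation = cycle(proxies)
    return Counter(next(rotation) for _ in range(total_requests))


# Hypothetical pool handling 60 requests in one minute
pool = ["proxy1", "proxy2", "proxy3", "proxy4", "proxy5", "proxy6"]
single = per_ip_load(60, ["my-ip"])
rotated = per_ip_load(60, pool)

print(max(single.values()))   # 60 requests/minute from one IP
print(max(rotated.values()))  # 10 requests/minute per IP
```

With six proxies the same workload drops from 60 to 10 requests per minute per IP, which is the effect the first table row describes.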
Proxy Types for Scraping
| Type | Best For | CAPTCHA Rate | Cost |
|---|---|---|---|
| Residential | High-value targets (Google, Amazon) | Lowest | $$$ |
| Mobile | Ultra-low detection | Lowest | $$$$ |
| ISP/Static | Sustained sessions | Low | $$ |
| Datacenter | High-volume, lenient sites | Higher | $ |
Recommendation: Use residential proxies for sites with aggressive CAPTCHA triggers. Datacenter proxies work for less protected sites.
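One way to apply this recommendation is to route each URL to a proxy tier based on its host. The sketch below is illustrative only: the pool endpoints, the `STRICT_HOSTS` set, and the `pool_for` helper are all assumptions, not part of any provider's API.

```python
from urllib.parse import urlparse

# Hypothetical pools; replace with endpoints from your provider
PROXY_POOLS = {
    "residential": ["http://user:pass@res1.example.com:8080"],
    "datacenter": ["http://user:pass@dc1.example.com:8080"],
}

# Hypothetical list of hosts that challenge aggressively
STRICT_HOSTS = {"www.google.com", "www.amazon.com"}


def pool_for(url):
    """Return (tier, proxies): residential for strict hosts, datacenter otherwise."""
    host = urlparse(url).hostname or ""
    tier = "residential" if host in STRICT_HOSTS else "datacenter"
    return tier, PROXY_POOLS[tier]
```

For example, `pool_for("https://www.google.com/search?q=test")` selects the residential pool, while an unlisted host falls back to the cheaper datacenter pool.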
Basic Proxy Rotation (Python)
```python
import random
import time

import requests
from bs4 import BeautifulSoup

PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]
API_KEY = "YOUR_API_KEY"


def get_random_proxy():
    """Pick one proxy at random and use it for both HTTP and HTTPS traffic."""
    proxy = random.choice(PROXIES)
    return {"http": proxy, "https": proxy}


def solve_captcha(site_key, page_url):
    """Submit a reCAPTCHA task to CaptchaAI and poll until the token is ready."""
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": page_url,
    }, timeout=30)
    # A successful submission looks like "OK|<task_id>"; anything else is an error
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"Task submission failed: {resp.text}")
    task_id = resp.text.split("|", 1)[1]

    for _ in range(60):  # Poll every 5 seconds, up to about 5 minutes
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id,
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(result.text)
    raise TimeoutError("CAPTCHA was not solved in time")


def scrape_with_rotation(url):
    session = requests.Session()
    session.proxies = get_random_proxy()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    })
    resp = session.get(url, timeout=30)

    # If a reCAPTCHA widget appears, solve it and resubmit through the same session
    if "captcha" in resp.text.lower():
        soup = BeautifulSoup(resp.text, "html.parser")
        rc = soup.find("div", class_="g-recaptcha")
        if rc and rc.get("data-sitekey"):
            token = solve_captcha(rc["data-sitekey"], url)
            # Simplified: a real form usually needs its other fields posted as well
            resp = session.post(url, data={"g-recaptcha-response": token}, timeout=30)
    return resp.text
```
Smart Proxy Rotation
Track which proxies trigger CAPTCHAs and avoid them:
```python
import random
from collections import defaultdict

import requests


class SmartProxyRotator:
    def __init__(self, proxies):
        self.proxies = proxies
        self.captcha_count = defaultdict(int)
        self.success_count = defaultdict(int)

    def get_proxy(self):
        # Score each proxy by its success rate (share of requests without a CAPTCHA)
        scored = []
        for proxy in self.proxies:
            total = self.captcha_count[proxy] + self.success_count[proxy]
            if total == 0:
                score = 0.5  # Unknown proxy, neutral score
            else:
                score = self.success_count[proxy] / total
            scored.append((proxy, score))

        # Pick at random from the better-scoring half of the pool
        scored.sort(key=lambda x: x[1], reverse=True)
        top_proxies = scored[:max(len(scored) // 2, 1)]
        return random.choice(top_proxies)[0]

    def report_success(self, proxy):
        self.success_count[proxy] += 1

    def report_captcha(self, proxy):
        self.captcha_count[proxy] += 1


# Usage (PROXIES is the list defined in the basic example)
rotator = SmartProxyRotator(PROXIES)


def scrape(url):
    proxy = rotator.get_proxy()
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
    if "captcha" in resp.text.lower():
        rotator.report_captcha(proxy)
        # Solve CAPTCHA...
    else:
        rotator.report_success(proxy)
    return resp.text
```
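The rotator above picks uniformly from the better-scoring half of the pool, so a proxy that falls into the lower half is never retried. A possible alternative is selection weighted by success rate, sketched below. `WeightedProxyRotator` and its smoothing rule are assumptions for illustration, not something CaptchaAI or any proxy provider prescribes.

```python
import random
from collections import defaultdict


class WeightedProxyRotator:
    """Pick proxies with probability proportional to a smoothed success rate."""

    def __init__(self, proxies, rng=None):
        self.proxies = list(proxies)
        self.success = defaultdict(int)
        self.captcha = defaultdict(int)
        self.rng = rng or random.Random()

    def weight(self, proxy):
        # Laplace smoothing: an unseen proxy scores (0 + 1) / (0 + 2) = 0.5,
        # and no weight ever reaches 0, so a bad proxy can still recover.
        s, c = self.success[proxy], self.captcha[proxy]
        return (s + 1) / (s + c + 2)

    def get_proxy(self):
        weights = [self.weight(p) for p in self.proxies]
        return self.rng.choices(self.proxies, weights=weights, k=1)[0]

    def report(self, proxy, got_captcha):
        if got_captcha:
            self.captcha[proxy] += 1
        else:
            self.success[proxy] += 1
```

Call `report(proxy, got_captcha=True)` whenever a response contains a CAPTCHA and `report(proxy, got_captcha=False)` otherwise. Over time, traffic shifts toward proxies that are challenged less often without abandoning any proxy completely.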
Proxy Rotation with Selenium
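```python
import random

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def create_driver_with_proxy(proxy_url):
    options = Options()
    # Chrome ignores user:pass credentials in --proxy-server, so pass an
    # IP-allowlisted proxy (scheme://host:port) here
    options.add_argument(f"--proxy-server={proxy_url}")
    options.add_argument("--disable-blink-features=AutomationControlled")
    return webdriver.Chrome(options=options)


# Rotate proxy per session (PROXIES is the list from the basic example)
proxy = random.choice(PROXIES)
driver = create_driver_with_proxy(proxy)
try:
    driver.get("https://example.com")
finally:
    driver.quit()  # Close the browser so the next session starts clean
```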
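Because `--proxy-server` takes only the server address, proxy URLs that embed credentials need to be split before they reach Chrome. The helper below is a minimal sketch using only the standard library. `split_proxy` is a hypothetical name, and how you supply the credentials afterwards (IP allowlisting, a browser extension, or a local forwarding proxy) depends on your setup.

```python
from urllib.parse import urlsplit


def split_proxy(proxy_url):
    """Separate a proxy URL into a credential-free server string and (user, password)."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    return server, (parts.username, parts.password)


server, auth = split_proxy("http://user:pass@proxy1.example.com:8080")
print(server)  # http://proxy1.example.com:8080
print(auth)    # ('user', 'pass')
```

The `server` value is what would go into `--proxy-server`; the credentials are returned separately so they are not silently dropped.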
Proxy + CAPTCHA Solving for Cloudflare
Solving a Cloudflare Challenge requires passing your proxy to CaptchaAI, because the resulting cf_clearance cookie is bound to the IP that solved the challenge:
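```python
import requests

API_KEY = "YOUR_API_KEY"
proxy = "http://user:pass@proxy.example.com:8080"

resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "cloudflare_challenge",
    "pageurl": "https://example.com",
    "proxy": proxy,
    "proxytype": "HTTP",
}, timeout=30)
# A successful submission looks like "OK|<task_id>"; anything else is an error
if not resp.text.startswith("OK|"):
    raise RuntimeError(f"Task submission failed: {resp.text}")
task_id = resp.text.split("|", 1)[1]

# Poll for cf_clearance cookie
# Use the same proxy for subsequent requests
```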
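The polling step can reuse the same pattern as the `res.php` loop in the basic example. The sketch below keeps the HTTP call outside the function so the loop can be exercised without network access. `poll_result` and its `fetch` parameter are hypothetical, and since the payload format for a solved Cloudflare task is not shown here, the function returns everything after `OK|` unparsed.

```python
import time


def poll_result(task_id, fetch, attempts=60, delay=5, sleep=time.sleep):
    """Poll until the solver returns a result or the attempts run out.

    `fetch(task_id)` is a caller-supplied function returning the raw response text.
    """
    for _ in range(attempts):
        sleep(delay)
        text = fetch(task_id)
        if text == "CAPCHA_NOT_READY":
            continue  # Still being solved; wait and ask again
        if text.startswith("OK|"):
            return text.split("|", 1)[1]  # Raw payload, left unparsed
        raise RuntimeError(f"Solver error: {text}")
    raise TimeoutError(f"Task {task_id} not solved after {attempts} attempts")
```

In practice `fetch` would wrap a `requests.get` call to `res.php` with `action=get` and the task ID. Any later request that presents the returned cookie should go through the same proxy that was submitted with the task.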
Best Practices
- Match proxy geo to target — Use US proxies for US sites
- One session per proxy — Don't reuse sessions across different proxies
- Rate limit per proxy — Max 5-10 requests/minute per IP
- Monitor CAPTCHA rates — Track which proxies trigger more CAPTCHAs
- Use sticky sessions — Keep the same proxy for multi-step workflows
- Handle proxy failures — Retry with a different proxy on connection errors
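The per-proxy rate limit can be enforced with a sliding window over recent request timestamps. This is a minimal sketch: `PerProxyRateLimiter` is a hypothetical class, and the default of 5 requests per 60 seconds is taken from the lower end of the guideline above rather than from any site's published limit.

```python
import time
from collections import defaultdict, deque


class PerProxyRateLimiter:
    """Sliding-window limiter: at most `max_requests` per `window` seconds per proxy."""

    def __init__(self, max_requests=5, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = defaultdict(deque)  # proxy -> timestamps of recent requests

    def try_acquire(self, proxy):
        """Return True and record the request if `proxy` is under its limit."""
        now = self.clock()
        stamps = self.sent[proxy]
        # Drop timestamps that have aged out of the window
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False
        stamps.append(now)
        return True

    def pick(self, proxies):
        """Return the first proxy with remaining budget, or None if all are saturated."""
        for proxy in proxies:
            if self.try_acquire(proxy):
                return proxy
        return None
```

Call `pick(PROXIES)` before each request. A return value of `None` means every proxy has used its budget for the current window, so the scraper should wait instead of pushing more traffic through a saturated IP.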
Troubleshooting
| Issue | Fix |
|---|---|
| All proxies trigger CAPTCHAs | Switch to residential proxies; reduce rate |
| Proxy timeout errors | Remove slow proxies from pool; increase timeout |
| Different content per proxy | Some sites serve geo-specific content; pin proxies to one region or normalize fields such as currency and language after parsing |
| CAPTCHA tokens don't work with proxy | Ensure token is used from the same session/IP |
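For proxy timeouts and connection errors, one approach is to retry through a different proxy and drop proxies that fail repeatedly. The sketch below assumes a caller-supplied `fetch(url, proxy)` function. `ProxyPool`, `fetch_with_retry`, and the failure threshold are illustrative names and values. It catches `OSError` because the built-in `ConnectionError` and `TimeoutError`, as well as the exceptions raised by `requests`, all derive from it.

```python
import random


class ProxyPool:
    """Proxy pool that drops a proxy after repeated consecutive failures."""

    def __init__(self, proxies, max_failures=3, rng=None):
        self.active = list(proxies)
        self.failures = {p: 0 for p in proxies}
        self.max_failures = max_failures
        self.rng = rng or random.Random()

    def pick(self, exclude=()):
        candidates = [p for p in self.active if p not in exclude]
        if not candidates:
            raise RuntimeError("No usable proxies left")
        return self.rng.choice(candidates)

    def report_failure(self, proxy):
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self.active:
            self.active.remove(proxy)  # Evict a proxy that keeps failing

    def report_ok(self, proxy):
        self.failures[proxy] = 0  # A success resets the failure streak


def fetch_with_retry(url, pool, fetch, retries=3):
    """Try up to `retries` different proxies; `fetch(url, proxy)` is caller-supplied."""
    tried = []
    last_error = None
    for _ in range(retries):
        try:
            proxy = pool.pick(exclude=tried)
        except RuntimeError:
            break  # Nothing left to try
        tried.append(proxy)
        try:
            body = fetch(url, proxy)
        except OSError as exc:  # Connection errors and timeouts
            pool.report_failure(proxy)
            last_error = exc
            continue
        pool.report_ok(proxy)
        return body
    raise RuntimeError(f"All attempts failed for {url}") from last_error
```

Each retry excludes proxies already tried for that URL, so a single bad proxy cannot consume every attempt.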
FAQ
Do I need proxies if I use CaptchaAI?
Not strictly — CaptchaAI can solve CAPTCHAs regardless. But proxies reduce how often CAPTCHAs appear, saving time and API costs.
Should I use the same proxy for CAPTCHA solving and scraping?
For most CAPTCHA types, the token is valid regardless of IP. For Cloudflare Challenge, you must use the same proxy since the cf_clearance cookie is IP-bound.
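When a workflow must stay on one IP, as with an IP-bound cf_clearance cookie, a simple option is to derive the proxy from a stable workflow identifier. The sketch below is one possible approach; `sticky_proxy` and the idea of a workflow ID are assumptions for illustration.

```python
import hashlib


def sticky_proxy(workflow_id, proxies):
    """Map a workflow ID to the same proxy every time via a stable hash."""
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]
```

The mapping is stable only while the proxy list keeps the same length and order. If the pool changes mid-workflow, store the chosen proxy alongside the session instead of recomputing it.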
How many proxies do I need?
For moderate scraping (around 1,000 pages/day), 10-20 rotating residential proxies are usually enough. For high volume, use a proxy provider with automatic rotation.
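As a rough check on pool size, the sketch below computes the minimum number of proxies needed to keep each IP under a per-minute cap, using the 5 requests/minute lower bound from the best practices. The function name and the `active_hours` parameter are illustrative.

```python
import math


def proxies_needed(pages_per_day, max_rpm_per_ip, active_hours=24.0):
    """Minimum proxies so that no IP exceeds `max_rpm_per_ip` on average."""
    requests_per_minute = pages_per_day / (active_hours * 60)
    return max(1, math.ceil(requests_per_minute / max_rpm_per_ip))


print(proxies_needed(1_000, 5))                     # 1
print(proxies_needed(1_000, 5, active_hours=1))     # 4
print(proxies_needed(100_000, 5, active_hours=8))   # 42
```

The rate cap alone gives only a lower bound. A larger pool, such as the 10-20 suggested above, leaves headroom for failed proxies, IPs whose reputation has degraded, and traffic that arrives in bursts.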