Reducing CAPTCHA Interruptions in Web Scraping

Every CAPTCHA solved costs time and money. These techniques reduce how often CAPTCHAs appear during scraping, and CaptchaAI handles the ones that still get through.

Prevention Techniques

1. Use Residential Proxies

Datacenter IPs trigger CAPTCHAs 5-10x more often than residential IPs, so route requests through a residential proxy:

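import requests

url = "https://example.com/data"

# Residential proxy gateway (rotation is typically handled on the provider side)
proxies = {
    "http": "http://user:pass@residential-proxy.example.com:8080",
    "https": "http://user:pass@residential-proxy.example.com:8080"
}
resp = requests.get(url, proxies=proxies, timeout=30)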
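
The snippet above sends every request through a single gateway. If your provider issues several endpoints instead, client-side rotation might look like the sketch below. The endpoint hostnames and the helper names `make_proxy_rotator` and `fetch_with_rotation` are hypothetical, and `session` is assumed to be a `requests.Session`.

```python
import itertools

# Hypothetical endpoints: replace with the gateways your proxy provider issues.
PROXY_URLS = [
    "http://user:pass@residential-proxy-1.example.com:8080",
    "http://user:pass@residential-proxy-2.example.com:8080",
    "http://user:pass@residential-proxy-3.example.com:8080",
]


def make_proxy_rotator(proxy_urls):
    """Return a function that hands out requests-style proxies dicts round-robin."""
    pool = itertools.cycle(proxy_urls)

    def next_proxies():
        proxy_url = next(pool)
        # Same proxy for both schemes, so one request uses one exit IP
        return {"http": proxy_url, "https": proxy_url}

    return next_proxies


def fetch_with_rotation(session, url, next_proxies, timeout=30):
    """GET url through the next proxy in the pool (session: requests.Session)."""
    return session.get(url, proxies=next_proxies(), timeout=timeout)
```

Changing the exit IP while reusing one session's cookies can look inconsistent to the target site, so keeping one session per proxy is a reasonable variation.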

2. Implement Request Delays

import random
import time

# Random delay between 3-8 seconds
time.sleep(random.uniform(3, 8))

Sites track request timing. Perfectly regular intervals (for example, exactly one request per second) are a strong bot signal, while randomized delays look closer to human browsing.

3. Set Realistic Headers

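headers = {
    # Use a complete, current browser User-Agent; a truncated string stands out
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # Add "br" only if the brotli package is installed; requests cannot decode it otherwise
    "Accept-Encoding": "gzip, deflate",
    "Referer": "https://www.google.com/",
    "DNT": "1",
    "Connection": "keep-alive"
}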

4. Maintain Session Cookies

import time

import requests

session = requests.Session()

# Visit homepage first to establish cookies
session.get("https://example.com")
time.sleep(2)

# Then access target pages
session.get("https://example.com/data")

Sites expect returning visitors to have cookie history. A fresh session hitting deep pages is suspicious.

5. Use Referrer Chains

# Navigate like a human: search → results → detail
session.get("https://example.com")
time.sleep(2)
session.get("https://example.com/search?q=product", headers={"Referer": "https://example.com"})
time.sleep(3)
session.get("https://example.com/product/123", headers={"Referer": "https://example.com/search?q=product"})

6. Lower Concurrency

| Concurrency | CAPTCHA Rate | Speed |
|---|---|---|
| 1 thread | Lowest | Slow |
| 3 threads | Low | Moderate |
| 10 threads | High | Fast |
| 50 threads | Very high | Fast but blocked |

Start with 1-3 concurrent scrapers per site.
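
One way to enforce that cap is a fixed-size thread pool. The sketch below is a minimal illustration: `scrape_all` is a hypothetical helper, and `fetch` stands for whatever function you already use to download a single page.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=3):
    """Fetch every URL with at most max_workers requests in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs
        return list(pool.map(fetch, urls))
```

Raise `max_workers` gradually and watch the CAPTCHA rate; if it climbs, step back to the previous value.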

7. Use APIs When Available

Many sites offer public APIs that don't require CAPTCHA solving:

| Site | API Available | Notes |
|---|---|---|
| Amazon | Product Advertising API | Requires approval |
| Google | Custom Search API | 100 free/day |
| Twitter/X | API v2 | Paid tiers |
| Reddit | Reddit API | Free with app registration |

Check if your target has an API before building a scraper.

8. Scrape During Off-Peak Hours

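Some sites run less aggressive bot detection during low-traffic periods (late night and weekends in the site's own time zone). Rate limits may be higher and monitoring less strict, so measure whether your CAPTCHA rate actually drops before committing to a schedule.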
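
A scheduler can check the target site's local time before starting a run. The sketch below is only an illustration: the 01:00-06:00 window, the fixed UTC offset (which ignores daylight saving time), and the `is_off_peak` helper are all assumptions to adapt to your target.

```python
from datetime import datetime, timedelta, timezone


def is_off_peak(now_utc, utc_offset_hours, start_hour=1, end_hour=6):
    """Return True on weekends or late at night in the site's local time.

    now_utc must be a timezone-aware datetime.
    """
    site_tz = timezone(timedelta(hours=utc_offset_hours))
    local = now_utc.astimezone(site_tz)
    is_weekend = local.weekday() >= 5  # Saturday is 5, Sunday is 6
    is_late_night = start_hour <= local.hour < end_hour
    return is_weekend or is_late_night
```

For a site whose audience is mostly at UTC-5, a job could call `is_off_peak(datetime.now(timezone.utc), utc_offset_hours=-5)` and skip the run when it returns `False`.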

When Prevention Fails: CaptchaAI

No prevention technique eliminates CAPTCHAs entirely. At scale, you need both prevention and solving:

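import time

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"

def scrape_with_fallback(url, session):
    resp = session.get(url)

    # No reCAPTCHA widget in the page: return the content as-is
    if "g-recaptcha" not in resp.text:
        return resp.text

    # Read the site key from the reCAPTCHA widget
    soup = BeautifulSoup(resp.text, "html.parser")
    widget = soup.find("div", class_="g-recaptcha")
    if widget is None or not widget.get("data-sitekey"):
        raise RuntimeError("reCAPTCHA detected but no data-sitekey found")
    site_key = widget["data-sitekey"]

    # Submit the task to CaptchaAI
    submit = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": url
    }, timeout=30)
    if not submit.text.startswith("OK|"):
        raise RuntimeError(f"CaptchaAI submit failed: {submit.text}")
    task_id = submit.text.split("|", 1)[1]

    # Poll every 5 seconds, for up to 5 minutes
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            token = result.text.split("|", 1)[1]
            # Resubmit with the token; adjust the fields to match the target site's form
            resp = session.post(url, data={"g-recaptcha-response": token})
            return resp.text
        # Any other response is an error code from the solver
        raise RuntimeError(f"CaptchaAI solve failed: {result.text}")

    raise TimeoutError("CaptchaAI did not return a solution within 5 minutes")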

Cost Impact of Prevention

Good prevention techniques reduce CaptchaAI usage significantly:

| Approach | CAPTCHAs per 1K pages | Solving cost per 1K pages |
|---|---|---|
| No prevention | ~200-500 | $0.20-0.50 |
| Basic headers + delays | ~50-100 | $0.05-0.10 |
| Residential proxies + headers | ~10-30 | $0.01-0.03 |
| Full stealth setup | ~5-15 | $0.005-0.015 |

Investing in prevention pays for itself through lower CAPTCHA solving costs.
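
The cost column in the table above works out to roughly $1 per 1,000 solves. Treating that rate as an assumption (check current pricing before relying on it), the arithmetic for a larger crawl can be sketched as follows; `solving_cost` is a hypothetical helper.

```python
def solving_cost(pages, captchas_per_1k_pages, price_per_1k_solves=1.00):
    """Estimate solving spend in dollars for a crawl of the given size.

    price_per_1k_solves defaults to the $1 per 1,000 solves implied by the
    table; it is an assumed rate, not a quoted price.
    """
    expected_captchas = pages / 1000 * captchas_per_1k_pages
    return expected_captchas / 1000 * price_per_1k_solves
```

At one million pages, 500 CAPTCHAs per 1K pages comes to about $500 in solving fees, while 15 per 1K pages comes to about $15. Proxy costs are not included in this estimate.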

FAQ

What's the single most effective technique?

Residential proxy rotation. It addresses the most common trigger (IP reputation) and works across all sites.

Do I still need CaptchaAI if I use all these techniques?

Yes, for production reliability. Prevention reduces CAPTCHAs but doesn't eliminate them. CaptchaAI ensures your scraper never gets stuck on an unsolved CAPTCHA.

How do I know which technique helps most for my target site?

Monitor your CAPTCHA rate. Add techniques one at a time and measure the reduction. Start with proxies and headers as they have the highest impact.
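
A small counter is enough to compare configurations. The sketch below is one possible approach: the `CaptchaRateTracker` class and the configuration labels are hypothetical, and it reuses the `"g-recaptcha"` marker check from the fallback example, so it only detects reCAPTCHA widgets.

```python
from collections import defaultdict


class CaptchaRateTracker:
    """Count pages and CAPTCHA hits per scraper configuration."""

    def __init__(self, marker="g-recaptcha"):
        self.marker = marker
        self._pages = defaultdict(int)
        self._captchas = defaultdict(int)

    def record(self, config_name, html):
        """Register one fetched page and whether it contained a CAPTCHA."""
        self._pages[config_name] += 1
        if self.marker in html:
            self._captchas[config_name] += 1

    def rate(self, config_name):
        """Return the fraction of pages that showed a CAPTCHA (0.0 if none fetched)."""
        pages = self._pages.get(config_name, 0)
        if pages == 0:
            return 0.0
        return self._captchas.get(config_name, 0) / pages
```

Call `record("baseline", resp.text)` after each request, switch the label when you enable a new technique, and compare `rate("baseline")` against the new label after a few hundred pages each.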
