Search Results Data Collection with CAPTCHA Handling

Google uses reCAPTCHA to protect its search results and other services from automated access. When Google flags your traffic, it serves a reCAPTCHA v2 or v3 challenge that blocks further requests until it is solved. CaptchaAI solves these challenges so your scraper can continue.

How Google Detects Scrapers

| Signal | Description |
|---|---|
| Query rate | Too many searches from one IP |
| IP reputation | Datacenter or flagged proxy IPs |
| Cookie absence | No Google session cookies |
| Behavioral patterns | Identical query patterns, no dwell time |
| JavaScript fingerprint | Missing browser environment indicators |

Google typically serves a 429 Too Many Requests response or redirects to a reCAPTCHA challenge page at google.com/sorry/.
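
The two block signals above (a 429 status or a redirect to the /sorry/ page) can be checked with a small helper. This is a minimal sketch: `is_blocked` is a hypothetical name, and treating any `/sorry/` path on a host with a `google` label as a challenge page is an assumption based on the URL mentioned above, not a documented contract.

```python
from urllib.parse import urlparse


def is_blocked(status_code: int, final_url: str) -> bool:
    """Return True if a response looks like a Google block (hypothetical helper)."""
    if status_code == 429:
        return True
    parsed = urlparse(final_url)
    host = parsed.hostname or ""
    # Loose host check so localized domains such as www.google.co.uk also match.
    on_google = "google" in host.split(".")
    # Assumption: challenge pages live under the /sorry/ path.
    return on_google and parsed.path.startswith("/sorry/")
```

Checking the parsed path instead of searching the whole URL avoids a false positive when the search query itself contains the word "sorry".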

Requirements

| Requirement | Details |
|---|---|
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ | With requests and beautifulsoup4 (used in Step 2) |
| Residential proxies | Strongly recommended |

Solving Google's reCAPTCHA

Step 1: Detect the CAPTCHA

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9"
})

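resp = session.get("https://www.google.com/search?q=example", timeout=30)

# Match the /sorry/ path rather than the bare word, so a query that
# contains "sorry" is not mistaken for a challenge page.
if "/sorry/" in resp.url or resp.status_code == 429:
    print("CAPTCHA triggered!")
    captcha_url = resp.url
else:
    print("Results loaded")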

Step 2: Extract the Site Key

from bs4 import BeautifulSoup

soup = BeautifulSoup(resp.text, "html.parser")

# Google uses data-sitekey on the reCAPTCHA div
recaptcha = soup.find("div", {"data-sitekey": True})
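if recaptcha:
    site_key = recaptcha["data-sitekey"]
    print(f"Site key: {site_key}")
else:
    # Later steps need site_key, so fail early if the layout changed
    raise RuntimeError("reCAPTCHA site key not found on challenge page")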
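
If the challenge page layout changes and the key no longer sits on a div, a regex scan over the raw HTML can serve as a fallback. The sketch below is illustrative only: `find_site_key` is a hypothetical helper, and it assumes the key still appears as a quoted `data-sitekey` attribute somewhere in the page source.

```python
import re
from typing import Optional

# Assumption: the key appears as data-sitekey="..." (or single-quoted) in the HTML.
SITEKEY_RE = re.compile(r"""data-sitekey\s*=\s*["']([\w-]+)["']""")


def find_site_key(html: str) -> Optional[str]:
    """Return the first data-sitekey value in the HTML, or None if absent."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None
```

Returning None instead of raising lets the caller decide whether to retry, switch proxies, or stop.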

Step 3: Solve with CaptchaAI

import time

API_KEY = "YOUR_API_KEY"

def solve_google_recaptcha(site_key, page_url):
    """Submit a reCAPTCHA task to CaptchaAI and poll until the token is ready."""
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url
    }, timeout=30)
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {resp.text}")

    # A successful submit returns "OK|<task id>"
    task_id = resp.text.split("|", 1)[1]

    # Poll every 5 seconds, for up to 5 minutes (60 attempts)
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        # Any other reply is an error code, so stop polling
        raise RuntimeError(f"Error: {result.text}")

    raise TimeoutError("Timed out")

token = solve_google_recaptcha(site_key, captcha_url)
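
The polling loop above branches on three kinds of plain-text reply. As a rough sketch, that branching can be pulled into one function so it is easy to test without network calls. `parse_solver_response` is a hypothetical name, and the only reply formats assumed are the ones already used in this guide (`OK|...` and `CAPCHA_NOT_READY`); anything else is treated as an error code.

```python
from typing import Tuple


def parse_solver_response(text: str) -> Tuple[str, str]:
    """Classify a reply as ('ok', value), ('pending', ''), or ('error', code)."""
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return ("pending", "")
    if text.startswith("OK|"):
        # Split once so a value that itself contains "|" stays intact.
        return ("ok", text.split("|", 1)[1])
    return ("error", text)
```

The same function works for both the submit reply (where the value is the task ID) and the result reply (where the value is the token).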

Step 4: Submit the Token

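# Find the form action URL and hidden fields
form = soup.find("form")
if form is None:
    raise RuntimeError("Challenge form not found")

# Copy every named input so hidden fields are posted back unchanged
form_data = {}
for inp in form.find_all("input", {"name": True}):
    form_data[inp["name"]] = inp.get("value", "")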

form_data["g-recaptcha-response"] = token

action = form.get("action", "")
if action.startswith("/"):
    action = f"https://www.google.com{action}"

result = session.post(action, data=form_data)
print(f"Redirected to: {result.url}")
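
The action handling above only covers paths that start with a slash. One possible alternative, sketched here under the assumption that the form's action should be resolved against the challenge page URL as a browser would, is the standard library's `urljoin`. `resolve_form_action` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve_form_action(page_url: str, action: str) -> str:
    """Resolve a form's action attribute against the URL of the page it came from."""
    # urljoin returns page_url unchanged when action is empty, and leaves
    # absolute URLs as they are.
    return urljoin(page_url, action or "")
```

This handles root-relative paths, page-relative paths, absolute URLs, and a missing action in one call, and it does not hard-code the google.com host.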

Complete Scraper with CAPTCHA Handling

import requests
import time
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"

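def solve_captcha(site_key, page_url):
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": page_url
    })
    # Fail fast on a submit error instead of hitting an IndexError below
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {resp.text}")
    task_id = resp.text.split("|")[1]
    for _ in range(60):
        time.sleep(5)
        r = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if r.text == "CAPCHA_NOT_READY":
            continue
        if r.text.startswith("OK|"):
            return r.text.split("|")[1]
        # Any other reply is an error, so stop polling rather than wait out the loop
        raise RuntimeError(f"Solve error: {r.text}")
    raise TimeoutError("CAPTCHA solve timed out")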

def google_search(query, num_results=10):
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    })

    resp = session.get("https://www.google.com/search", params={
        "q": query, "num": num_results
    })

    # Handle CAPTCHA (match the /sorry/ path so a query containing "sorry" is not misread)
    if "/sorry/" in resp.url or resp.status_code == 429:
        soup = BeautifulSoup(resp.text, "html.parser")
        rc = soup.find("div", {"data-sitekey": True})
        form = soup.find("form")
        # Only attempt a solve when both the site key and the form are present
        if rc and form:
            token = solve_captcha(rc["data-sitekey"], resp.url)
            data = {i["name"]: i.get("value", "")
                    for i in form.find_all("input", {"name": True})}
            data["g-recaptcha-response"] = token
            action = form.get("action", resp.url)
            if action.startswith("/"):
                action = f"https://www.google.com{action}"
            resp = session.post(action, data=data)

    # Parse results
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    for div in soup.find_all("div", class_="g"):
        link = div.find("a")
        title = div.find("h3")
        if link and title:
            results.append({
                "title": title.text,
                "url": link.get("href")
            })

    return results

results = google_search("best captcha solving api")
for r in results:
    print(f"{r['title']}: {r['url']}")

Best Practices

  1. Use residential proxies: Google is quick to challenge datacenter IPs
  2. Randomize query timing: Wait 5-15 seconds between searches
  3. Vary User-Agents: Rotate through realistic browser User-Agent strings
  4. Limit volume: Keep queries under 100/hour per IP
  5. Use localized domains: Match your proxy region to the Google domain
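
The timing and volume guidance above can be sketched as a random delay plus a sliding-window counter. This is one possible shape, not a prescribed implementation: `next_delay` and `HourlyBudget` are hypothetical names, and the defaults simply mirror the 5-15 second and 100-per-hour figures from the list.

```python
import random
import time
from collections import deque


def next_delay(rng: random.Random, low: float = 5.0, high: float = 15.0) -> float:
    """Pick a random wait in seconds within [low, high]."""
    return rng.uniform(low, high)


class HourlyBudget:
    """Sliding-window counter allowing at most `limit` queries per window."""

    def __init__(self, limit: int = 100, window: float = 3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the class can be tested without waiting
        self.stamps = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True
```

In use, you would keep one `HourlyBudget` per proxy IP, call `try_acquire()` before each search, and sleep for `next_delay(random.Random())` seconds between searches.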

Troubleshooting

| Issue | Fix |
|---|---|
| CAPTCHA on every request | Switch to residential proxies; reduce the request rate |
| reCAPTCHA site key not found | Google may have changed the challenge page layout |
| Token accepted but still blocked | Google may require additional verification; try a different proxy |
| Results page is empty | Check whether Google served an alternate layout |

FAQ

Does Google always use reCAPTCHA?

Google primarily uses reCAPTCHA v2 on its challenge pages. Some Google services may use reCAPTCHA v3 in the background. CaptchaAI handles both versions.

How many searches can I make before hitting a CAPTCHA?

It depends on your IP quality and request pattern. With residential proxies and delays, you can often make 50-100 searches before triggering one. Without proxies, expect CAPTCHAs after 5-10 searches.

Should I use Google's API instead?

Google's Custom Search JSON API provides 100 free queries per day; additional queries cost $5 per 1,000, up to 10,000 queries per day. If your volume is low and you only need search results, the official API may be simpler. Scraping is necessary for data Google doesn't expose via API.
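
For comparison, a request to the Custom Search JSON API might be assembled as in the sketch below. The helper names are hypothetical, `KEY` and `CX` style values are placeholders for your own API key and search engine ID, and the output shape is chosen to match the title/url dictionaries used by the scraper in this guide.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query: str, api_key: str, engine_id: str, num: int = 10) -> str:
    """Build a Custom Search JSON API request URL (hypothetical helper)."""
    # The API returns at most 10 results per request, so cap num at 10.
    params = {"key": api_key, "cx": engine_id, "q": query, "num": min(num, 10)}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def extract_results(payload: dict) -> list:
    """Convert a decoded JSON response into title/url dictionaries."""
    # Results are listed under "items"; the key is absent when there are no hits.
    return [{"title": item.get("title", ""), "url": item.get("link", "")}
            for item in payload.get("items", [])]
```

Fetching the URL with `requests.get(url).json()` and passing the result to `extract_results` would give the same list shape as `google_search`, with no CAPTCHA handling involved.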
