Google uses reCAPTCHA to protect its search results and other services from automated access. When triggered, you'll see a reCAPTCHA v2 or v3 challenge that blocks further requests. CaptchaAI solves these challenges so your scraper can continue.
## How Google Detects Scrapers
| Signal | Description |
|---|---|
| Query rate | Too many searches from one IP |
| IP reputation | Datacenter or flagged proxy IPs |
| Cookie absence | No Google session cookies |
| Behavioral patterns | Identical query patterns, no dwell time |
| JavaScript fingerprint | Missing browser environment indicators |
Google typically responds with `429 Too Many Requests` or redirects to a reCAPTCHA challenge page at `google.com/sorry/`.
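When you do hit that 429, backing off beats retrying immediately. A minimal sketch of jittered exponential backoff (the `base` and `cap` values are illustrative, not anything Google specifies):

```python
import random

def backoff_delay(attempt, base=5.0, cap=120.0):
    """Seconds to wait before retry number `attempt` (0-indexed)."""
    delay = min(cap, base * (2 ** attempt))
    # Jitter spreads retries out so parallel workers don't sync up
    return delay * random.uniform(0.5, 1.0)
```

Call it in your retry loop as `time.sleep(backoff_delay(attempt))` instead of a fixed sleep.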
## Requirements
| Requirement | Details |
|---|---|
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ | With requests |
| Residential proxies | Strongly recommended |
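Wiring a residential proxy into `requests` takes one line on the session; the endpoint below is a placeholder for whatever host, port, and credentials your provider issues:

```python
import requests

# Placeholder endpoint; substitute your provider's credentials
PROXY = "http://user:pass@residential.example.com:8000"

session = requests.Session()
# Route both plain and TLS traffic through the proxy
session.proxies.update({"http": PROXY, "https": PROXY})
```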
## Solving Google's reCAPTCHA
### Step 1: Detect the CAPTCHA
```python
import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9"
})

resp = session.get("https://www.google.com/search?q=example")

if "sorry" in resp.url or resp.status_code == 429:
    print("CAPTCHA triggered!")
    captcha_url = resp.url
else:
    print("Results loaded")
```
### Step 2: Extract the Site Key
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(resp.text, "html.parser")

# Google uses data-sitekey on the reCAPTCHA div
recaptcha = soup.find("div", {"data-sitekey": True})
if recaptcha:
    site_key = recaptcha["data-sitekey"]
    print(f"Site key: {site_key}")
```
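If the `div` lookup comes back empty (Google occasionally renders the challenge page differently), a regex over the raw HTML is a reasonable fallback. This helper is an illustration, not part of any official flow:

```python
import re

def extract_site_key(html):
    """Fallback sitekey extraction for when the BeautifulSoup lookup fails."""
    match = re.search(r'data-sitekey="([0-9A-Za-z_-]+)"', html)
    return match.group(1) if match else None
```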
### Step 3: Solve with CaptchaAI
```python
import time

API_KEY = "YOUR_API_KEY"

def solve_google_recaptcha(site_key, page_url):
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url
    })
    if not resp.text.startswith("OK|"):
        raise Exception(f"Submit error: {resp.text}")
    task_id = resp.text.split("|")[1]

    # Poll for the solution (up to ~5 minutes)
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|")[1]
        raise Exception(f"Error: {result.text}")
    raise TimeoutError("Timed out waiting for the CAPTCHA solution")

token = solve_google_recaptcha(site_key, captcha_url)
```
### Step 4: Submit the Token
```python
# Find the form action URL and hidden fields
form = soup.find("form")
form_data = {}
for inp in form.find_all("input", {"name": True}):
    form_data[inp["name"]] = inp.get("value", "")

form_data["g-recaptcha-response"] = token

action = form.get("action", "")
if action.startswith("/"):
    action = f"https://www.google.com{action}"

result = session.post(action, data=form_data)
print(f"Redirected to: {result.url}")
```
## Complete Scraper with CAPTCHA Handling
```python
import requests
import time
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"

def solve_captcha(site_key, page_url):
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": page_url
    })
    if not resp.text.startswith("OK|"):
        raise Exception(f"Submit error: {resp.text}")
    task_id = resp.text.split("|")[1]

    for _ in range(60):
        time.sleep(5)
        r = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if r.text == "CAPCHA_NOT_READY":
            continue
        if r.text.startswith("OK|"):
            return r.text.split("|")[1]
        raise Exception(f"Error: {r.text}")
    raise TimeoutError("Timed out waiting for the CAPTCHA solution")

def google_search(query, num_results=10):
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    })
    resp = session.get("https://www.google.com/search", params={
        "q": query, "num": num_results
    })

    # Handle CAPTCHA
    if "sorry" in resp.url or resp.status_code == 429:
        soup = BeautifulSoup(resp.text, "html.parser")
        rc = soup.find("div", {"data-sitekey": True})
        if rc:
            token = solve_captcha(rc["data-sitekey"], resp.url)
            form = soup.find("form")
            data = {i["name"]: i.get("value", "")
                    for i in form.find_all("input", {"name": True})}
            data["g-recaptcha-response"] = token
            action = form.get("action", resp.url)
            if action.startswith("/"):
                action = f"https://www.google.com{action}"
            resp = session.post(action, data=data)

    # Parse results
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    for div in soup.find_all("div", class_="g"):
        link = div.find("a")
        title = div.find("h3")
        if link and title:
            results.append({
                "title": title.text,
                "url": link.get("href")
            })
    return results

results = google_search("best captcha solving api")
for r in results:
    print(f"{r['title']}: {r['url']}")
```
## Best Practices
- Use residential proxies — Google blocks datacenter IPs immediately
- Randomize query timing — Wait 5-15 seconds between searches
- Vary User-Agents — Rotate through realistic browser User-Agent strings
- Limit volume — Keep queries under 100/hour per IP
- Use localized domains — Match your proxy region to the Google domain
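The timing and User-Agent advice above can be sketched as a pair of helpers. The UA strings are examples only; keep your own list current and realistic:

```python
import random
import time

# Example desktop UA strings; rotate through a current, realistic set
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def polite_headers():
    """Fresh headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }

def polite_wait(lo=5, hi=15):
    """Sleep a random 5-15 seconds between searches."""
    time.sleep(random.uniform(lo, hi))
```

Call `session.headers.update(polite_headers())` before each query and `polite_wait()` after it.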
## Troubleshooting
| Issue | Fix |
|---|---|
| CAPTCHA on every request | Switch to residential proxies; reduce rate |
| reCAPTCHA site key not found | Google may have changed the challenge page layout |
| Token accepted but still blocked | Google may require additional verification; try a different proxy |
| Results page is empty | Check if Google served an alternate layout |
## FAQ
### Does Google always use reCAPTCHA?
Google primarily uses reCAPTCHA v2 on its challenge pages. Some Google services may use reCAPTCHA v3 in the background. CaptchaAI handles both versions.
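For v3, 2Captcha-compatible endpoints like CaptchaAI's generally accept extra submit parameters (`version`, `action`, `min_score`); treat the names below as assumptions to verify against CaptchaAI's docs before relying on them:

```python
def v3_submit_params(api_key, site_key, page_url, action="verify", min_score=0.3):
    """Params for in.php when the target runs reCAPTCHA v3 (names assumed
    from the 2Captcha-compatible API; confirm with CaptchaAI's docs)."""
    return {
        "key": api_key,
        "method": "userrecaptcha",
        "version": "v3",         # the default flow is v2; this flags v3
        "action": action,        # the action name the page's grecaptcha.execute uses
        "min_score": min_score,  # minimum acceptable v3 score
        "googlekey": site_key,
        "pageurl": page_url,
    }
```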
### How many searches can I make before hitting a CAPTCHA?
It depends on your IP quality and request pattern. With residential proxies and delays, you can often make 50-100 searches before triggering a challenge. Without proxies, expect CAPTCHAs after 5-10 searches.
### Should I use Google's API instead?
Google's Custom Search JSON API allows 100 free queries/day and 10,000 at $5/1,000. If your volume is low and you only need search results, the official API may be simpler. Scraping is necessary for data Google doesn't expose via API.
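If you go the official route, the Custom Search JSON API is a single GET request; the key and engine ID placeholders below come from the Google Cloud console and your Programmable Search Engine setup:

```python
import requests

BASE = "https://www.googleapis.com/customsearch/v1"

def build_search_request(api_key, cx, query):
    """Prepare (but don't send) a Custom Search API request."""
    params = {"key": api_key, "cx": cx, "q": query}
    return requests.Request("GET", BASE, params=params).prepare()

def api_search(api_key, cx, query):
    """Return (title, link) pairs for the query via the official API."""
    with requests.Session() as s:
        resp = s.send(build_search_request(api_key, cx, query))
        resp.raise_for_status()
        return [(item["title"], item["link"])
                for item in resp.json().get("items", [])]
```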