How to Handle CAPTCHA Challenges in Web Scraping Workflows

CAPTCHAs are one of the most common blockers in web scraping workflows. When a target site serves a reCAPTCHA, Cloudflare Turnstile, or image CAPTCHA, your scraper stalls until the challenge is solved. CaptchaAI's API solves these challenges automatically so your scraper keeps running.

How CAPTCHA Blocking Works in Scraping

Websites trigger CAPTCHAs based on behavioral signals:

| Signal | Trigger |
| --- | --- |
| Request rate | Too many requests from one IP |
| Missing cookies | No session or preference cookies |
| Bot-like headers | Missing Accept-Language, Referer |
| JavaScript fingerprint | No JS execution or headless browser detected |
| IP reputation | Datacenter or proxy IP flagged |

When triggered, the site returns a CAPTCHA challenge instead of the page content. Your scraper needs to solve it and submit the token to proceed.

Requirements

| Requirement | Details |
| --- | --- |
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ or Node.js 16+ | For code examples |
| requests / axios | HTTP client library |
| Target site URL | The page serving the CAPTCHA |
| CAPTCHA site key | Extracted from the page source |

Step 1: Identify the CAPTCHA Type

Before solving, identify what CAPTCHA the site uses. Check the page source:

reCAPTCHA v2:

<div class="g-recaptcha" data-sitekey="6Le-wvkS..."></div>

reCAPTCHA v3:

<script src="https://www.google.com/recaptcha/api.js?render=6Le-wvkS..."></script>

Cloudflare Turnstile:

<div class="cf-turnstile" data-sitekey="0x4AAAAA..."></div>

Each type requires a different method parameter when submitting to CaptchaAI.
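
The three markup patterns above can be checked programmatically. The following is a minimal sketch, not CaptchaAI's own detection logic: the function name `detect_captcha` and the type labels are made up for illustration, and the regular expressions assume the `class` attribute appears before `data-sitekey`, as in the snippets shown. The method mapping covers only the values this article names (`userrecaptcha` for reCAPTCHA v2, `turnstile` for Turnstile); check the CaptchaAI documentation for reCAPTCHA v3.

```python
import re

# Patterns mirror the three markup snippets shown above.
_PATTERNS = [
    ("recaptcha_v2", re.compile(r'class="[^"]*\bg-recaptcha\b[^"]*"[^>]*\bdata-sitekey="([^"]+)"')),
    ("turnstile", re.compile(r'class="[^"]*\bcf-turnstile\b[^"]*"[^>]*\bdata-sitekey="([^"]+)"')),
    ("recaptcha_v3", re.compile(r'recaptcha/api\.js\?render=([\w-]+)')),
]

# Method values named in this article; reCAPTCHA v3 is deliberately left out.
METHOD_BY_TYPE = {"recaptcha_v2": "userrecaptcha", "turnstile": "turnstile"}


def detect_captcha(html):
    """Return (captcha_type, site_key), or (None, None) if nothing matches."""
    for captcha_type, pattern in _PATTERNS:
        match = pattern.search(html)
        # "render=explicit" is a reCAPTCHA v2 loading mode, not a v3 site key.
        if match and match.group(1) != "explicit":
            return captcha_type, match.group(1)
    return None, None
```

Calling `detect_captcha(page.text)` on a fetched page returns the type label and site key together, so the scraper can pick the method from `METHOD_BY_TYPE` in one step.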

Step 2: Extract the Site Key

Python (with requests + BeautifulSoup)

from bs4 import BeautifulSoup
import requests

page = requests.get("https://example.com/protected-page", headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
})
soup = BeautifulSoup(page.text, "html.parser")

# reCAPTCHA v2
recaptcha_div = soup.find("div", class_="g-recaptcha")
if recaptcha_div:
    site_key = recaptcha_div["data-sitekey"]
    print(f"reCAPTCHA v2 site key: {site_key}")

Node.js (with cheerio)

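const axios = require("axios");
const cheerio = require("cheerio");

// Wrapped in an async function because CommonJS modules do not support top-level await
async function getSiteKey(url) {
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  return $(".g-recaptcha").attr("data-sitekey");
}

getSiteKey("https://example.com/protected-page").then((siteKey) => {
  console.log("Site key:", siteKey);
});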

Step 3: Submit the CAPTCHA to CaptchaAI

Python

import requests
import time

API_KEY = "YOUR_API_KEY"
SITE_KEY = "6Le-wvkS..."
PAGE_URL = "https://example.com/protected-page"

# Submit
resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "userrecaptcha",
    "googlekey": SITE_KEY,
    "pageurl": PAGE_URL
})

if not resp.text.startswith("OK|"):
    raise Exception(f"Submit error: {resp.text}")

task_id = resp.text.split("|")[1]
print(f"Task submitted: {task_id}")

# Poll every 5 seconds, for up to 5 minutes
for _ in range(60):
    time.sleep(5)
    result = requests.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY,
        "action": "get",
        "id": task_id
    })
    if result.text == "CAPCHA_NOT_READY":
        continue
    if result.text.startswith("OK|"):
        token = result.text.split("|", 1)[1]
        print(f"Solved! Token: {token[:50]}...")
        break
    raise Exception(f"Solve error: {result.text}")
else:
    # The loop finished without a break: no result arrived in time
    raise TimeoutError("CAPTCHA solve timed out")
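
Both endpoints reply in the same plain-text shape: `OK|<value>` on success, `CAPCHA_NOT_READY` while a task is pending, and an error code otherwise. A small parser keeps that branching in one place. This is a sketch based only on the responses handled in the code above; `parse_api_response` is a hypothetical helper name, not part of any CaptchaAI client.

```python
def parse_api_response(text):
    """Split a plain-text API reply into (status, value).

    status is "ok", "pending", or "error". value is the task ID or token
    for "ok", None for "pending", and the raw error code for "error".
    """
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return "pending", None
    if text.startswith("OK|"):
        # Split only once so a "|" inside the payload is not lost.
        return "ok", text.split("|", 1)[1]
    return "error", text
```

With this in place, the polling loop can switch on the returned status instead of repeating string checks.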

Node.js

const axios = require("axios");

const API_KEY = "YOUR_API_KEY";
const SITE_KEY = "6Le-wvkS...";
const PAGE_URL = "https://example.com/protected-page";

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Wrapped in an async function because CommonJS modules do not support top-level await
async function solveCaptcha() {
  // Submit
  const submitResp = await axios.get("https://ocr.captchaai.com/in.php", {
    params: {
      key: API_KEY,
      method: "userrecaptcha",
      googlekey: SITE_KEY,
      pageurl: PAGE_URL,
    },
  });

  if (!submitResp.data.startsWith("OK|")) {
    throw new Error(`Submit error: ${submitResp.data}`);
  }
  const taskId = submitResp.data.split("|")[1];

  // Poll every 5 seconds, for up to 5 minutes
  for (let attempt = 0; attempt < 60; attempt++) {
    await sleep(5000);
    const result = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: taskId },
    });
    if (result.data === "CAPCHA_NOT_READY") continue;
    if (result.data.startsWith("OK|")) {
      return result.data.split("|")[1];
    }
    throw new Error(`Solve error: ${result.data}`);
  }
  throw new Error("CAPTCHA solve timed out");
}

solveCaptcha()
  .then((token) => console.log("Token:", token.substring(0, 50)))
  .catch((err) => console.error(err.message));

Step 4: Submit the Token to the Target Site

Once you have the token, submit it with the form data the site expects:

Python

# Submit the solved token with the form
form_data = {
    "g-recaptcha-response": token,
    "username": "user@example.com",
    "password": "password123"
}

response = requests.post(PAGE_URL, data=form_data, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
})

print(f"Status: {response.status_code}")
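
Many forms also carry hidden inputs that the server expects to receive back with the token. The sketch below collects those with the standard-library HTML parser and merges in the solved token. The helper name `build_form_payload` and the `csrf_token` field in the usage note are illustrative assumptions; the fields a real site needs will differ.

```python
from html.parser import HTMLParser


class _HiddenInputCollector(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form_payload(html, token, extra_fields=None):
    """Merge hidden form fields, caller-supplied fields, and the token."""
    collector = _HiddenInputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(extra_fields or {})
    # The field name matches the one used in the form_data example above.
    payload["g-recaptcha-response"] = token
    return payload
```

For a page containing `<input type="hidden" name="csrf_token" value="abc123">`, calling `build_form_payload(page.text, token, {"username": "user@example.com"})` yields a dictionary that can be passed as `data=` to `requests.post`.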

Step 5: Build a Reusable Scraper Function

Wrap the solve logic into a reusable function:

import requests
import time

API_KEY = "YOUR_API_KEY"

def solve_captcha(site_key, page_url, method="userrecaptcha"):
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": method,
        "googlekey": site_key,
        "pageurl": page_url
    })
    if not resp.text.startswith("OK|"):
        raise Exception(resp.text)
    task_id = resp.text.split("|")[1]

    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|")[1]
        raise Exception(result.text)
    raise TimeoutError("CAPTCHA solve timed out")

# Use in your scraper
def scrape_page(url, site_key):
    token = solve_captcha(site_key, url)
    response = requests.post(url, data={"g-recaptcha-response": token})
    return response.text

Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| ERROR_WRONG_USER_KEY | Invalid API key | Check your key at captchaai.com dashboard |
| ERROR_ZERO_BALANCE | No funds | Add balance to your account |
| ERROR_CAPTCHA_UNSOLVABLE | Challenge couldn't be solved | Verify the site key and URL are correct |
| CAPCHA_NOT_READY (loops forever) | Slow solve or wrong parameters | Increase timeout; verify site key matches the page |
| Token rejected by site | Token expired or wrong site key | Use token within 120 seconds; confirm site key |
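
The errors in this table fall into two groups: account problems that no retry can fix, and solve failures where a fresh attempt may succeed. One possible way to encode that split is sketched below. `solve_with_retry` is a hypothetical wrapper, and it assumes the wrapped callable raises an exception whose message contains the API error code, as the `solve_captcha` function in Step 5 does.

```python
import time

# Error codes from the troubleshooting table; retrying these cannot help.
FATAL_ERRORS = ("ERROR_WRONG_USER_KEY", "ERROR_ZERO_BALANCE")


def solve_with_retry(solve, attempts=3, backoff_seconds=2.0, sleep=time.sleep):
    """Call solve() until it returns a token or the attempts run out.

    solve is any zero-argument callable that returns a token or raises
    an exception whose message contains the API error code.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return solve()
        except Exception as exc:
            if any(code in str(exc) for code in FATAL_ERRORS):
                raise  # bad key or empty balance: fail fast
            last_error = exc
            if attempt < attempts:
                sleep(backoff_seconds * attempt)  # linear backoff
    raise RuntimeError(f"CAPTCHA not solved after {attempts} attempts") from last_error
```

A typical call would be `solve_with_retry(lambda: solve_captcha(site_key, url))`. The `sleep` parameter exists so the backoff can be swapped out in tests.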

Best Practices

  1. Rotate user agents: use realistic browser User-Agent strings
  2. Add delays: space requests 2-5 seconds apart to avoid rate limits
  3. Use proxies: rotate residential proxies to distribute requests
  4. Handle cookies: maintain session cookies across requests
  5. Cache sessions, not tokens: reCAPTCHA and Turnstile tokens can be verified only once, so reuse the session cookies a solved challenge unlocks rather than the token itself
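
Several of these practices can be combined in a small request helper. The sketch below is one possible arrangement, not a complete anti-detection setup: `pick_headers` and `polite_get` are hypothetical names, the User-Agent strings are placeholders to replace with current values, and proxy rotation is left out. The `session` argument is assumed to be a `requests.Session` or any object with a compatible `get` method.

```python
import random
import time

# Placeholder User-Agent strings; replace with current, realistic values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]


def pick_headers():
    """Choose one User-Agent per session, plus a header bots often omit."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def polite_get(session, url, headers, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Wait a random 2-5 seconds, then GET url through a cookie-keeping session."""
    sleep(random.uniform(min_delay, max_delay))
    return session.get(url, headers=headers, timeout=30)
```

Create the session and headers once (`session = requests.Session()`, `headers = pick_headers()`) and pass both to every `polite_get` call, so cookies and the User-Agent stay consistent for the life of that session.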

FAQ

Does this work with Cloudflare-protected sites?

Yes. Use method=turnstile for Turnstile CAPTCHAs or method=cloudflare_challenge for full Cloudflare challenge pages. See How to Bypass Cloudflare Turnstile.

Do I need a headless browser?

Not always. For simple form submissions with reCAPTCHA, plain HTTP requests work. For JavaScript-heavy sites, combine CaptchaAI with Selenium or Puppeteer.

How much does it cost to scrape 10,000 pages?

At CaptchaAI's rates, solving 10,000 reCAPTCHA v2 challenges costs approximately $10. Image CAPTCHAs are even cheaper.
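
The figure above works out to roughly $1 per 1,000 solves ($10 / 10,000). In practice not every page serves a CAPTCHA, so a rough budget can scale that rate by the fraction of pages that do. The helper below is only an arithmetic sketch: the default price is the rate implied by this answer, not a quoted tariff, and `captcha_rate` is an assumption you would measure for your own target.

```python
def estimate_cost(pages, captcha_rate=1.0, price_per_1000=1.00):
    """Estimated spend in dollars.

    captcha_rate is the fraction of pages that serve a CAPTCHA.
    price_per_1000 defaults to the rate implied above ($10 per 10,000 solves).
    """
    solves = pages * captcha_rate
    return solves * price_per_1000 / 1000
```

For example, `estimate_cost(10_000)` reproduces the $10 figure, while `estimate_cost(10_000, captcha_rate=0.2)` models a site that challenges one request in five.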

Can I solve CAPTCHAs in parallel?

Yes. Submit multiple tasks simultaneously and poll for each result. See Solving Multiple CAPTCHAs in Parallel.
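
A thread pool is one simple way to do this, since each solve spends most of its time waiting on the network. The sketch below assumes a `solve` callable with the same `(site_key, page_url)` signature as `solve_captcha` from Step 5; `solve_many` is a hypothetical helper, and the worker count is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def solve_many(jobs, solve, max_workers=5):
    """Solve several CAPTCHAs concurrently.

    jobs is a list of (site_key, page_url) pairs. solve is a callable
    taking (site_key, page_url) and returning a token. Returns a dict
    mapping each page_url to its token, or to the exception it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(solve, key, url): url for key, url in jobs}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep going: one failed solve should not cancel the rest.
                results[url] = exc
    return results
```

Calling `solve_many([(key_a, url_a), (key_b, url_b)], solve_captcha)` submits both tasks at once and collects each result as it completes.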
