Python's requests library handles HTTP efficiently, but CAPTCHAs require an external solver. This guide shows how to integrate CaptchaAI into Python scraping scripts — no browser needed for most sites.
Requirements
| Requirement | Details |
|---|---|
| Python 3.7+ | With pip |
| requests | pip install requests |
| beautifulsoup4 | pip install beautifulsoup4 |
| CaptchaAI API key | From captchaai.com |
The CaptchaAI Helper Class
Build a reusable solver class for your Python projects:
import requests
import time
class CaptchaSolver:
    """Minimal client for CaptchaAI's 2captcha-compatible HTTP API.

    Submit a task via ``in.php``, then poll ``res.php`` until the solution
    is ready or the timeout expires.
    """

    def __init__(self, api_key):
        self.api_key = api_key
        self.base = "https://ocr.captchaai.com"

    def _submit(self, params):
        """Submit a task and return its task id.

        Raises:
            Exception: if the API replies with anything other than "OK|<id>".
        """
        params["key"] = self.api_key
        # timeout= keeps a hung connection from blocking the scraper forever
        resp = requests.get(f"{self.base}/in.php", params=params, timeout=30)
        if not resp.text.startswith("OK|"):
            raise Exception(f"Submit error: {resp.text}")
        # Reply format is "OK|<task_id>"; maxsplit=1 guards against extra pipes
        return resp.text.split("|", 1)[1]

    def _poll(self, task_id, timeout=300):
        """Poll for the result of *task_id* and return the solution string.

        Raises:
            Exception: on an API error reply.
            TimeoutError: after *timeout* seconds without a result.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            time.sleep(5)  # poll interval recommended by the API docs
            resp = requests.get(f"{self.base}/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
            }, timeout=30)
            # NOTE: "CAPCHA_NOT_READY" (sic) is the literal API reply string
            if resp.text == "CAPCHA_NOT_READY":
                continue
            if resp.text.startswith("OK|"):
                return resp.text.split("|", 1)[1]
            raise Exception(f"Solve error: {resp.text}")
        raise TimeoutError("Solve timed out")

    def solve_recaptcha_v2(self, site_key, page_url):
        """Solve reCAPTCHA v2; returns the g-recaptcha-response token."""
        task_id = self._submit({
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": page_url,
        })
        return self._poll(task_id)

    def solve_recaptcha_v3(self, site_key, page_url, action="verify"):
        """Solve reCAPTCHA v3 for the given *action*; returns the token."""
        task_id = self._submit({
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": page_url,
            "version": "v3",
            "action": action,
        })
        return self._poll(task_id)

    def solve_turnstile(self, site_key, page_url):
        """Solve Cloudflare Turnstile; returns the response token."""
        task_id = self._submit({
            "method": "turnstile",
            "sitekey": site_key,
            "pageurl": page_url,
        })
        return self._poll(task_id)

    def solve_image(self, image_base64):
        """Solve a text-in-image CAPTCHA from a base64-encoded image."""
        task_id = self._submit({
            "method": "base64",
            "body": image_base64,
        })
        return self._poll(task_id)
Scraping a reCAPTCHA-Protected Form
from bs4 import BeautifulSoup
import requests

solver = CaptchaSolver("YOUR_API_KEY")

# A Session carries cookies across requests, which protected sites often require.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
})

# Step 1: Load the page
url = "https://example.com/search"
soup = BeautifulSoup(session.get(url).text, "html.parser")

# Step 2: Extract the site key from the reCAPTCHA widget markup
site_key = soup.find("div", class_="g-recaptcha")["data-sitekey"]

# Step 3: Solve the CAPTCHA
token = solver.solve_recaptcha_v2(site_key, url)

# Step 4: Submit the form with the token
result = session.post(url, data={
    "q": "search term",
    "g-recaptcha-response": token,
})

# Step 5: Parse the results
result_soup = BeautifulSoup(result.text, "html.parser")
for item in result_soup.find_all("div", class_="result-item"):
    print(item.text.strip())
Scraping Multiple Pages
For paginated results behind CAPTCHAs:
def scrape_all_pages(base_url, site_key, max_pages=10, api_key="YOUR_API_KEY"):
    """Scrape up to *max_pages* paginated result pages behind a reCAPTCHA.

    Args:
        base_url: Page URL without the ``?page=`` query parameter.
        site_key: reCAPTCHA v2 site key of the target pages.
        max_pages: Upper bound on pages fetched; stops early on an empty page.
        api_key: CaptchaAI API key (previously hard-coded; now overridable).

    Returns:
        List of stripped text contents of every matched "item" div, in order.
    """
    solver = CaptchaSolver(api_key)
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    })
    all_results = []
    for page_num in range(1, max_pages + 1):
        page_url = f"{base_url}?page={page_num}"
        # Each page may present its own CAPTCHA, so solve one per page
        token = solver.solve_recaptcha_v2(site_key, page_url)
        resp = session.get(page_url, params={
            "g-recaptcha-response": token,
            "page": page_num,
        })
        soup = BeautifulSoup(resp.text, "html.parser")
        items = soup.find_all("div", class_="item")
        if not items:
            # Empty page: assume we ran past the last page of results
            break
        all_results.extend(item.text.strip() for item in items)
        print(f"Page {page_num}: {len(items)} items")
        time.sleep(2)  # Polite delay
    return all_results
Handling Image CAPTCHAs
For sites with image-based text CAPTCHAs:
import base64

def scrape_with_image_captcha(url):
    """Submit a form on a page protected by a text-in-image CAPTCHA.

    Downloads the CAPTCHA image, solves it via CaptchaAI, then posts the
    form with the recognized text.

    Args:
        url: URL of the page containing the form and the CAPTCHA image.

    Returns:
        The response body (HTML text) of the form submission.
    """
    from urllib.parse import urljoin  # stdlib; resolves relative image URLs

    solver = CaptchaSolver("YOUR_API_KEY")
    session = requests.Session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, "html.parser")
    # Find the CAPTCHA image
    captcha_img = soup.find("img", {"id": "captcha-image"})
    # FIX: src is often relative (e.g. "/captcha.png"); resolve it against
    # the page URL. Absolute src values pass through urljoin unchanged.
    captcha_url = urljoin(url, captcha_img["src"])
    # Download and encode the image
    img_resp = session.get(captcha_url)
    img_base64 = base64.b64encode(img_resp.content).decode()
    # Solve
    captcha_text = solver.solve_image(img_base64)
    # Submit
    form_data = {
        "captcha": captcha_text,
        "username": "user",
    }
    result = session.post(url, data=form_data)
    return result.text
Error Handling and Retries
Add retry logic for production scrapers:
def solve_with_retry(solver, site_key, page_url, max_retries=3, delay=2.0):
    """Solve a reCAPTCHA v2 with retries and exponential backoff.

    Args:
        solver: Object exposing ``solve_recaptcha_v2(site_key, page_url)``.
        site_key: reCAPTCHA site key.
        page_url: URL of the protected page.
        max_retries: Total attempts before giving up.
        delay: Base wait in seconds; doubles after each failed attempt
            (exponential backoff, as recommended for transient errors).

    Returns:
        The CAPTCHA token from the first successful attempt.

    Raises:
        Exception: the last solver error, re-raised once attempts run out.
    """
    for attempt in range(max_retries):
        try:
            return solver.solve_recaptcha_v2(site_key, page_url)
        except Exception as e:
            if attempt == max_retries - 1:
                raise  # out of attempts: propagate the last error
            print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
            # Backoff: delay, 2*delay, 4*delay, ...
            time.sleep(delay * (2 ** attempt))
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| ERROR_WRONG_USER_KEY | Invalid API key | Verify key from dashboard |
| ERROR_ZERO_BALANCE | No funds | Top up your account |
| Form submission returns CAPTCHA page again | Token expired or wrong field name | Use token immediately; check form field names |
| ConnectionError | Network issue | Add retry logic with exponential backoff |
| Empty results after submission | Site requires cookies/session | Use requests.Session() to maintain cookies |
FAQ
Do I need Selenium for CAPTCHA scraping in Python?
Not always. If the site's form works with standard HTTP POST requests, requests + CaptchaAI is faster and lighter than Selenium. Use Selenium only when the site requires JavaScript rendering.
Can I solve CAPTCHAs asynchronously?
Yes. Use aiohttp with CaptchaAI's API for async workflows. See aiohttp + CaptchaAI Integration.
How do I handle rate limiting?
Add delays between requests (e.g. time.sleep() with a 2–5 second pause), rotate proxies, and use realistic headers. See Proxy Rotation for CAPTCHA Scraping.
Discussions (0)
Join the conversation
Sign in to share your opinion.
Sign InNo comments yet.