Retail Site Data Collection with CAPTCHA Handling

Amazon uses image CAPTCHAs to block automated access. When you hit their anti-bot threshold, you'll see a page asking you to type characters from a distorted image. CaptchaAI's OCR solving handles these automatically.

How Amazon's CAPTCHA Works

Amazon triggers CAPTCHAs based on:

| Signal | Description |
|---|---|
| Request volume | Too many requests from one IP in a short window |
| Missing cookies | No Amazon session cookies |
| Suspicious headers | Bot-like User-Agent or missing headers |
| IP reputation | Known datacenter or proxy IP ranges |

When triggered, Amazon redirects to a page with a distorted text image and an input field. You must solve the image and submit the text to continue.
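To make that structure concrete, the sketch below pulls the image URL, the form target, and the named inputs out of a challenge page. The sample HTML is simplified and hypothetical. It is modeled on what the steps below look for: an img whose src contains "captcha", a form whose action is /errors/validateCaptcha, and a text input named field-keywords. The hidden token names and values are illustrative, and Amazon's real markup may differ or change. It uses the standard library's html.parser so it runs without extra packages; the steps below do the same job with BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified, hypothetical challenge page; real markup may differ.
SAMPLE_HTML = """
<html><body>
  <form method="get" action="/errors/validateCaptcha">
    <input type="hidden" name="amzn" value="token123">
    <input type="hidden" name="amzn-r" value="/dp/B0EXAMPLE">
    <img src="https://images.example.com/captcha/abc/Captcha_xyz.jpg">
    <p>Type the characters you see in this image:</p>
    <input type="text" name="field-keywords">
  </form>
</body></html>
"""


class ChallengeParser(HTMLParser):
    """Collects the CAPTCHA image src, the first form, and named inputs."""

    def __init__(self):
        super().__init__()
        self.image_src = None
        self.action = None
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "img" and "captcha" in (attrs.get("src") or "").lower():
            self.image_src = attrs["src"]
        elif tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def parse_challenge(html, page_url):
    """Return the challenge's parts as a dict, or None if it isn't one."""
    parser = ChallengeParser()
    parser.feed(html)
    parser.close()
    if parser.image_src is None or parser.action is None:
        return None
    return {
        "image_url": urljoin(page_url, parser.image_src),
        "action_url": urljoin(page_url, parser.action),
        "method": parser.method,
        "fields": parser.fields,
    }


challenge = parse_challenge(SAMPLE_HTML, "https://www.amazon.com/dp/B0EXAMPLE")
print(challenge["action_url"])  # https://www.amazon.com/errors/validateCaptcha
print(challenge["fields"])
```

The fields dict already carries the hidden tokens; all that is missing before submission is the answer in field-keywords.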

Requirements

| Requirement | Details |
|---|---|
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ | With requests and beautifulsoup4 |
| Residential proxies | Recommended for sustained scraping |

Solving Amazon's Image CAPTCHA

Step 1: Detect the CAPTCHA Page

import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
})

def is_captcha_page(html):
    # A bare "captcha" substring check is too broad: ordinary pages can
    # mention the word in scripts. Look for the challenge form target
    # (observed, not documented, so it may change) or the prompt text.
    return ("/errors/validateCaptcha" in html
            or "Type the characters you see in this image" in html)

url = "https://www.amazon.com/dp/B0EXAMPLE"
resp = session.get(url, timeout=30)

if is_captcha_page(resp.text):
    print("CAPTCHA detected!")
else:
    print("Page loaded successfully")

Step 2: Extract and Solve the Image

import base64
import time
from urllib.parse import urljoin

API_KEY = "YOUR_API_KEY"

def solve_amazon_captcha(session, captcha_page_html, captcha_page_url):
    soup = BeautifulSoup(captcha_page_html, "html.parser")

    # Find the CAPTCHA image
    img_tag = soup.find("img", src=lambda s: s and "captcha" in s.lower())
    if not img_tag:
        raise RuntimeError("CAPTCHA image not found")

    # Resolve a relative src against the page URL
    img_url = urljoin(captcha_page_url, img_tag["src"])

    # Download the image with the same session (same cookies)
    img_resp = session.get(img_url, timeout=30)
    img_resp.raise_for_status()
    img_base64 = base64.b64encode(img_resp.content).decode()

    # Submit to CaptchaAI; POST keeps the long base64 body out of the URL
    submit_resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "base64",
        "body": img_base64
    }, timeout=30)
    # Errors come back without "|", so check before splitting
    if not submit_resp.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {submit_resp.text}")
    task_id = submit_resp.text.split("|", 1)[1]

    # Poll every 5 seconds, for up to about 150 seconds
    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":  # sic: the API's own spelling
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"Solve error: {result.text}")

    raise TimeoutError("Solve timed out")
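The in.php and res.php calls above speak a plain-text protocol: OK|<value> on success, CAPCHA_NOT_READY while the task is pending, and an error string otherwise. Pulling that branching into a small parser, plus a polling loop that takes the network call and the sleep function as arguments, makes it testable without hitting the API. This is a sketch based only on the replies the code above handles; parse_captchaai_response, poll_result, and the ("pending", None) convention are hypothetical names of our own.

```python
import time


def parse_captchaai_response(text):
    """Classify a CaptchaAI plain-text reply (hypothetical helper).

    Returns ("ok", value) or ("pending", None); raises on anything else.
    """
    text = text.strip()
    if text == "CAPCHA_NOT_READY":  # sic: this is the API's own spelling
        return ("pending", None)
    if text.startswith("OK|"):
        # Split once so an answer that contains "|" stays intact
        return ("ok", text.split("|", 1)[1])
    raise RuntimeError(f"CaptchaAI error: {text}")


def poll_result(fetch, attempts=30, interval=5, sleep=time.sleep):
    """Call fetch() until it yields an answer; fetch returns res.php's body text."""
    for _ in range(attempts):
        sleep(interval)
        status, value = parse_captchaai_response(fetch())
        if status == "ok":
            return value
    raise TimeoutError("Solve timed out")


print(parse_captchaai_response("OK|12345"))  # ('ok', '12345')
```

In solve_amazon_captcha, fetch would be a lambda that returns the .text of the res.php request; injecting sleep lets a test replace the 5-second wait with a no-op.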

Step 3: Submit the Solution

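from urllib.parse import urljoin

def submit_captcha_solution(session, captcha_page_html, solution, captcha_page_url):
    soup = BeautifulSoup(captcha_page_html, "html.parser")
    # Prefer the challenge form; fall back to the first form on the page
    form = (soup.find("form", action=lambda a: a and "validateCaptcha" in a)
            or soup.find("form"))
    if form is None:
        raise RuntimeError("CAPTCHA form not found")

    # Copy every named input, including hidden tokens
    form_data = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if name:
            form_data[name] = inp.get("value", "")

    # Set the CAPTCHA answer (the text input is named field-keywords)
    form_data["field-keywords"] = solution

    # urljoin handles absolute, root-relative, and missing actions alike
    action = urljoin(captcha_page_url, form.get("action", ""))

    # Honor the form's method; HTML forms default to GET
    if form.get("method", "get").lower() == "post":
        return session.post(action, data=form_data, timeout=30)
    return session.get(action, params=form_data, timeout=30)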
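The troubleshooting advice for a rejected solution is to retry. One way to wire Steps 1 to 3 together is a bounded loop: fetch the page, and while it is still a challenge, solve and submit again, up to a limit. The sketch takes each step as a plain callable so it can be exercised offline. fetch_with_captcha_retries and its parameter contract are hypothetical, and it assumes a rejected answer brings back another challenge page rather than an error.

```python
def fetch_with_captcha_retries(fetch, is_captcha, solve, submit, max_attempts=3):
    """Fetch a page, solving CAPTCHAs until it loads or attempts run out.

    Hypothetical interface, all supplied by the caller:
      fetch() -> (html, url)
      is_captcha(html) -> bool
      solve(html, url) -> answer text
      submit(html, answer, url) -> (html, url)
    """
    html, url = fetch()
    for _ in range(max_attempts):
        if not is_captcha(html):
            return html
        answer = solve(html, url)
        # A rejected answer is assumed to return a fresh challenge page
        html, url = submit(html, answer, url)
    if is_captcha(html):
        raise RuntimeError(f"Still blocked after {max_attempts} CAPTCHA attempts")
    return html
```

With the functions above, solve could be lambda h, u: solve_amazon_captcha(session, h, u), and submit a small wrapper that calls submit_captcha_solution and returns (resp.text, resp.url). Capping the attempts keeps a bad session from burning solves indefinitely.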

Full Working Example

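import base64
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"

def solve_image(img_bytes):
    """Send an image to CaptchaAI and poll until the text answer is ready."""
    img_b64 = base64.b64encode(img_bytes).decode()
    submit = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY, "method": "base64", "body": img_b64
    }, timeout=30)
    if not submit.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {submit.text}")
    task_id = submit.text.split("|", 1)[1]

    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"Solve error: {result.text}")
    raise TimeoutError("Solve timed out")

def scrape_amazon_product(url):
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    })

    resp = session.get(url, timeout=30)

    # Handle the CAPTCHA if the challenge page was served
    if ("/errors/validateCaptcha" in resp.text
            or "Type the characters you see in this image" in resp.text):
        soup = BeautifulSoup(resp.text, "html.parser")
        img = soup.find("img", src=lambda s: s and "captcha" in s.lower())
        form = (soup.find("form", action=lambda a: a and "validateCaptcha" in a)
                or soup.find("form"))
        if img is None or form is None:
            raise RuntimeError("CAPTCHA page found but image or form is missing")

        # Download the image with the same session, then solve it
        img_data = session.get(urljoin(resp.url, img["src"]), timeout=30).content
        solution = solve_image(img_data)

        # Copy the hidden inputs and add the answer
        form_data = {inp.get("name"): inp.get("value", "")
                     for inp in form.find_all("input") if inp.get("name")}
        form_data["field-keywords"] = solution

        # The action is usually relative, so resolve it; honor the method
        action = urljoin(resp.url, form.get("action", ""))
        if form.get("method", "get").lower() == "post":
            resp = session.post(action, data=form_data, timeout=30)
        else:
            resp = session.get(action, params=form_data, timeout=30)

    # Parse product data
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.find("span", {"id": "productTitle"})
    price = soup.find("span", class_="a-price-whole")

    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None
    }

product = scrape_amazon_product("https://www.amazon.com/dp/B0EXAMPLE")
print(product)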
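The example returns the raw text of the a-price-whole span. On Amazon product pages the dollars and cents are often split across a-price-whole and a-price-fraction spans, with the decimal point inside the whole part (for example "1,299." and "99"). Treat that markup as an assumption that can change. The helper below, parse_amazon_price (a hypothetical name), combines the two strings into a Decimal so prices compare and sum exactly; it assumes US-style formatting.

```python
from decimal import Decimal, InvalidOperation


def parse_amazon_price(whole_text, fraction_text=None):
    """Combine a-price-whole / a-price-fraction text into a Decimal.

    Returns None when the text is missing or not numeric.
    """
    if not whole_text:
        return None
    # Keep digits only: drops thousands separators and the trailing "."
    whole = "".join(ch for ch in whole_text if ch.isdigit())
    fraction = "".join(ch for ch in (fraction_text or "") if ch.isdigit())
    if not whole:
        return None
    try:
        return Decimal(f"{whole}.{fraction or '00'}")
    except InvalidOperation:
        return None


print(parse_amazon_price("1,299.", "99"))  # 1299.99
```

In scrape_amazon_product you would find the a-price-fraction span the same way as a-price-whole and pass both texts here instead of returning the raw string.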

Best Practices for Amazon Scraping

  1. Use residential proxies — Amazon blocks datacenter IPs aggressively
  2. Rotate User-Agents — Use a pool of realistic browser strings
  3. Maintain sessions — Keep cookies across requests
  4. Add delays — 3-10 seconds between requests
  5. Set Accept-Language — Always include locale headers
  6. Don't scrape logged-in pages — Product pages are accessible without login
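Practice 4 can be enforced with a small throttle that keeps a random 3 to 10 second gap between consecutive requests, counting any time already spent since the previous one. Picking the gap at random within the range is our choice here, not something the source prescribes. Throttle is a hypothetical helper, and its clock, sleep, and random functions are injectable so it can be tested without actually waiting.

```python
import random
import time


class Throttle:
    """Enforce a randomized minimum gap between requests (hypothetical helper)."""

    def __init__(self, min_delay=3.0, max_delay=10.0,
                 clock=time.monotonic, sleep=time.sleep, rng=random.uniform):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.clock = clock
        self.sleep = sleep
        self.rng = rng
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep until the chosen gap since the last request has passed.

        Returns the number of seconds actually slept.
        """
        now = self.clock()
        waited = 0.0
        if self._last is not None:
            gap = self.rng(self.min_delay, self.max_delay)
            waited = max(0.0, gap - (now - self._last))
            if waited:
                self.sleep(waited)
        self._last = self.clock()
        return waited
```

Create one Throttle per session and call throttle.wait() immediately before each session.get(...); time spent parsing between requests counts toward the gap, so the loop is never slower than it needs to be.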

Troubleshooting

| Issue | Fix |
|---|---|
| CAPTCHA on every request | Use residential proxies; slow down request rate |
| CAPTCHA solution rejected | Verify image was downloaded correctly; retry |
| Redirect loops | Check cookie handling; use allow_redirects=True |
| Empty product data | Amazon may serve different layouts; check selectors |
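For the "solution rejected" row, a cheap first check is to confirm the downloaded bytes really are an image before paying for a solve: an HTML error page saved as the "image" can never be read correctly. This sketch checks the standard JPEG, PNG, and GIF file signatures. Which format Amazon serves is not stated in the source, sniff_image_type and check_captcha_image are hypothetical names, and the 100-byte minimum is an arbitrary threshold.

```python
def sniff_image_type(data: bytes):
    """Return "jpeg", "png", or "gif" from the file signature, else None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def check_captcha_image(data: bytes, min_size=100):
    """Raise ValueError if the download does not look like a usable image."""
    kind = sniff_image_type(data)
    if kind is None:
        raise ValueError(f"Not an image (starts with {data[:16]!r})")
    # Arbitrary floor: a real CAPTCHA image is far larger than this
    if len(data) < min_size:
        raise ValueError(f"Image suspiciously small: {len(data)} bytes")
    return kind
```

In Step 2, calling check_captcha_image(img_resp.content) right after the download turns a silent bad solve into an immediate, readable error.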

FAQ

Does Amazon use reCAPTCHA?

Amazon primarily uses its own image-based CAPTCHA (distorted text) rather than reCAPTCHA. CaptchaAI solves these through its image/OCR API: submit the base64-encoded image to in.php with method=base64, then poll res.php for the text.

How many requests before Amazon shows a CAPTCHA?

It varies. With good proxies and realistic headers, you may scrape hundreds of pages. Without proxies, CAPTCHAs can appear after 10-20 requests.

Is it legal to scrape Amazon?

Scraping publicly available product data is generally legal, but check Amazon's terms of service and applicable laws in your jurisdiction.
