Amazon uses image CAPTCHAs to block automated access. When you hit their anti-bot threshold, you'll see a page asking you to type characters from a distorted image. CaptchaAI's OCR solving handles these automatically.
How Amazon's CAPTCHA Works
Amazon triggers CAPTCHAs based on:
| Signal | Description |
|---|---|
| Request volume | Too many requests from one IP in a short window |
| Missing cookies | No Amazon session cookies |
| Suspicious headers | Bot-like User-Agent or missing headers |
| IP reputation | Known datacenter or proxy IP ranges |
When triggered, Amazon redirects to a page with a distorted text image and an input field. You must solve the image and submit the text to continue.
Requirements
| Requirement | Details |
|---|---|
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ | With requests and beautifulsoup4 |
| Residential proxies | Recommended for sustained scraping |
Solving Amazon's Image CAPTCHA
Step 1: Detect the CAPTCHA Page
import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
})

def is_captcha_page(html):
    # Match Amazon's prompt text, or any mention of "captcha" as a fallback
    return ("Type the characters you see in this image" in html
            or "captcha" in html.lower())

url = "https://www.amazon.com/dp/B0EXAMPLE"
resp = session.get(url)
if is_captcha_page(resp.text):
    print("CAPTCHA detected!")
else:
    print("Page loaded successfully")
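The substring check above flags any page whose HTML merely mentions "captcha", which may produce false positives on ordinary pages. A stricter variant could require both a form and an image whose `src` contains "captcha". The sketch below uses only the standard library's `html.parser`; `CaptchaPageProbe` and `looks_like_captcha_page` are hypothetical names, and the markup it expects is an assumption based on the page description above, not verified Amazon markup.

```python
from html.parser import HTMLParser


class CaptchaPageProbe(HTMLParser):
    """Records whether the page has a <form> and a CAPTCHA-like <img>."""

    def __init__(self):
        super().__init__()
        self.has_form = False
        self.has_captcha_img = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True
        elif tag == "img":
            # Attribute values can be None, so fall back to an empty string.
            src = dict(attrs).get("src") or ""
            if "captcha" in src.lower():
                self.has_captcha_img = True


def looks_like_captcha_page(html):
    """Hypothetical stricter check: needs a form AND a captcha image."""
    probe = CaptchaPageProbe()
    probe.feed(html)
    return probe.has_form and probe.has_captcha_img
```

Requiring both elements trades a little recall for fewer false alarms; if Amazon changes the page layout, the check would need adjusting.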
Step 2: Extract and Solve the Image
import base64
import time
from urllib.parse import urljoin

API_KEY = "YOUR_API_KEY"

def solve_amazon_captcha(session, captcha_page_html, captcha_page_url):
    soup = BeautifulSoup(captcha_page_html, "html.parser")

    # Find the CAPTCHA image
    img_tag = soup.find("img", src=lambda s: s and "captcha" in s.lower())
    if not img_tag:
        raise Exception("CAPTCHA image not found")

    # Resolve the src against the page URL in case it is relative
    img_url = urljoin(captcha_page_url, img_tag["src"])

    # Download the image with the same session so cookies match
    img_resp = session.get(img_url)
    img_resp.raise_for_status()
    img_base64 = base64.b64encode(img_resp.content).decode()

    # Submit to CaptchaAI (POST keeps the base64 payload out of the URL)
    submit_resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "base64",
        "body": img_base64
    })
    if not submit_resp.text.startswith("OK|"):
        raise Exception(f"Submit error: {submit_resp.text}")
    task_id = submit_resp.text.split("|", 1)[1]

    # Poll for result: up to 30 attempts, 5 seconds apart
    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        # "CAPCHA_NOT_READY" is the literal status string; keep the spelling
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise Exception(f"Solve error: {result.text}")
    raise TimeoutError("Solve timed out")
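The submit and poll calls above return plain-text replies in one of three shapes: `OK|<value>`, `CAPCHA_NOT_READY`, or an error string. A small parser can keep that handling in one place. This is a minimal sketch; `parse_reply` is a hypothetical helper, and it assumes only the reply shapes that the code above already handles.

```python
def parse_reply(text):
    """Classify a plain-text API reply.

    Returns ("ok", value), ("pending", None), or ("error", raw_text).
    """
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return ("pending", None)
    if text.startswith("OK|"):
        # Split once so an answer that itself contains "|" stays intact.
        return ("ok", text.split("|", 1)[1])
    return ("error", text)
```

Both the submit step and the polling loop could call this helper and branch on the first element of the returned tuple.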
Step 3: Submit the Solution
from urllib.parse import urljoin

def submit_captcha_solution(session, captcha_page_html, solution, captcha_page_url):
    soup = BeautifulSoup(captcha_page_html, "html.parser")
    form = soup.find("form")
    if form is None:
        raise Exception("CAPTCHA form not found")

    # Build form data from the existing inputs, including hidden fields
    form_data = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if name:
            form_data[name] = inp.get("value", "")

    # Set the CAPTCHA answer
    form_data["field-keywords"] = solution

    # Resolve the action against the page URL (handles relative and absolute)
    action = urljoin(captcha_page_url, form.get("action") or "")

    # Honor the form's method; HTML forms default to GET
    if (form.get("method") or "get").lower() == "post":
        return session.post(action, data=form_data)
    return session.get(action, params=form_data)
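Two details decide where the solution is actually sent: the form's `action` may be a relative path, and an HTML form without a `method` attribute submits with GET. The sketch below shows how both resolve. `resolve_form_target` is a hypothetical helper, and any URLs passed to it here are placeholders.

```python
from urllib.parse import urljoin


def resolve_form_target(page_url, action=None, method=None):
    """Return (absolute_url, http_method) for a form found on page_url."""
    # An absent or empty action means "submit to the current page".
    target = urljoin(page_url, action or "")
    # HTML forms default to GET when the method attribute is missing.
    http_method = (method or "get").strip().lower()
    if http_method not in ("get", "post"):
        http_method = "get"
    return target, http_method
```

For example, `resolve_form_target("https://www.example.com/dp/X?ref=1", "/check")` resolves the relative action against the page's host and falls back to GET.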
Full Working Example
import base64
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"

def scrape_amazon_product(url):
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    })
    resp = session.get(url)

    # Handle CAPTCHA if present
    if "captcha" in resp.text.lower():
        soup = BeautifulSoup(resp.text, "html.parser")
        img = soup.find("img", src=lambda s: s and "captcha" in s.lower())
        form = soup.find("form")
        if img and form:
            # Download and solve (resolve the src in case it is relative)
            img_data = session.get(urljoin(resp.url, img["src"])).content
            img_b64 = base64.b64encode(img_data).decode()
            submit = requests.post("https://ocr.captchaai.com/in.php", data={
                "key": API_KEY, "method": "base64", "body": img_b64
            })
            if not submit.text.startswith("OK|"):
                raise Exception(f"Submit error: {submit.text}")
            task_id = submit.text.split("|", 1)[1]

            solution = None
            for _ in range(30):
                time.sleep(5)
                result = requests.get("https://ocr.captchaai.com/res.php", params={
                    "key": API_KEY, "action": "get", "id": task_id
                })
                if result.text == "CAPCHA_NOT_READY":
                    continue
                if result.text.startswith("OK|"):
                    solution = result.text.split("|", 1)[1]
                    break
                raise Exception(f"Solve error: {result.text}")
            if solution is None:
                raise TimeoutError("Solve timed out")

            # Submit solution using the form's own action and method
            form_data = {inp.get("name"): inp.get("value", "")
                         for inp in form.find_all("input") if inp.get("name")}
            form_data["field-keywords"] = solution
            action = urljoin(resp.url, form.get("action") or "")
            if (form.get("method") or "get").lower() == "post":
                resp = session.post(action, data=form_data)
            else:
                resp = session.get(action, params=form_data)

    # Parse product data
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.find("span", {"id": "productTitle"})
    price = soup.find("span", class_="a-price-whole")
    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None
    }

product = scrape_amazon_product("https://www.amazon.com/dp/B0EXAMPLE")
print(product)
Best Practices for Amazon Scraping
- Use residential proxies — Amazon blocks datacenter IPs aggressively
- Rotate User-Agents — Use a pool of realistic browser strings
- Maintain sessions — Keep cookies across requests
- Add delays — 3-10 seconds between requests
- Set Accept-Language — Always include locale headers
- Don't scrape logged-in pages — Product pages are accessible without login
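
The pacing and User-Agent advice above might be sketched as follows. The helper names are hypothetical and the User-Agent strings are shortened placeholders; substitute complete, current browser strings. The 3 to 10 second window comes from the list above.

```python
import random

# Placeholder pool; replace with full, current browser User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


def next_headers(rng=random):
    """Pick a User-Agent at random and always include a locale header."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def next_delay(min_s=3.0, max_s=10.0, rng=random):
    """Return a randomized pause, in seconds, to wait between requests."""
    return rng.uniform(min_s, max_s)
```

A scraper could call `time.sleep(next_delay())` between requests and apply `next_headers()` when it starts a new session, so that one session keeps a consistent User-Agent alongside its cookies.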
Troubleshooting
| Issue | Fix |
|---|---|
| CAPTCHA on every request | Use residential proxies; slow down request rate |
| CAPTCHA solution rejected | Verify image was downloaded correctly; retry |
| Redirect loops | Check that the session keeps cookies between requests; allow_redirects=True is already the default in requests |
| Empty product data | Amazon may serve different layouts; check selectors |
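
For the "CAPTCHA solution rejected" row, one option is a bounded retry loop: if the page returned after submitting is still a CAPTCHA, solve the new image and try again. The sketch below takes each step as a callable so it does not depend on any particular implementation; `fetch_with_retries` and its parameters are hypothetical.

```python
def fetch_with_retries(fetch, is_captcha, solve_and_submit, max_attempts=3):
    """Fetch a page, solving CAPTCHAs up to max_attempts times.

    fetch() -> html of the target page
    is_captcha(html) -> True if html is a CAPTCHA page
    solve_and_submit(html) -> html of the page returned after submitting
    """
    html = fetch()
    for _ in range(max_attempts):
        if not is_captcha(html):
            return html
        html = solve_and_submit(html)
    if is_captcha(html):
        raise RuntimeError(f"Still blocked after {max_attempts} CAPTCHA attempts")
    return html
```

Capping the attempts keeps a persistently rejected solution from turning into an endless loop of paid solves.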
FAQ
Does Amazon use reCAPTCHA?
Amazon primarily uses its own image-based CAPTCHA (distorted text) rather than reCAPTCHA. CaptchaAI solves these through its image/OCR API: send the image with the method=base64 parameter.
How many requests before Amazon shows a CAPTCHA?
It varies. With good proxies and realistic headers, you may scrape hundreds of pages. Without proxies, CAPTCHAs can appear after 10-20 requests.
Is scraping Amazon legal?
Scraping publicly available product data is generally legal, but check Amazon's terms of service and applicable laws in your jurisdiction.