You logged in successfully. You scraped 50 pages. Then a CAPTCHA appears and blocks your session without warning. Mid-session CAPTCHAs are triggered by behavior during the session, not only by the initial visit. This guide covers why they appear, how to detect them, and how to solve them without losing your session.
Why CAPTCHAs appear mid-session
| Trigger | Description |
|---|---|
| Request rate | Too many requests in a short time |
| Navigation pattern | Non-human browsing patterns (no pauses, no scrolling) |
| Session age | Cookie or token expiry after a set duration |
| IP reputation change | Proxy IP gets flagged during the session |
| Action-based triggers | Specific actions (checkout, form submit) trigger verification |
| JavaScript fingerprint | Missing or inconsistent browser fingerprint |
Detecting mid-session CAPTCHAs
Your scraper must check every response for CAPTCHA indicators:
```python
import requests

session = requests.Session()


def has_captcha(response):
    """Return the detected CAPTCHA type, or None if the response looks clean."""
    html = response.text.lower()
    # reCAPTCHA
    if "g-recaptcha" in html or "www.google.com/recaptcha" in html:
        return "recaptcha"
    # Cloudflare Turnstile
    if "cf-turnstile" in html or "challenges.cloudflare.com/turnstile" in html:
        return "turnstile"
    # Cloudflare Challenge (full-page)
    if response.status_code in (403, 503) and "just a moment" in html:
        return "cloudflare_challenge"
    # Generic CAPTCHA indicators
    if "captcha" in html and ("verify" in html or "robot" in html):
        return "unknown"
    return None


def safe_get(url):
    """GET with automatic CAPTCHA detection."""
    resp = session.get(url)
    captcha_type = has_captcha(resp)
    if captcha_type:
        print(f"CAPTCHA detected ({captcha_type}) on {url}")
        # handle_captcha is defined in the solving section
        resp = handle_captcha(resp, url, captcha_type)
    return resp
```
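`safe_get` solves a challenge once, but the response that follows a solve can itself carry another challenge. A minimal sketch of a bounded retry loop is shown here. The function name, the `CaptchaLoopError` exception, and the `max_solves` default are illustrative assumptions, and the three callables are stand-ins for `session.get`, `has_captcha`, and `handle_captcha`.

```python
from typing import Any, Callable, Optional


class CaptchaLoopError(RuntimeError):
    """Raised when challenges keep reappearing after repeated solves."""


def fetch_with_captcha_retry(
    url: str,
    fetch: Callable[[str], Any],
    detect: Callable[[Any], Optional[str]],
    solve: Callable[[Any, str, str], Any],
    max_solves: int = 3,  # hypothetical limit, tune per site
) -> Any:
    """Fetch url, solving at most max_solves consecutive challenges."""
    resp = fetch(url)
    for _ in range(max_solves):
        captcha_type = detect(resp)
        if captcha_type is None:
            return resp  # clean response, nothing to solve
        # solve() is expected to return the response that follows the solve
        resp = solve(resp, url, captcha_type)
    # Check the response produced by the final solve before giving up
    if detect(resp) is None:
        return resp
    raise CaptchaLoopError(f"Still challenged after {max_solves} solves: {url}")
```

With the functions from this guide, a call would look like `fetch_with_captcha_retry(url, session.get, has_captcha, handle_captcha)`. Capping the number of solves keeps a misdetected page from consuming solver credits indefinitely.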
Solving mid-session CAPTCHAs
When a CAPTCHA is detected, solve it and continue without losing the session:
```python
import time

API_KEY = "YOUR_API_KEY"


def solve_recaptcha(sitekey, pageurl):
    """Submit a reCAPTCHA task and poll until a token is returned."""
    submit = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": sitekey, "pageurl": pageurl, "json": 1
    }).json()
    if submit.get("status") != 1:
        raise RuntimeError(submit.get("request"))
    task_id = submit["request"]
    time.sleep(15)  # initial wait before the first poll
    for _ in range(24):  # poll every 5 s, up to about 2 minutes
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id, "json": 1
        }).json()
        if result.get("status") == 1:
            return result["request"]
        time.sleep(5)
    raise TimeoutError("Solve timed out")


def handle_captcha(response, url, captcha_type):
    """Solve the detected CAPTCHA and retry the request."""
    html = response.text
    if captcha_type == "recaptcha":
        if 'data-sitekey="' in html:
            # 14 is the length of 'data-sitekey="'
            start = html.index('data-sitekey="') + 14
            end = html.index('"', start)
            sitekey = html[start:end]
            token = solve_recaptcha(sitekey, url)
            # Submit the token using the SAME session (preserves cookies)
            return session.post(url, data={
                "g-recaptcha-response": token
            })
    if captcha_type == "turnstile":
        if 'data-sitekey="' in html:
            start = html.index('data-sitekey="') + 14
            end = html.index('"', start)
            sitekey = html[start:end]
            submit = requests.post("https://ocr.captchaai.com/in.php", data={
                "key": API_KEY, "method": "turnstile",
                "sitekey": sitekey, "pageurl": url, "json": 1
            }).json()
            # Fail loudly if the task was rejected, as solve_recaptcha does
            if submit.get("status") != 1:
                raise RuntimeError(submit.get("request"))
            task_id = submit["request"]
            time.sleep(10)
            for _ in range(24):
                result = requests.get("https://ocr.captchaai.com/res.php", params={
                    "key": API_KEY, "action": "get", "id": task_id, "json": 1
                }).json()
                if result.get("status") == 1:
                    return session.post(url, data={
                        "cf-turnstile-response": result["request"]
                    })
                time.sleep(5)
            # Match solve_recaptcha instead of silently returning the challenge page
            raise TimeoutError("Turnstile solve timed out")
    # Fallback: return original response
    return response
```
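The string-index extraction in `handle_captcha` only matches `data-sitekey="` written with double quotes and no spaces. A slightly more tolerant sketch using a regular expression is shown here. The helper name `extract_sitekey` is an assumption for illustration, and it still only finds keys that appear as HTML attributes, not keys injected by JavaScript at render time.

```python
import re
from typing import Optional

# Matches data-sitekey="..." or data-sitekey='...', with optional spaces around "=".
_SITEKEY_RE = re.compile(r"""data-sitekey\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first data-sitekey value found in html, or None."""
    match = _SITEKEY_RE.search(html)
    # group(1) keeps the original casing, which matters because sitekeys are case-sensitive
    return match.group(1) if match else None
```

Pass it the original `response.text`, not the lowercased copy used for detection, so the key's casing is preserved.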
Preventing mid-session CAPTCHAs
| Strategy | How |
|---|---|
| Slow down requests | Add 2–5 second delays between pages |
| Randomize timing | Use random.uniform(2, 5) for natural pauses |
| Rotate User-Agent | Change User-Agent between sessions, not within one, so the fingerprint stays consistent |
| Preserve cookies | Use session persistence across all requests |
| Use residential proxies | Lower CAPTCHA trigger rate |
| Mimic human patterns | Vary request order, skip some pages |
```python
import random
import time


def controlled_scrape(urls):
    for url in urls:
        resp = safe_get(url)
        # Process response...
        # Random pause between pages to avoid a fixed request rhythm
        delay = random.uniform(2, 5)
        time.sleep(delay)
```
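A fixed 2 to 5 second window does not react when a site starts challenging you. One possible refinement, sketched here under assumptions, is to widen the delay after each CAPTCHA and relax it again after clean pages. The class name `AdaptiveDelay`, the doubling factor, and the 8x cap are illustrative choices, not values taken from any site's documented limits.

```python
import random


class AdaptiveDelay:
    """Randomized delay that grows after a CAPTCHA and shrinks after clean pages."""

    def __init__(self, low=2.0, high=5.0, factor=2.0, max_scale=8.0):
        self.low = low              # base minimum delay in seconds
        self.high = high            # base maximum delay in seconds
        self.factor = factor        # hypothetical multiplier applied per CAPTCHA
        self.max_scale = max_scale  # hypothetical cap on the multiplier
        self.scale = 1.0

    def record(self, saw_captcha):
        """Update the multiplier based on whether the last page was challenged."""
        if saw_captcha:
            self.scale = min(self.scale * self.factor, self.max_scale)
        else:
            self.scale = max(self.scale / self.factor, 1.0)

    def next_delay(self):
        """Return a random delay in seconds, scaled by the current multiplier."""
        return random.uniform(self.low, self.high) * self.scale
```

In a scraping loop you would call `pacer.record(has_captcha(resp) is not None)` after each page and then `time.sleep(pacer.next_delay())`.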
Maintaining session after solving
The key to mid-session CAPTCHA handling is session persistence. Never create a new session after solving.
```python
# WRONG: a new session loses the auth cookies
new_session = requests.Session()
new_session.post(url, data={"g-recaptcha-response": token})

# CORRECT: the same session preserves the auth cookies
session.post(url, data={"g-recaptcha-response": token})

# Continue scraping with the same session
next_page = session.get("https://example.com/page/2")
```
FAQ
Why does a CAPTCHA appear after I am already logged in?
Sites use CAPTCHAs to gate suspicious actions, not just initial access. Fast navigation, bulk downloads, or repeated form submissions trigger mid-session verification.
Does solving the mid-session CAPTCHA log me out?
Not if you use the same session object. The authentication cookies remain intact. Only create a new session if you need to re-authenticate.
How do I handle CAPTCHAs that appear on AJAX/API calls?
Inspect the API response for CAPTCHA indicators (HTML fragments, specific JSON error codes, or 403 status). Solve the CAPTCHA and replay the failed API call.
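A rough sketch of that inspection step follows. The JSON key names and HTML markers are hypothetical placeholders, since each API signals a challenge differently. Inspect the target's real error responses and adjust them.

```python
import json
from typing import Optional

# Hypothetical markers: replace with what the target API actually returns.
CAPTCHA_JSON_KEYS = ("captcha", "captcha_required", "challenge")
CAPTCHA_HTML_MARKERS = ("g-recaptcha", "cf-turnstile", "just a moment")


def api_captcha_signal(status_code: int, content_type: str, body: str) -> Optional[str]:
    """Return 'json', 'html', or 'status' if the response hints at a CAPTCHA, else None."""
    lowered = body.lower()
    if "json" in content_type.lower():
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None  # declared JSON but not parseable
        # Flag a JSON object with a truthy challenge-related top-level key
        if isinstance(payload, dict) and any(payload.get(k) for k in CAPTCHA_JSON_KEYS):
            return "json"
    elif any(marker in lowered for marker in CAPTCHA_HTML_MARKERS):
        # An endpoint that returns an HTML challenge fragment instead of data
        return "html"
    # A 403 whose body mentions a CAPTCHA, whatever the content type
    if status_code == 403 and "captcha" in lowered:
        return "status"
    return None
```

When the function returns a non-None value, solve the challenge and replay the same API call on the same session.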
Should I re-solve the login CAPTCHA or the new one?
Solve the new CAPTCHA. The login CAPTCHA was already passed. The mid-session CAPTCHA is a separate challenge.
Handle mid-session CAPTCHAs with CaptchaAI
Keep your sessions running at captchaai.com.