Financial portals protect stock quotes, earnings data, and analyst reports with Cloudflare Turnstile and reCAPTCHA. CAPTCHAs trigger during rapid symbol lookups, historical data downloads, and screener queries. Here's how to maintain reliable data collection across financial sites.
CAPTCHA Patterns on Financial Portals
| Data type | Portal examples | CAPTCHA type | Trigger |
|---|---|---|---|
| Real-time quotes | Finance portals | Cloudflare Turnstile | Rapid symbol lookups |
| Historical prices | Data providers | reCAPTCHA v2 | Bulk CSV downloads |
| Financial statements | SEC filing sites | Image CAPTCHA | Repeated EDGAR queries |
| Screener results | Stock screeners | Cloudflare Challenge | Complex filter queries |
| Analyst ratings | Research portals | reCAPTCHA v3 | Multiple page views |
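Before solving anything, it helps to know which row of the table a response falls into. The following is a minimal sketch of that classification, assuming the challenge can be recognised from common substrings in the HTML. The function name `detect_captcha_type` and the returned labels are hypothetical, and real portals may need additional markers.

```python
from typing import Optional


def detect_captcha_type(html: str, status_code: int = 200) -> Optional[str]:
    """Return a coarse CAPTCHA label for a response body, or None if nothing matched."""
    lowered = html.lower()
    # Check the Turnstile widget class first: Turnstile pages also load scripts
    # from challenges.cloudflare.com, so the order of these checks matters.
    if "cf-turnstile" in lowered:
        return "turnstile"
    if "g-recaptcha" in lowered or "google.com/recaptcha" in lowered:
        return "recaptcha"
    if "challenges.cloudflare.com" in lowered:
        return "cloudflare-challenge"
    # A bare 403 with no recognisable widget is treated as a generic block.
    if status_code == 403:
        return "blocked"
    return None
```

Image CAPTCHAs have no standard marker, so this sketch does not attempt to detect them.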
Stock Data Collector
import re
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup


class StockDataCollector:
    def __init__(self, api_key):
        self.api_key = api_key
        # One session per collector so cookies persist across requests
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })

    def get_quote(self, portal_url, symbol):
        """Get current stock quote, solving CAPTCHAs if needed."""
        url = f"{portal_url}/quote/{symbol}"
        response = self.session.get(url)

        if self._is_captcha_page(response):
            response = self._solve_and_retry(response, url)

        return self._parse_quote(response.text, symbol)

    def get_historical(self, portal_url, symbol, days=365):
        """Download historical price data."""
        url = f"{portal_url}/history/{symbol}"
        params = {
            "period": f"{days}d",
            "interval": "1d"
        }
        response = self.session.get(url, params=params)

        if self._is_captcha_page(response):
            response = self._solve_and_retry(response, url)

        return self._parse_historical(response.text)

    def scan_symbols(self, portal_url, symbols, delay=2):
        """Collect quotes for multiple symbols."""
        results = {}
        for symbol in symbols:
            try:
                results[symbol] = self.get_quote(portal_url, symbol)
            except Exception as e:
                results[symbol] = {"error": str(e)}
            # Pause after every lookup, including failed ones
            time.sleep(delay)
        return results

    def _is_captcha_page(self, response):
        return (
            response.status_code == 403 or
            "cf-turnstile" in response.text or
            "challenges.cloudflare.com" in response.text or
            "g-recaptcha" in response.text  # reCAPTCHA widget container
        )

    def _solve_and_retry(self, response, url):
        # Turnstile sitekeys start with "0x"
        match = re.search(r'data-sitekey="(0x[^"]+)"', response.text)
        if not match:
            # Fall back to reCAPTCHA detection
            match = re.search(r'data-sitekey="([^"]+)"', response.text)
            if match:
                return self._solve_recaptcha_and_retry(match.group(1), url)
            raise ValueError("No CAPTCHA sitekey found")

        # Submit the Turnstile task
        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "turnstile",
            "sitekey": match.group(1),
            "pageurl": url,
            "json": 1
        })
        task_id = resp.json()["request"]

        # Poll every 3 seconds, for up to 3 minutes
        for _ in range(60):
            time.sleep(3)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            })
            data = result.json()
            if data["status"] == 1:
                return self.session.post(url, data={
                    "cf-turnstile-response": data["request"]
                })

        raise TimeoutError("CAPTCHA solve timed out")

    def _solve_recaptcha_and_retry(self, site_key, url):
        # Submit the reCAPTCHA task
        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": url,
            "json": 1
        })
        task_id = resp.json()["request"]

        for _ in range(60):
            time.sleep(3)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            })
            data = result.json()
            if data["status"] == 1:
                return self.session.post(url, data={
                    "g-recaptcha-response": data["request"]
                })

        raise TimeoutError("reCAPTCHA solve timed out")

    def _parse_quote(self, html, symbol):
        soup = BeautifulSoup(html, "html.parser")

        def text_of(selector):
            # select_one returns None when nothing matches, so guard before reading text
            node = soup.select_one(selector)
            return node.get_text(strip=True) if node else None

        return {
            "symbol": symbol,
            "price": text_of("[data-field='regularMarketPrice'], .price"),
            "change": text_of("[data-field='regularMarketChange'], .change"),
            "volume": text_of("[data-field='regularMarketVolume'], .volume"),
            "market_cap": text_of("[data-field='marketCap'], .market-cap"),
            "timestamp": datetime.now().isoformat()
        }

    def _parse_historical(self, html):
        soup = BeautifulSoup(html, "html.parser")
        rows = []

        for row in soup.select("table tr")[1:]:  # Skip header
            cells = [td.text.strip() for td in row.select("td")]
            if len(cells) >= 6:
                rows.append({
                    "date": cells[0],
                    "open": cells[1],
                    "high": cells[2],
                    "low": cells[3],
                    "close": cells[4],
                    "volume": cells[5]
                })

        return rows


# Usage
collector = StockDataCollector("YOUR_API_KEY")

# Single quote
quote = collector.get_quote("https://finance.example.com", "AAPL")
print(f"AAPL: ${quote['price']} ({quote['change']})")

# Scan multiple symbols
portfolio = collector.scan_symbols(
    "https://finance.example.com",
    ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA"]
)
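The Turnstile and reCAPTCHA solvers in the collector repeat the same polling loop. One possible way to factor it out is sketched below. The helper name `poll_for_token` and its parameters are hypothetical. It assumes only what the collector already relies on: the result payload is a dict whose `status` becomes `1` when the token is available in `request`.

```python
import time
from typing import Callable, Dict


def poll_for_token(
    fetch_result: Callable[[], Dict],
    attempts: int = 60,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_result until it reports status 1, then return the token."""
    for _ in range(attempts):
        sleep(interval)
        data = fetch_result()
        if data.get("status") == 1:
            return data["request"]
    raise TimeoutError("CAPTCHA solve timed out")
```

In the collector, `fetch_result` would be a small function that issues the `res.php` GET request and returns `result.json()`. Passing `sleep` as a parameter keeps the helper testable without real delays.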
Market Screener with CAPTCHA Handling (JavaScript)
class MarketScreener {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  async screenStocks(portalUrl, filters) {
    const params = new URLSearchParams(filters);
    const response = await fetch(`${portalUrl}/screener?${params}`);
    const html = await response.text();

    if (html.includes('cf-turnstile') || response.status === 403) {
      return this.solveAndScreen(portalUrl, filters, html);
    }

    return this.parseScreenerResults(html);
  }

  async solveAndScreen(portalUrl, filters, html) {
    // Turnstile sitekeys start with "0x"
    const match = html.match(/data-sitekey="(0x[^"]+)"/);
    if (!match) throw new Error('Turnstile sitekey not found');

    // Submit the Turnstile task
    const submitResp = await fetch('https://ocr.captchaai.com/in.php', {
      method: 'POST',
      body: new URLSearchParams({
        key: this.apiKey,
        method: 'turnstile',
        sitekey: match[1],
        // Use the page that showed the challenge, not the portal root
        pageurl: `${portalUrl}/screener`,
        json: '1'
      })
    });
    const { request: taskId } = await submitResp.json();

    // Poll every 3 seconds, for up to 3 minutes
    for (let i = 0; i < 60; i++) {
      await new Promise(r => setTimeout(r, 3000));
      const result = await fetch(
        `https://ocr.captchaai.com/res.php?key=${this.apiKey}&action=get&id=${taskId}&json=1`
      );
      const data = await result.json();

      if (data.status === 1) {
        // Resubmit the filters together with the solved token
        const response = await fetch(`${portalUrl}/screener`, {
          method: 'POST',
          body: new URLSearchParams({
            ...filters,
            'cf-turnstile-response': data.request
          })
        });
        return this.parseScreenerResults(await response.text());
      }
    }

    throw new Error('Turnstile solve timed out');
  }

  parseScreenerResults(html) {
    const rows = [];
    const tableMatch = html.match(/<table[^>]*>[\s\S]*?<\/table>/i);
    if (!tableMatch) return rows;

    const rowMatches = tableMatch[0].matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi);
    for (const row of rowMatches) {
      // Strip nested tags from each cell and keep the text
      const cells = [...row[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
        .map(m => m[1].replace(/<[^>]+>/g, '').trim());

      if (cells.length >= 4) {
        rows.push({
          symbol: cells[0],
          price: cells[1],
          change: cells[2],
          volume: cells[3]
        });
      }
    }

    return rows;
  }
}

// Usage (top-level await requires an ES module)
const screener = new MarketScreener('YOUR_API_KEY');
const results = await screener.screenStocks('https://finance.example.com', {
  sector: 'technology',
  marketCap: 'large',
  peRatio: '<25'
});
Collection Frequency by Data Type
| Data type | Recommended interval | CAPTCHA frequency |
|---|---|---|
| Real-time quotes | 1–5 minutes | High (use an API if available) |
| End-of-day prices | Once daily after close | Low |
| Financial statements | Quarterly | Minimal |
| Screener results | Daily | Moderate |
| Analyst ratings | Weekly | Low |
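The intervals in the table can be encoded as a simple schedule check. This is a minimal sketch, not a full scheduler: the dictionary keys and the `is_due` helper are hypothetical names, "quarterly" is approximated as 90 days, and the conservative end of the 1–5 minute range is used for quotes.

```python
from datetime import datetime, timedelta
from typing import Optional

# Minimum gap between collections for each data type, taken from the table.
# Assumptions: 5 minutes for quotes, 90 days for "quarterly".
MIN_INTERVAL = {
    "quotes": timedelta(minutes=5),
    "eod_prices": timedelta(days=1),
    "financial_statements": timedelta(days=90),
    "screener": timedelta(days=1),
    "analyst_ratings": timedelta(weeks=1),
}


def is_due(data_type: str, last_run: Optional[datetime], now: datetime) -> bool:
    """Return True when enough time has passed to collect this data type again."""
    if last_run is None:
        return True  # never collected before
    return now - last_run >= MIN_INTERVAL[data_type]
```

A collection loop would call `is_due` before each request and skip data types that were refreshed recently, which keeps request volume, and therefore CAPTCHA frequency, down.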
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Turnstile on every request | New session each time | Persist cookies across requests |
| Historical data incomplete | Pagination behind CAPTCHA | Solve per page, follow paginated links |
| Quote data stale | Cached response served | Add cache-busting query parameter |
| 429 rate limit | Too many requests | Increase delay, rotate proxies |
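Two of the fixes in the table are easy to sketch with the standard library: backing off after a 429 and adding a cache-busting query parameter. Both helpers below are hypothetical. The `_ts` parameter name is an assumption (the portal has to ignore unknown parameters for it to work), and only the numeric form of `Retry-After` is handled.

```python
import random
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 120.0,
                  retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Honour a numeric Retry-After header when the server sends one.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Otherwise use exponential backoff with jitter, bounded by `cap`.
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(delay / 2, delay)


def add_cache_buster(url: str, param: str = "_ts",
                     value: Optional[str] = None) -> str:
    """Return `url` with a unique query parameter so caches treat it as new."""
    parts = urlsplit(url)
    # Drop any previous cache-buster so repeated calls do not pile up parameters.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    stamp = value if value is not None else str(int(time.time() * 1000))
    query.append((param, stamp))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The jitter spreads retries out so that several collectors hitting the same portal do not retry in lockstep.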
FAQ
Are financial APIs better than scraping with CAPTCHAs?
Where an API covers the data you need, it is usually the simpler option. Free API tiers cover common data but are rate limited. Web scraping with CaptchaAI gives access to data that APIs do not expose, such as screener filters, analyst commentary, and niche financial data.
How do I handle real-time quote CAPTCHAs without delay?
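Pre-authenticate your session before you start polling quotes, then reuse it. Cloudflare's cf_clearance cookie typically lasts 15–30 minutes (the site operator sets the exact lifetime), so solve once and make multiple requests with the same cookies within that window.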
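A small helper can track whether the last solve is still inside that window. This is a sketch under stated assumptions: the class name `ClearanceWindow` is hypothetical, and the 15-minute default is simply the conservative end of the range quoted above, not a value read from the cookie itself.

```python
import time
from typing import Callable, Optional


class ClearanceWindow:
    """Tracks how long ago a challenge was solved for one session."""

    def __init__(self, max_age_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._solved_at: Optional[float] = None

    def mark_solved(self) -> None:
        """Record that a challenge was just solved for this session."""
        self._solved_at = self._clock()

    def is_fresh(self) -> bool:
        """Return True while the last solve is younger than max_age_seconds."""
        if self._solved_at is None:
            return False
        return self._clock() - self._solved_at < self.max_age_seconds
```

A collector could call `mark_solved()` after each successful solve and, when `is_fresh()` returns False, solve again before the next batch of quote requests instead of in the middle of it.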
Can I collect data from multiple financial portals simultaneously?
Yes. Use separate sessions per portal with independent CaptchaAI task submissions. CaptchaAI handles concurrent solving across different sites.
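One way to run portals side by side is a thread pool with one worker per portal. The sketch below is illustrative: `collect_from_portals` and its `collect` callback are hypothetical names, and the callback is expected to create its own session (for example, a fresh `StockDataCollector`) so that cookies are never shared between portals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def collect_from_portals(
    portals: Dict[str, List[str]],
    collect: Callable[[str, List[str]], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Run collect(portal_url, symbols) for each portal in its own worker thread."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            url: pool.submit(collect, url, symbols)
            for url, symbols in portals.items()
        }
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing portal should not discard results from the others.
                results[url] = {"error": str(exc)}
    return results
```

With the collector from this article, `collect` could be `lambda url, symbols: StockDataCollector("YOUR_API_KEY").scan_symbols(url, symbols)`.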
Related Articles
- Market Research Data Collection
- Geetest Vs Cloudflare Turnstile Comparison
- Cloudflare Turnstile 403 After Token Fix
Next Steps
Collect market data reliably — get your CaptchaAI API key and handle financial portal CAPTCHAs automatically.