CAPTCHA Handling for Stock Market Data Collection

Financial portals protect stock quotes, earnings data, and analyst reports with Cloudflare Turnstile and reCAPTCHA. CAPTCHAs trigger during rapid symbol lookups, historical data downloads, and screener queries. Here's how to maintain reliable data collection across financial sites.

CAPTCHA Patterns on Financial Portals

Data type	Portal examples	CAPTCHA type	Trigger
Real-time quotes	Finance portals	Cloudflare Turnstile	Rapid symbol lookups
Historical prices	Data providers	reCAPTCHA v2	Bulk CSV downloads
Financial statements	SEC filing sites	Image CAPTCHA	Repeated EDGAR queries
Screener results	Stock screeners	Cloudflare Challenge	Complex filter queries
Analyst ratings	Research portals	reCAPTCHA v3	Multiple page views
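Before choosing a solving method, it helps to classify what a response actually contains. The following is a minimal sketch, not a definitive detector: the function name `detect_captcha_type` and the returned labels are hypothetical, and the marker strings are assumptions based on common embed patterns that should be checked against the pages you collect from. Image CAPTCHAs are site-specific and are not covered.

```python
import re

# Assumption: reCAPTCHA v3 pages load api.js with render=<sitekey>, while
# render=explicit and render=onload are v2 rendering options.
_RECAPTCHA_V3 = re.compile(
    r"recaptcha/api\.js\?(?:[^\s\"'>]*&)?render=(?!explicit|onload)"
)


def detect_captcha_type(html: str, status_code: int = 200) -> str:
    """Return a rough label for the CAPTCHA found in a response body."""
    # Check Turnstile first: its script is also served from
    # challenges.cloudflare.com, which would otherwise match the last rule.
    if "cf-turnstile" in html:
        return "turnstile"
    if _RECAPTCHA_V3.search(html):
        return "recaptcha_v3"
    if "g-recaptcha" in html:
        return "recaptcha_v2"
    if "challenges.cloudflare.com" in html or status_code == 403:
        return "cloudflare_challenge"
    return "none"
```

A 403 is treated as a challenge here only because the collector below does the same; a portal may also return 403 for reasons unrelated to CAPTCHAs.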

Stock Data Collector

# Requires: pip install requests beautifulsoup4
import requests
import time
import re
from datetime import datetime

class StockDataCollector:
    def __init__(self, api_key):
        self.api_key = api_key
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })

    def get_quote(self, portal_url, symbol):
        """Get current stock quote, solving CAPTCHAs if needed."""
        url = f"{portal_url}/quote/{symbol}"
        response = self.session.get(url)

        if self._is_captcha_page(response):
            response = self._solve_and_retry(response, url)

        return self._parse_quote(response.text, symbol)

    def get_historical(self, portal_url, symbol, days=365):
        """Download historical price data."""
        url = f"{portal_url}/history/{symbol}"
        params = {
            "period": f"{days}d",
            "interval": "1d"
        }
        response = self.session.get(url, params=params)

        if self._is_captcha_page(response):
            response = self._solve_and_retry(response, url)

        return self._parse_historical(response.text)

    def scan_symbols(self, portal_url, symbols, delay=2):
        """Collect quotes for multiple symbols."""
        results = {}

        for symbol in symbols:
            try:
                results[symbol] = self.get_quote(portal_url, symbol)
                time.sleep(delay)
            except Exception as e:
                results[symbol] = {"error": str(e)}

        return results

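    def _is_captcha_page(self, response):
        # A 403 or a Cloudflare marker indicates a Turnstile/challenge page;
        # "g-recaptcha" lets the reCAPTCHA fallback in _solve_and_retry run.
        return (
            response.status_code == 403 or
            "cf-turnstile" in response.text or
            "challenges.cloudflare.com" in response.text or
            "g-recaptcha" in response.text
        )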

    def _solve_and_retry(self, response, url):
        match = re.search(r'data-sitekey="(0x[^"]+)"', response.text)
        if not match:
            # Fall back to reCAPTCHA detection
            match = re.search(r'data-sitekey="([^"]+)"', response.text)
            if match:
                return self._solve_recaptcha_and_retry(match.group(1), url)
            raise ValueError("No CAPTCHA sitekey found")

        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "turnstile",
            "sitekey": match.group(1),
            "pageurl": url,
            "json": 1
        })
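        submit = resp.json()
        if submit.get("status") != 1:
            # On failure, "request" holds the solver's error code, not a task ID
            raise RuntimeError(f"CAPTCHA submit failed: {submit.get('request')}")
        task_id = submit["request"]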

        for _ in range(60):
            time.sleep(3)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            })
            data = result.json()
            if data["status"] == 1:
                return self.session.post(url, data={
                    "cf-turnstile-response": data["request"]
                })

        raise TimeoutError("CAPTCHA solve timed out")

    def _solve_recaptcha_and_retry(self, site_key, url):
        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": url,
            "json": 1
        })
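        submit = resp.json()
        if submit.get("status") != 1:
            # On failure, "request" holds the solver's error code, not a task ID
            raise RuntimeError(f"reCAPTCHA submit failed: {submit.get('request')}")
        task_id = submit["request"]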

        for _ in range(60):
            time.sleep(3)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            })
            data = result.json()
            if data["status"] == 1:
                return self.session.post(url, data={
                    "g-recaptcha-response": data["request"]
                })

        raise TimeoutError("reCAPTCHA solve timed out")

    def _parse_quote(self, html, symbol):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, "html.parser")

        def text_of(selector):
            # select_one returns None when nothing matches, so guard before
            # reading the text (Python has no ?. operator).
            node = soup.select_one(selector)
            return node.get_text(strip=True) if node else None

        return {
            "symbol": symbol,
            "price": text_of("[data-field='regularMarketPrice'], .price"),
            "change": text_of("[data-field='regularMarketChange'], .change"),
            "volume": text_of("[data-field='regularMarketVolume'], .volume"),
            "market_cap": text_of("[data-field='marketCap'], .market-cap"),
            "timestamp": datetime.now().isoformat()
        }

    def _parse_historical(self, html):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, "html.parser")
        rows = []

        for row in soup.select("table tr")[1:]:  # Skip header
            cells = [td.text.strip() for td in row.select("td")]
            if len(cells) >= 6:
                rows.append({
                    "date": cells[0],
                    "open": cells[1],
                    "high": cells[2],
                    "low": cells[3],
                    "close": cells[4],
                    "volume": cells[5]
                })

        return rows


# Usage
collector = StockDataCollector("YOUR_API_KEY")

# Single quote
quote = collector.get_quote("https://finance.example.com", "AAPL")
print(f"AAPL: ${quote['price']} ({quote['change']})")

# Scan multiple symbols
portfolio = collector.scan_symbols(
    "https://finance.example.com",
    ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA"]
)
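The two solve methods above repeat the same polling loop and keep polling even when the solver returns an error. One way to factor that out is sketched below. The names `poll_for_token` and `CaptchaSolveError` are hypothetical, and the sketch assumes a pending task is reported with the string `CAPCHA_NOT_READY` in the `request` field; confirm that against the CaptchaAI documentation before relying on it.

```python
import time


class CaptchaSolveError(Exception):
    """Raised when the solver reports an error instead of a token."""


def poll_for_token(fetch_result, task_id, attempts=60, delay=3, sleep=time.sleep):
    """Poll until the solver returns a token, or fail fast on an error.

    fetch_result(task_id) must return the decoded JSON dict from res.php.
    """
    for _ in range(attempts):
        sleep(delay)
        data = fetch_result(task_id)
        if data.get("status") == 1:
            return data["request"]
        # Assumption: a still-pending task is reported as CAPCHA_NOT_READY;
        # any other value is treated as a terminal error.
        if data.get("request") != "CAPCHA_NOT_READY":
            raise CaptchaSolveError(f"Solver error: {data.get('request')}")
    raise TimeoutError(f"Task {task_id} not solved after {attempts} attempts")
```

Passing `fetch_result` and `sleep` as parameters keeps the helper testable without network access. In the collector, `fetch_result` would wrap the `requests.get(...)` call to `res.php` and return its `.json()` result.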

Market Screener with CAPTCHA Handling (JavaScript)

class MarketScreener {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  async screenStocks(portalUrl, filters) {
    const params = new URLSearchParams(filters);
    const response = await fetch(`${portalUrl}/screener?${params}`);
    const html = await response.text();

    if (html.includes('cf-turnstile') || response.status === 403) {
      return this.solveAndScreen(portalUrl, filters, html);
    }

    return this.parseScreenerResults(html);
  }

  async solveAndScreen(portalUrl, filters, html) {
    const match = html.match(/data-sitekey="(0x[^"]+)"/);
    if (!match) throw new Error('Turnstile sitekey not found');

    const submitResp = await fetch('https://ocr.captchaai.com/in.php', {
      method: 'POST',
      body: new URLSearchParams({
        key: this.apiKey,
        method: 'turnstile',
        sitekey: match[1],
        pageurl: `${portalUrl}/screener`, // URL of the page that served the challenge
        json: '1'
      })
    });
    const { request: taskId } = await submitResp.json();

    for (let i = 0; i < 60; i++) {
      await new Promise(r => setTimeout(r, 3000));
      const result = await fetch(
        `https://ocr.captchaai.com/res.php?key=${this.apiKey}&action=get&id=${taskId}&json=1`
      );
      const data = await result.json();
      if (data.status === 1) {
        const response = await fetch(`${portalUrl}/screener`, {
          method: 'POST',
          body: new URLSearchParams({
            ...filters,
            'cf-turnstile-response': data.request
          })
        });
        return this.parseScreenerResults(await response.text());
      }
    }
    throw new Error('Turnstile solve timed out');
  }

  parseScreenerResults(html) {
    const rows = [];
    const tableMatch = html.match(/<table[^>]*>[\s\S]*?<\/table>/i);
    if (!tableMatch) return rows;

    const rowMatches = tableMatch[0].matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi);
    for (const row of rowMatches) {
      const cells = [...row[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
        .map(m => m[1].replace(/<[^>]+>/g, '').trim());
      if (cells.length >= 4) {
        rows.push({
          symbol: cells[0],
          price: cells[1],
          change: cells[2],
          volume: cells[3]
        });
      }
    }
    return rows;
  }
}

// Usage
const screener = new MarketScreener('YOUR_API_KEY');
const results = await screener.screenStocks('https://finance.example.com', {
  sector: 'technology',
  marketCap: 'large',
  peRatio: '<25'
});
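For readers working from the Python collector, the screener request in the JavaScript example can be sketched as follows. The function name `build_screener_url` is hypothetical; the `/screener` path and the filter names come from the example above. The point is that filter values such as `<25` need percent-encoding, which `urlencode` handles.

```python
from urllib.parse import urlencode


def build_screener_url(portal_url: str, filters: dict) -> str:
    """Build the screener GET URL with percent-encoded filter values."""
    return f"{portal_url.rstrip('/')}/screener?{urlencode(filters)}"


url = build_screener_url(
    "https://finance.example.com",
    {"sector": "technology", "marketCap": "large", "peRatio": "<25"},
)
print(url)
# https://finance.example.com/screener?sector=technology&marketCap=large&peRatio=%3C25
```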

Collection Frequency by Data Type

Data type	Recommended interval	CAPTCHA frequency
Real-time quotes	1–5 minutes	High — use API if available
End-of-day prices	Once daily after close	Low
Financial statements	Quarterly	Minimal
Screener results	Daily	Moderate
Analyst ratings	Weekly	Low
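The intervals in the table above can be encoded as a small schedule check. This is a sketch under stated assumptions: the dictionary keys and the `is_due` helper are hypothetical names, "quarterly" is approximated as 90 days, and real-time quotes use the upper end of the 1–5 minute range to keep CAPTCHA frequency down.

```python
from datetime import datetime, timedelta

# Intervals follow the table above; key names are illustrative.
COLLECTION_INTERVALS = {
    "realtime_quotes": timedelta(minutes=5),
    "end_of_day_prices": timedelta(days=1),
    "financial_statements": timedelta(days=90),  # assumption: ~one quarter
    "screener_results": timedelta(days=1),
    "analyst_ratings": timedelta(weeks=1),
}


def is_due(data_type: str, last_run, now: datetime) -> bool:
    """Return True when a data type has never run or its interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= COLLECTION_INTERVALS[data_type]
```

A collection loop would call `is_due` for each data type and skip the request when it returns False, which avoids triggering challenges for data that has not changed.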

Troubleshooting

Issue	Cause	Fix
Turnstile on every request	New session each time	Persist cookies across requests
Historical data incomplete	Pagination behind CAPTCHA	Solve per page, follow paginated links
Quote data stale	Cached response served	Add cache-busting query parameter
429 rate limit	Too many requests	Increase delay, rotate proxies
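Two of the fixes above lend themselves to small helpers: a cache-busting query parameter and an increasing delay after a 429. The sketch below uses hypothetical names (`add_cache_buster`, `backoff_delay`), and the `_` parameter name is only a convention. It also assumes a `Retry-After` value, if the portal sends one, is given in seconds; an HTTP-date value falls back to exponential backoff.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url: str, param: str = "_", value=None) -> str:
    """Append a changing query parameter so caches treat the URL as new."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    stamp = value if value is not None else int(time.time() * 1000)
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0,
                  retry_after=None) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # not a number of seconds; use exponential backoff instead
    return min(base * (2 ** attempt), cap)
```

In `scan_symbols`, the fixed `time.sleep(delay)` could be replaced with `time.sleep(backoff_delay(attempt))` whenever the previous response had status 429.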

FAQ

Are financial APIs better than scraping with CAPTCHAs?

Free API tiers cover common data such as quotes and end-of-day prices, but they are rate-limited. Web scraping with CaptchaAI reaches data that APIs often do not expose: screener filters, analyst commentary, and niche financial data.

How do I handle real-time quote CAPTCHAs without delay?

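Warm up your session before you start polling during market hours: solve the challenge once, then reuse the same session and its cookies. Cloudflare's cf_clearance cookie typically stays valid for about 15–30 minutes (the exact lifetime is set by the site), so a single solve can cover many requests within that window.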
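One way to apply this is to track when the session was last cleared and re-solve only when the window has probably lapsed. The class below is a minimal sketch with a hypothetical name (`ClearanceWindow`); the 15-minute default is the conservative end of the range quoted above, not a value read from the cookie itself.

```python
import time


class ClearanceWindow:
    """Track whether a previously solved challenge is likely still valid."""

    def __init__(self, lifetime_seconds: float = 15 * 60, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock          # injectable for testing
        self.solved_at = None       # None until the first solve

    def mark_solved(self) -> None:
        """Record that a challenge was just solved for this session."""
        self.solved_at = self.clock()

    def needs_solve(self) -> bool:
        """True if no solve has happened yet or the window has elapsed."""
        if self.solved_at is None:
            return True
        return self.clock() - self.solved_at >= self.lifetime
```

A collector would call `needs_solve()` before a burst of quote requests, solve up front if it returns True, and call `mark_solved()` afterwards. The response-based check in `_is_captcha_page` still applies as a fallback when the site expires the cookie early.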

Can I collect data from multiple financial portals simultaneously?

Yes. Use separate sessions per portal with independent CaptchaAI task submissions. CaptchaAI handles concurrent solving across different sites.
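A sketch of that setup using a thread pool, with one collector (and therefore one session) per portal. The function name `collect_from_portals` and the `make_collector` factory are hypothetical; the only thing assumed about the collector is that it exposes `get_quote(portal_url, symbol)` like the `StockDataCollector` above.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_from_portals(portal_urls, symbol, make_collector, max_workers=4):
    """Fetch one symbol from several portals in parallel.

    make_collector() is called once per portal so that cookies and
    CAPTCHA state are never shared between sites.
    """
    def fetch(portal_url):
        collector = make_collector()  # separate session per portal
        try:
            return portal_url, collector.get_quote(portal_url, symbol)
        except Exception as exc:
            # Record the failure instead of aborting the other portals
            return portal_url, {"error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, portal_urls))


# Example call (assumes the StockDataCollector class from this article):
# quotes = collect_from_portals(
#     ["https://finance.example.com", "https://quotes.example.org"],
#     "AAPL",
#     lambda: StockDataCollector("YOUR_API_KEY"),
# )
```

Threads suit this workload because each task spends most of its time waiting on network I/O and on the solver.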

Next Steps

Collect market data reliably — get your CaptchaAI API key and handle financial portal CAPTCHAs automatically.

