CAPTCHA Handling for Stock Market Data Collection

Financial portals protect stock quotes, earnings data, and analyst reports with Cloudflare Turnstile and reCAPTCHA. CAPTCHAs trigger during rapid symbol lookups, historical data downloads, and screener queries. Here's how to maintain reliable data collection across financial sites.

CAPTCHA Patterns on Financial Portals

| Data type | Portal examples | CAPTCHA type | Trigger |
|---|---|---|---|
| Real-time quotes | Finance portals | Cloudflare Turnstile | Rapid symbol lookups |
| Historical prices | Data providers | reCAPTCHA v2 | Bulk CSV downloads |
| Financial statements | SEC filing sites | Image CAPTCHA | Repeated EDGAR queries |
| Screener results | Stock screeners | Cloudflare Challenge | Complex filter queries |
| Analyst ratings | Research portals | reCAPTCHA v3 | Multiple page views |
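
The CAPTCHA types in the table can usually be told apart by markers in the challenge page's HTML. The following is a minimal sketch, assuming the page embeds the standard widget markup; `detect_captcha_type` is a hypothetical helper written for illustration, not part of any library.

```python
def detect_captcha_type(html):
    """Guess the CAPTCHA family from markers in the page HTML.

    Returns "turnstile", "recaptcha", or None when no known marker is found.
    Image CAPTCHAs have no common marker, so they are not detected here.
    """
    # Cloudflare Turnstile and challenge pages load from challenges.cloudflare.com
    if "cf-turnstile" in html or "challenges.cloudflare.com" in html:
        return "turnstile"
    # reCAPTCHA widgets use the g-recaptcha class or load Google's script
    if "g-recaptcha" in html or "google.com/recaptcha" in html:
        return "recaptcha"
    return None
```

A collector could call this before choosing which solving method to submit.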

Stock Data Collector

import requests
import time
import re
from datetime import datetime, timedelta

class StockDataCollector:
    def __init__(self, api_key):
        self.api_key = api_key
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })

    def get_quote(self, portal_url, symbol):
        """Get current stock quote, solving CAPTCHAs if needed."""
        url = f"{portal_url}/quote/{symbol}"
        response = self.session.get(url)

        if self._is_captcha_page(response):
            response = self._solve_and_retry(response, url)

        return self._parse_quote(response.text, symbol)

    def get_historical(self, portal_url, symbol, days=365):
        """Download historical price data."""
        url = f"{portal_url}/history/{symbol}"
        params = {
            "period": f"{days}d",
            "interval": "1d"
        }
        response = self.session.get(url, params=params)

        if self._is_captcha_page(response):
            response = self._solve_and_retry(response, url)

        return self._parse_historical(response.text)

    def scan_symbols(self, portal_url, symbols, delay=2):
        """Collect quotes for multiple symbols."""
        results = {}

        for symbol in symbols:
            try:
                results[symbol] = self.get_quote(portal_url, symbol)
                time.sleep(delay)
            except Exception as e:
                results[symbol] = {"error": str(e)}

        return results

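    def _is_captcha_page(self, response):
        # A 403 or any known widget marker means a challenge page was served
        return (
            response.status_code == 403 or
            "cf-turnstile" in response.text or
            "challenges.cloudflare.com" in response.text or
            "g-recaptcha" in response.text
        )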

    def _solve_and_retry(self, response, url):
        match = re.search(r'data-sitekey="(0x[^"]+)"', response.text)
        if not match:
            # Fall back to reCAPTCHA detection
            match = re.search(r'data-sitekey="([^"]+)"', response.text)
            if match:
                return self._solve_recaptcha_and_retry(match.group(1), url)
            raise ValueError("No CAPTCHA sitekey found")

        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "turnstile",
            "sitekey": match.group(1),
            "pageurl": url,
            "json": 1
        })
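        submit = resp.json()
        if submit.get("status") != 1:
            # The solver rejected the task (bad key, zero balance, etc.)
            raise RuntimeError(f"CAPTCHA submit failed: {submit.get('request')}")
        task_id = submit["request"]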

        for _ in range(60):
            time.sleep(3)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            })
            data = result.json()
            if data["status"] == 1:
                return self.session.post(url, data={
                    "cf-turnstile-response": data["request"]
                })

        raise TimeoutError("CAPTCHA solve timed out")

    def _solve_recaptcha_and_retry(self, site_key, url):
        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": url,
            "json": 1
        })
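        submit = resp.json()
        if submit.get("status") != 1:
            # The solver rejected the task (bad key, zero balance, etc.)
            raise RuntimeError(f"reCAPTCHA submit failed: {submit.get('request')}")
        task_id = submit["request"]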

        for _ in range(60):
            time.sleep(3)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            })
            data = result.json()
            if data["status"] == 1:
                return self.session.post(url, data={
                    "g-recaptcha-response": data["request"]
                })

        raise TimeoutError("reCAPTCHA solve timed out")

    def _parse_quote(self, html, symbol):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, "html.parser")

        def text_of(selector):
            # select_one returns None when nothing matches, so guard before .text
            node = soup.select_one(selector)
            return node.text.strip() if node else None

        return {
            "symbol": symbol,
            "price": text_of("[data-field='regularMarketPrice'], .price"),
            "change": text_of("[data-field='regularMarketChange'], .change"),
            "volume": text_of("[data-field='regularMarketVolume'], .volume"),
            "market_cap": text_of("[data-field='marketCap'], .market-cap"),
            "timestamp": datetime.now().isoformat()
        }

    def _parse_historical(self, html):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, "html.parser")
        rows = []

        for row in soup.select("table tr")[1:]:  # Skip header
            cells = [td.text.strip() for td in row.select("td")]
            if len(cells) >= 6:
                rows.append({
                    "date": cells[0],
                    "open": cells[1],
                    "high": cells[2],
                    "low": cells[3],
                    "close": cells[4],
                    "volume": cells[5]
                })

        return rows


# Usage
collector = StockDataCollector("YOUR_API_KEY")

# Single quote
quote = collector.get_quote("https://finance.example.com", "AAPL")
print(f"AAPL: ${quote['price']} ({quote['change']})")

# Scan multiple symbols
portfolio = collector.scan_symbols(
    "https://finance.example.com",
    ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA"]
)

Market Screener with CAPTCHA Handling (JavaScript)

class MarketScreener {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  async screenStocks(portalUrl, filters) {
    const params = new URLSearchParams(filters);
    const response = await fetch(`${portalUrl}/screener?${params}`);
    const html = await response.text();

    if (html.includes('cf-turnstile') || response.status === 403) {
      return this.solveAndScreen(portalUrl, filters, html);
    }

    return this.parseScreenerResults(html);
  }

  async solveAndScreen(portalUrl, filters, html) {
    const match = html.match(/data-sitekey="(0x[^"]+)"/);
    if (!match) throw new Error('Turnstile sitekey not found');

    const submitResp = await fetch('https://ocr.captchaai.com/in.php', {
      method: 'POST',
      body: new URLSearchParams({
        key: this.apiKey,
        method: 'turnstile',
        sitekey: match[1],
        pageurl: `${portalUrl}/screener`,  // the page that served the widget
        json: '1'
      })
    });
    const { request: taskId } = await submitResp.json();

    for (let i = 0; i < 60; i++) {
      await new Promise(r => setTimeout(r, 3000));
      const result = await fetch(
        `https://ocr.captchaai.com/res.php?key=${this.apiKey}&action=get&id=${taskId}&json=1`
      );
      const data = await result.json();
      if (data.status === 1) {
        const response = await fetch(`${portalUrl}/screener`, {
          method: 'POST',
          body: new URLSearchParams({
            ...filters,
            'cf-turnstile-response': data.request
          })
        });
        return this.parseScreenerResults(await response.text());
      }
    }
    throw new Error('Turnstile solve timed out');
  }

  parseScreenerResults(html) {
    const rows = [];
    const tableMatch = html.match(/<table[^>]*>[\s\S]*?<\/table>/i);
    if (!tableMatch) return rows;

    const rowMatches = tableMatch[0].matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi);
    for (const row of rowMatches) {
      const cells = [...row[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
        .map(m => m[1].replace(/<[^>]+>/g, '').trim());
      if (cells.length >= 4) {
        rows.push({
          symbol: cells[0],
          price: cells[1],
          change: cells[2],
          volume: cells[3]
        });
      }
    }
    return rows;
  }
}

// Usage
const screener = new MarketScreener('YOUR_API_KEY');
const results = await screener.screenStocks('https://finance.example.com', {
  sector: 'technology',
  marketCap: 'large',
  peRatio: '<25'
});

Collection Frequency by Data Type

| Data type | Recommended interval | CAPTCHA frequency |
|---|---|---|
| Real-time quotes | 1–5 minutes | High (use an API if available) |
| End-of-day prices | Once daily after close | Low |
| Financial statements | Quarterly | Minimal |
| Screener results | Daily | Moderate |
| Analyst ratings | Weekly | Low |
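
These intervals can be encoded as a small schedule that decides whether a data type is due for collection. This is a sketch under stated assumptions: the dictionary keys and the `is_due` helper are hypothetical, and where the table gives a range or a calendar term the concrete value (5 minutes, 90 days) is a guess you should adjust.

```python
from datetime import datetime, timedelta

# Intervals taken from the table above; 5 minutes and 90 days are
# assumed stand-ins for "1-5 minutes" and "quarterly".
COLLECTION_INTERVALS = {
    "quotes": timedelta(minutes=5),
    "end_of_day": timedelta(days=1),
    "statements": timedelta(days=90),
    "screener": timedelta(days=1),
    "analyst_ratings": timedelta(weeks=1),
}

def is_due(data_type, last_run, now=None):
    """Return True when enough time has passed to collect this data type again."""
    if last_run is None:
        return True  # never collected before
    now = now or datetime.now()
    return now - last_run >= COLLECTION_INTERVALS[data_type]
```

Checking `is_due` before each request keeps low-churn data such as financial statements from triggering challenges unnecessarily.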

Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| Turnstile on every request | New session each time | Persist cookies across requests |
| Historical data incomplete | Pagination behind CAPTCHA | Solve per page, follow paginated links |
| Quote data stale | Cached response served | Add cache-busting query parameter |
| 429 rate limit | Too many requests | Increase delay, rotate proxies |
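
Two of these fixes are easy to sketch in code: the cache-busting parameter and a growing delay on 429 responses. Both helpers below are hypothetical illustrations. The `_` parameter name is an assumption (any unused name should work), and `fetch` stands for any callable that returns an object with a `status_code`, such as the `get` method of a `requests.Session`.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_cache_buster(url, token=None):
    """Append a throwaway `_` query parameter so caches treat the URL as new."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Default token is the current time in milliseconds
    query.append(("_", str(token if token is not None else int(time.time() * 1000))))
    return urlunsplit(parts._replace(query=urlencode(query)))

def get_with_backoff(fetch, url, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429, doubling the delay each time."""
    for attempt in range(retries + 1):
        response = fetch(url)
        if response.status_code != 429:
            return response
        if attempt < retries:
            sleep(base_delay * (2 ** attempt))
    return response  # still rate limited after all retries
```

With a persistent session, a call might look like `get_with_backoff(session.get, add_cache_buster(url))`, which also keeps cookies across requests.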

FAQ

Are financial APIs better than scraping with CAPTCHAs?

Free API tiers from some providers cover common data but impose rate limits. Web scraping with CaptchaAI reaches data that is not available through APIs: screener filters, analyst commentary, and niche financial data.

How do I handle real-time quote CAPTCHAs without delay?

Pre-authenticate your session during market hours, before you start polling. Cloudflare's cf_clearance cookie typically lasts 15–30 minutes (the exact lifetime is set by the site), so solve once and make multiple requests within that window.
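
One way to apply this is to record when the last solve happened and only solve again once the window has passed. The `ClearanceWindow` class below is a hypothetical sketch; its default lifetime uses the 15-minute lower bound mentioned above, which is an assumption about the target site rather than a guaranteed value.

```python
import time

class ClearanceWindow:
    """Track how long ago a challenge was solved for one session."""

    def __init__(self, lifetime_seconds=15 * 60, clock=time.monotonic):
        # 15 minutes is an assumed, conservative lifetime; tune per site
        self.lifetime = lifetime_seconds
        self.clock = clock
        self.solved_at = None

    def mark_solved(self):
        """Record the moment a challenge was solved."""
        self.solved_at = self.clock()

    def needs_solve(self):
        """True when no solve is recorded or the window has expired."""
        if self.solved_at is None:
            return True
        return self.clock() - self.solved_at >= self.lifetime
```

Call `needs_solve()` before a burst of quote requests and `mark_solved()` right after a successful solve, so the solving delay is paid outside the time-critical path.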

Can I collect data from multiple financial portals simultaneously?

Yes. Use separate sessions per portal with independent CaptchaAI task submissions. CaptchaAI handles concurrent solving across different sites.
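
A thread pool is one simple way to run per-portal sessions side by side. In this sketch, `collect_from_portals` and `make_collector` are hypothetical names; the factory is assumed to return an object with a `scan_symbols(portal_url, symbols)` method, like the `StockDataCollector` shown earlier.

```python
from concurrent.futures import ThreadPoolExecutor

def collect_from_portals(portal_urls, symbols, make_collector, max_workers=4):
    """Run one collector per portal in parallel threads.

    `make_collector` is a zero-argument factory, so every portal gets its
    own collector and therefore its own session and cookie jar.
    """
    def run(portal_url):
        collector = make_collector()
        return portal_url, collector.scan_symbols(portal_url, symbols)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; dict() keys results by portal URL
        return dict(pool.map(run, portal_urls))
```

For example, `collect_from_portals(urls, ["AAPL", "MSFT"], lambda: StockDataCollector("YOUR_API_KEY"))` would return one result dictionary per portal.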

Next Steps

Collect market data reliably — get your CaptchaAI API key and handle financial portal CAPTCHAs automatically.
