Real Estate Data Scraping with CAPTCHA Handling

Real estate platforms are heavily protected against automated data collection. CaptchaAI helps you maintain reliable access to property listings, pricing data, and market analytics.

CAPTCHA Protections on Real Estate Sites

| Platform Type | Protection | CAPTCHA Type |
|---|---|---|
| MLS aggregators | Cloudflare Challenge | Full challenge + proxy |
| Zillow-type portals | reCAPTCHA v3 | Invisible, behavioral |
| Realtor directories | reCAPTCHA v2 | Checkbox or invisible |
| Property tax records | Image CAPTCHA | Text recognition |
| Auction sites | Cloudflare Turnstile | Widget challenge |
| Commercial listings | reCAPTCHA v2 Enterprise | Enhanced verification |
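
Before a page can be routed to the right solver, the scraper has to work out which protection it is facing. The sketch below is one hedged way to do that from raw HTML. The function name `detect_protection` and the marker strings are assumptions based on how these widgets are commonly embedded, not an official or exhaustive detection method, and the sketch does not cover Cloudflare Challenge interstitials or image CAPTCHAs.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper: classify a page by the widget markers in its HTML.
SITEKEY_RE = re.compile(r'data-sitekey=["\']([^"\']+)["\']')
V3_RENDER_RE = re.compile(r'recaptcha/api\.js\?render=([A-Za-z0-9_-]+)')


def detect_protection(html: str) -> Tuple[str, Optional[str]]:
    """Return (kind, sitekey).

    kind is 'turnstile', 'recaptcha_v3', 'recaptcha_v2' or 'none'.
    """
    # Check Turnstile first: it also uses data-sitekey, so a generic
    # sitekey regex alone would misclassify it as reCAPTCHA.
    if "cf-turnstile" in html:
        match = SITEKEY_RE.search(html)
        return "turnstile", (match.group(1) if match else None)

    # reCAPTCHA v3 usually passes the site key in the script URL.
    # "render=explicit" is a v2 rendering mode, not a site key.
    match = V3_RENDER_RE.search(html)
    if match and match.group(1) != "explicit":
        return "recaptcha_v3", match.group(1)

    if "g-recaptcha" in html:
        match = SITEKEY_RE.search(html)
        return "recaptcha_v2", (match.group(1) if match else None)

    return "none", None
```

Keeping detection in a pure function like this makes it easy to test against saved HTML samples before any solver credits are spent.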

Property Data Collector

import requests
import time
import re
import json
import csv
import os
from datetime import datetime

API_KEY = os.environ["CAPTCHAAI_API_KEY"]


def solve_captcha(params):
    """Submit a CAPTCHA task and poll until it is solved (up to ~5 minutes)."""
    # Copy so the caller's dict is not mutated
    payload = {**params, "key": API_KEY}
    resp = requests.get(
        "https://ocr.captchaai.com/in.php", params=payload, timeout=30
    )
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"Submit failed: {resp.text}")

    task_id = resp.text.split("|", 1)[1]
    for _ in range(60):  # 60 polls x 5 s
        time.sleep(5)
        result = requests.get(
            "https://ocr.captchaai.com/res.php",
            params={"key": API_KEY, "action": "get", "id": task_id},
            timeout=30,
        )
        if result.text == "CAPCHA_NOT_READY":
            continue  # not solved yet, poll again
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"Solve failed: {result.text}")
    raise TimeoutError(f"CAPTCHA task {task_id} not solved within 5 minutes")


class PropertyCollector:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers["User-Agent"] = (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/120.0.0.0"
        )

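    def fetch(self, url):
        """Fetch page with automatic CAPTCHA handling."""
        resp = self.session.get(url, timeout=30)
        html = resp.text

        # Turnstile is checked first: it also uses data-sitekey, so the
        # reCAPTCHA pattern below would otherwise match it.
        if "cf-turnstile" in html:
            match = re.search(r'data-sitekey=["\']([^"\']+)["\']', html)
            if match:
                token = solve_captcha({
                    "method": "turnstile",
                    "sitekey": match.group(1),
                    "pageurl": url,
                })
                # Simplification: real sites usually expect the token in
                # the original form submission, alongside other fields.
                resp = self.session.post(url, data={
                    "cf-turnstile-response": token,
                }, timeout=30)
            return resp.text

        # reCAPTCHA v3 passes the site key in the script URL;
        # v2 exposes it as a data-sitekey attribute.
        v3 = re.search(r'recaptcha/api\.js\?render=([A-Za-z0-9_-]+)', html)
        v2 = re.search(r'data-sitekey=["\']([A-Za-z0-9_-]+)["\']', html)
        is_v3 = bool(v3) and v3.group(1) != "explicit"
        match = v3 if is_v3 else v2
        if match:
            params = {
                "method": "userrecaptcha",
                "googlekey": match.group(1),
                "pageurl": url,
            }
            if is_v3:
                params["version"] = "v3"
                params["action"] = "search"  # must match the site's action name

            token = solve_captcha(params)
            resp = self.session.post(url, data={
                "g-recaptcha-response": token,
            }, timeout=30)

        return resp.text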

    def collect_listings(self, urls):
        """Collect property listings from multiple pages."""
        listings = []
        for url in urls:
            try:
                html = self.fetch(url)
                page_listings = self._parse_listings(html)
                listings.extend(page_listings)
                print(f"  {len(page_listings)} listings from {url}")
                time.sleep(3)
            except Exception as e:
                print(f"  Error: {url} - {e}")
        return listings

    def _parse_listings(self, html):
        """Extract property data from HTML."""
        listings = []

        # Price extraction
        prices = re.findall(r'\$\s*([\d,]+)', html)
        # Address extraction
        addresses = re.findall(
            r'class="address"[^>]*>(.*?)</(?:div|span|p)', html
        )
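        # Bed/Bath extraction (baths may be fractional, e.g. "2.5 baths")
        beds = re.findall(r'(\d+)\s*(?:bed|br|bedroom)', html, re.I)
        baths = re.findall(
            r'(\d+(?:\.\d+)?)\s*(?:bath|ba|bathroom)', html, re.I
        )
        # Sqft extraction
        sqft = re.findall(r'([\d,]+)\s*(?:sq\s*ft|sqft)', html, re.I)

        # Combine available data by position. This assumes every listing
        # shows every field in the same order; if a field is missing from
        # one listing, later rows will be misaligned.
        count = max(len(prices), len(addresses), 1)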
        for i in range(min(count, 50)):  # Cap at 50 per page
            listing = {
                "price": prices[i] if i < len(prices) else None,
                "address": (
                    addresses[i].strip() if i < len(addresses) else None
                ),
                "beds": beds[i] if i < len(beds) else None,
                "baths": baths[i] if i < len(baths) else None,
                "sqft": sqft[i] if i < len(sqft) else None,
                "collected_at": datetime.utcnow().isoformat(),
            }
            if listing["price"] or listing["address"]:
                listings.append(listing)

        return listings

    def export_csv(self, listings, filename):
        if not listings:
            print("No listings to export")
            return

        keys = ["price", "address", "beds", "baths", "sqft", "collected_at"]
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=keys)
            writer.writeheader()
            writer.writerows(listings)
        print(f"Exported {len(listings)} listings to {filename}")


# Usage
collector = PropertyCollector()

search_urls = [
    "https://example-realty.com/search?city=austin&type=sale&page=1",
    "https://example-realty.com/search?city=austin&type=sale&page=2",
    "https://example-realty.com/search?city=austin&type=sale&page=3",
]

listings = collector.collect_listings(search_urls)
collector.export_csv(listings, "austin_listings.csv")

Data Points to Collect

| Field | Source | Use Case |
|---|---|---|
| Listing price | Property page | Market valuation |
| Address | Property page | Geo-analysis |
| Beds/Baths/Sqft | Property details | Comparable analysis |
| Days on market | Listing metadata | Market velocity |
| Price history | Price change log | Trend analysis |
| Property tax | Tax records | Investment analysis |
| HOA fees | Listing details | Cost analysis |
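
The collector stores every field as the raw string it found in the HTML (for example "450,000"), which is awkward for valuation or comparable analysis. The following is a minimal sketch of one way to convert those strings into typed values. The names `Listing`, `to_number`, and `normalize` are hypothetical and not part of the collector or of any CaptchaAI library.

```python
from dataclasses import dataclass
from typing import Optional


def to_number(value: Optional[str], cast=int):
    """Convert '1,850' to 1850; return None for missing or non-numeric input."""
    if not value:
        return None
    try:
        return cast(value.replace(",", "").strip())
    except ValueError:
        return None


@dataclass
class Listing:
    price: Optional[int]
    address: Optional[str]
    beds: Optional[int]
    baths: Optional[float]  # float so half baths such as 2.5 survive
    sqft: Optional[int]

    @property
    def price_per_sqft(self) -> Optional[float]:
        # None when either value is missing or the area is zero
        if self.price is None or not self.sqft:
            return None
        return round(self.price / self.sqft, 2)


def normalize(raw: dict) -> Listing:
    """Build a typed Listing from one scraped dict of strings."""
    return Listing(
        price=to_number(raw.get("price")),
        address=raw.get("address"),
        beds=to_number(raw.get("beds")),
        baths=to_number(raw.get("baths"), float),
        sqft=to_number(raw.get("sqft")),
    )
```

Returning `None` instead of raising on bad input keeps one malformed listing from aborting a whole page of results.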

Market Analysis Workflow

Daily Collection
    → Property listings (500-1000 per market)
    → Price changes (delta from previous day)
    → New listings vs delisted

Weekly Analysis
    → Median price trends
    → Inventory levels
    → Days-on-market averages
    → Price-per-sqft by neighborhood

Monthly Report
    → Market heat map
    → Competitive pricing analysis
    → Investment opportunity scoring
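
Two steps of this workflow, the daily delta and the weekly price-per-sqft figure, can be sketched with the standard library alone. The code below is an illustration under stated assumptions: listings are keyed by address (a stable listing ID would be more robust where a site exposes one), prices and areas are already numeric, and a `neighborhood` value is available, which the collector above does not extract. The function names are hypothetical.

```python
from statistics import median


def daily_delta(previous: dict, current: dict) -> dict:
    """Compare two {address: price} snapshots from consecutive days."""
    new = sorted(current.keys() - previous.keys())
    delisted = sorted(previous.keys() - current.keys())
    # Positive value = price increase, negative = price cut
    changes = {
        addr: current[addr] - previous[addr]
        for addr in current.keys() & previous.keys()
        if current[addr] != previous[addr]
    }
    return {"new": new, "delisted": delisted, "price_changes": changes}


def median_price_per_sqft(listings: list) -> dict:
    """Take (neighborhood, price, sqft) tuples; return median $/sqft per neighborhood."""
    groups = {}
    for neighborhood, price, sqft in listings:
        if sqft:  # skip listings without a usable area
            groups.setdefault(neighborhood, []).append(price / sqft)
    return {name: round(median(values), 2) for name, values in groups.items()}
```

The median is used rather than the mean so that a single luxury listing does not skew a neighborhood's figure.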

FAQ

Is it legal to scrape real estate data?

Publicly visible listing data can generally be collected, but avoid gathering personal information about sellers or agents, and always comply with the site's terms of service.

How do I handle pagination?

Increment the page parameter in your URLs. Most real estate sites use ?page=N or &offset=N patterns.
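
As a rough sketch, the URL list can be generated rather than typed by hand. The helper name `paginate` and its parameters are illustrative assumptions; check each site's actual parameter name and page size before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def paginate(base_url, pages, param="page", start=1, step=1):
    """Return `pages` URLs with `param` set to start, start + step, ..."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))  # keep existing query parameters
    urls = []
    for i in range(pages):
        query[param] = str(start + i * step)
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls


# ?page=N style: pages 1, 2, 3
page_urls = paginate("https://example-realty.com/search?city=austin&type=sale", 3)

# &offset=N style, assuming 25 results per page: offsets 0, 25, 50
offset_urls = paginate(
    "https://example-realty.com/search?city=austin&type=sale",
    3, param="offset", start=0, step=25,
)
```

Either list can be passed straight to `collect_listings`.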

Which CAPTCHA type is hardest on real estate sites?

Cloudflare Challenge on MLS aggregators is the most complex because solving it requires proxy parameters. reCAPTCHA v3 is common on major portals, and CaptchaAI solves it reliably.
