
Octoparse + CaptchaAI: Visual Scraping with CAPTCHA Handling

Octoparse is a visual web scraping tool that lets non-coders extract data from websites. When a CAPTCHA blocks an extraction task, CaptchaAI can solve it so the data can still be collected.


When Octoparse Encounters CAPTCHAs

Scenario | What happens
--- | ---
reCAPTCHA on the target page | Extraction stops until the CAPTCHA is solved manually
Cloudflare challenge | The page loads, but no data is extracted
Rate-limiting CAPTCHA | A CAPTCHA appears after N pages
Login-protected data | The login form has a CAPTCHA
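To tell which of these scenarios a page is in, and to find the reCAPTCHA sitekey the helper needs, a few string checks on the page HTML are often enough. The sketch below is a heuristic, not a complete detector: the `_cf_chl_opt` and "Just a moment..." markers for Cloudflare challenge pages and the `data-sitekey` attribute pattern are assumptions about typical markup, and `classify_page` is a hypothetical name.

```python
import re

# Heuristic markers (assumptions about typical markup, not an official list)
CLOUDFLARE_MARKERS = ("_cf_chl_opt", "<title>just a moment...</title>")
SITEKEY_RE = re.compile(r'data-sitekey=["\']([\w-]+)["\']')


def classify_page(html: str) -> dict:
    """Guess which CAPTCHA scenario a fetched page falls into."""
    lowered = html.lower()

    # Cloudflare interstitial challenge pages
    if any(marker in lowered for marker in CLOUDFLARE_MARKERS):
        return {"scenario": "cloudflare_challenge", "sitekey": None}

    # reCAPTCHA widget: look for the widget class or the Google script URL,
    # then try to pull the sitekey from the data-sitekey attribute
    if "g-recaptcha" in lowered or "google.com/recaptcha" in lowered:
        match = SITEKEY_RE.search(html)
        return {"scenario": "recaptcha", "sitekey": match.group(1) if match else None}

    return {"scenario": "none", "sitekey": None}
```

The returned sitekey is what `solve_and_get_cookies` expects as its `sitekey` argument.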

Because Octoparse is a visual tool, the integration uses a Python helper that solves the CAPTCHA, logs in, and exports the resulting session cookies for Octoparse to import:

import requests
import time
import json


class OctoparseCaptchaHelper:
    """Solve CAPTCHAs and export cookies for Octoparse."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.session = requests.Session()

    def solve_and_get_cookies(self, login_url, sitekey, credentials):
        """
        Solve login CAPTCHA and return session cookies.

        Steps:

        1. Visit login page to get initial cookies
        2. Solve CAPTCHA via CaptchaAI
        3. Submit login form with token
        4. Export authenticated cookies
        """
        # Step 1: Get initial cookies
        self.session.get(login_url, timeout=15)

        # Step 2: Solve CAPTCHA
        token = self._solve_recaptcha(sitekey, login_url)

        # Step 3: Submit login
        login_data = {
            **credentials,
            "g-recaptcha-response": token,
        }
        resp = self.session.post(login_url, data=login_data, timeout=30)

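        # A 200 status does not prove the login worked: many sites return 200
        # with an error message. Check for a site-specific success marker too.
        if resp.status_code != 200:
            raise RuntimeError(f"Login failed: {resp.status_code}")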

        # Step 4: Export cookies
        cookies = []
        for cookie in self.session.cookies:
            cookies.append({
                "name": cookie.name,
                "value": cookie.value,
                "domain": cookie.domain,
                "path": cookie.path,
            })

        return cookies

    def export_cookies_for_octoparse(self, cookies, output_file="cookies.json"):
        """Save cookies in a format Octoparse can import."""
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(cookies, f, indent=2)
        print(f"Cookies saved to {output_file}")
        print("Import these in Octoparse: Task → Advanced Settings → Cookies")

    def _solve_recaptcha(self, sitekey, pageurl):
        """Solve reCAPTCHA via CaptchaAI."""
        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": self.api_key,
            "method": "userrecaptcha",
            "googlekey": sitekey,
            "pageurl": pageurl,
            "json": 1,
        }, timeout=30)
        result = resp.json()

        if result.get("status") != 1:
            raise RuntimeError(f"Submit error: {result.get('request')}")

        task_id = result["request"]
        time.sleep(15)

        for _ in range(24):
            resp = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": self.api_key, "action": "get",
                "id": task_id, "json": 1,
            }, timeout=15)
            data = resp.json()

            if data.get("status") == 1:
                return data["request"]
            if data.get("request") != "CAPCHA_NOT_READY":
                raise RuntimeError(data.get("request", "Unknown error"))
            time.sleep(5)

        raise TimeoutError("Solve timeout")


# Usage
helper = OctoparseCaptchaHelper("YOUR_API_KEY")

cookies = helper.solve_and_get_cookies(
    login_url="https://example.com/login",
    sitekey="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
    credentials={"username": "user", "password": "pass"},
)

helper.export_cookies_for_octoparse(cookies)
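The polling loop in `_solve_recaptcha` can be pulled out into a function that takes the fetch and sleep steps as parameters, which makes it testable without network calls. This is a minimal sketch: `poll_for_token` and its parameters are hypothetical names, and the `CAPCHA_NOT_READY` status string and 24 × 5-second defaults are taken from the helper above.

```python
import time
from typing import Callable


def poll_for_token(
    fetch: Callable[[], dict],
    max_attempts: int = 24,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the solver returns a token, an error, or attempts run out.

    `fetch` should return the parsed JSON from res.php, for example
    {"status": 0, "request": "CAPCHA_NOT_READY"}.
    """
    for _ in range(max_attempts):
        data = fetch()
        if data.get("status") == 1:
            return data["request"]  # the solved token
        if data.get("request") != "CAPCHA_NOT_READY":
            # Any other value is an error code from the API
            raise RuntimeError(data.get("request", "Unknown error"))
        sleep(interval)  # not ready yet; wait before the next poll
    raise TimeoutError("Solve timeout")
```

Inside the helper, `fetch` would be a small function that calls `requests.get(...)` on res.php with the task ID and returns `resp.json()`.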

Approach: API-Based Extraction with CAPTCHA Solving

For more control, call CaptchaAI directly from a Python script that runs alongside Octoparse:

def extract_with_captcha(api_key, urls, sitekey):
    """Submit a reCAPTCHA token to each CAPTCHA-protected URL and record the result."""
    results = []

    # One helper instance is enough for every URL
    helper = OctoparseCaptchaHelper(api_key)

    for url in urls:
        print(f"Processing: {url}")

        # Solve a fresh CAPTCHA for this page (reCAPTCHA tokens are single-use)
        token = helper._solve_recaptcha(sitekey, url)

        # Submit the token. Many sites expect it at the form's action URL,
        # together with the form's other fields; adjust for your target.
        resp = requests.post(url, data={
            "g-recaptcha-response": token,
        }, timeout=30)

        # Record the outcome
        if resp.status_code == 200:
            results.append({
                "url": url,
                "content_length": len(resp.text),
                "status": "success",
            })
        else:
            results.append({
                "url": url,
                "status": f"failed ({resp.status_code})",
            })

        time.sleep(3)  # Pause between pages to avoid rate limiting

    return results
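Posting only the token, as `extract_with_captcha` does, is often not enough: forms frequently carry hidden fields such as CSRF tokens that the server also checks. The sketch below collects those hidden inputs with the standard-library `html.parser` and merges in the solved token. It assumes the target form uses `<input type="hidden">` fields; `HiddenInputParser` and `build_form_payload` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_form_payload(form_html: str, token: str) -> dict:
    """Merge the page's hidden fields with the solved reCAPTCHA token."""
    parser = HiddenInputParser()
    parser.feed(form_html)
    payload = dict(parser.fields)
    payload["g-recaptcha-response"] = token
    return payload
```

The resulting dictionary can replace the single-field `data=` argument in the `requests.post` call above.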

Octoparse Configuration Tips

Setting | Recommendation
--- | ---
Page load wait | 10+ seconds for CAPTCHA pages
Retry on error | Enable, with 3 retries
Cookie import | Use the cookies exported by the helper
Cloud extraction | Run in the Octoparse cloud with pre-solved cookies
Local extraction | Use local mode for the initial CAPTCHA bypass
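If the cookie field you paste into expects a single `Cookie` header string rather than the JSON file the helper writes (check your Octoparse version; this is an assumption), the exported list can be flattened as sketched below. `cookies_to_header` is a hypothetical name, and the domain filter is simplified: it does not distinguish host-only cookies from domain cookies.

```python
from typing import Dict, List, Optional


def cookies_to_header(cookies: List[Dict], domain: Optional[str] = None) -> str:
    """Join exported cookies into a 'name=value; name2=value2' string.

    If `domain` is given, keep only cookies whose domain equals it or is a
    parent of it. A leading dot (as in '.example.com') is ignored.
    """
    pairs = []
    for cookie in cookies:
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if domain is not None and not (
            cookie_domain == domain or domain.endswith("." + cookie_domain)
        ):
            continue  # cookie belongs to an unrelated domain
        pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)
```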

FAQ

Can Octoparse solve CAPTCHAs automatically?

Octoparse has limited built-in CAPTCHA handling. For reliable solving, use CaptchaAI to pre-solve the CAPTCHA and export the session cookies, or switch to a code-based approach for CAPTCHA-heavy sites.

When should I use Octoparse vs. a coded solution?

Use Octoparse for simple sites that rarely show CAPTCHAs. For sites with frequent CAPTCHAs, a Python script with CaptchaAI gives you more control and reliability.

Can I refresh the cookies automatically for scheduled Octoparse runs?

Yes. Run the Python helper on a schedule (for example, via cron or Task Scheduler) to refresh the cookies before each Octoparse extraction run.
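To avoid paying for a new solve on every scheduled run, the job can first check whether the saved cookie file is still fresh. This is a minimal sketch: `cookies_need_refresh` is a hypothetical name, and the right `max_age_seconds` is an assumption you would tune to how long the target site's sessions actually last.

```python
import json
import os
import time
from typing import Optional


def cookies_need_refresh(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the cookie file is missing, unreadable, empty, or too old."""
    if not os.path.exists(path):
        return True
    with open(path, encoding="utf-8") as f:
        try:
            cookies = json.load(f)
        except json.JSONDecodeError:
            return True  # corrupted file: solve again
    if not cookies:
        return True
    current = time.time() if now is None else now
    age = current - os.path.getmtime(path)  # seconds since the file was written
    return age > max_age_seconds
```

A scheduled script would call this with the `cookies.json` path and only run `solve_and_get_cookies` and `export_cookies_for_octoparse` when it returns `True`.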



Handle CAPTCHAs in visual scraping: try CaptchaAI.
