Ad Verification Workflows with CAPTCHA Handling

Ad verification requires visiting thousands of web pages to check ad placement, brand safety, and compliance. Many publisher sites use CAPTCHAs that block these automated checks. CaptchaAI solves those challenges through an API call, so the verification pipeline keeps running.

What Ad Verification Checks

Check	Description	Why CAPTCHAs Block It
Ad placement	Is the ad shown above the fold?	Automated page visits trigger bot detection
Brand safety	No ads next to harmful content	Bulk URL checking resembles scraping
Viewability	Was the ad actually visible?	Headless browsers flagged by Cloudflare
Geographic targeting	Right ad in right region	Proxy traffic triggers CAPTCHAs
Competitor monitoring	What ads do competitors show?	High-volume ad lookups trigger rate-based bot detection

Implementation

import requests
import time
import re
import json
import os
from datetime import datetime

API_KEY = os.environ["CAPTCHAAI_API_KEY"]


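def solve_captcha(method, params):
    """Submit a CAPTCHA task to CaptchaAI and poll until a solution is ready."""
    # Build a new dict so the caller's params are not mutated
    payload = {**params, "key": API_KEY, "method": method}

    resp = requests.get(
        "https://ocr.captchaai.com/in.php", params=payload, timeout=30
    )
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"CaptchaAI submit failed: {resp.text}")

    task_id = resp.text.split("|", 1)[1]
    # Poll every 5 seconds, up to 60 attempts (about 5 minutes)
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id,
        }, timeout=30)
        # Status string is spelled this way by the API
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"CaptchaAI solve failed: {result.text}")
    raise TimeoutError(f"CaptchaAI task {task_id} not solved after 60 polls")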


def verify_ad_placement(url, session):
    """Verify ad placement on a publisher page."""
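    resp = session.get(url, timeout=30)

    # Solve CAPTCHA if present
    match = re.search(r'data-sitekey=["\']([A-Za-z0-9_-]+)["\']', resp.text)
    if match:
        token = solve_captcha("userrecaptcha", {
            "googlekey": match.group(1),
            "pageurl": url,
        })
        # Assumes the page accepts the token as a form field posted back to
        # the same URL; sites with a separate verify endpoint need that URL here
        resp = session.post(
            url, data={"g-recaptcha-response": token}, timeout=30
        )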

    html = resp.text

    # Check for ad elements
    result = {
        "url": url,
        "timestamp": datetime.utcnow().isoformat(),
        "ads_found": [],
        "brand_safety": True,
        "captcha_solved": match is not None,
    }

    # Detect ad tags
    ad_patterns = [
        (r'googletag\.pubads', "Google Ad Manager"),
        (r'doubleclick\.net', "DFP/DoubleClick"),
        (r'ad\.doubleclick', "DoubleClick"),
        (r'amazon-adsystem', "Amazon Ads"),
        (r'criteo\.com/.*\.js', "Criteo"),
    ]

    for pattern, name in ad_patterns:
        if re.search(pattern, html):
            result["ads_found"].append(name)

    # Brand safety check — flag problematic content
    safety_keywords = [
        "violence", "hate speech", "explicit",
        "gambling", "illegal",
    ]
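    # Strip tags, then collapse whitespace so multi-word keywords still match
    page_text = re.sub(r'<[^>]+>', ' ', html).lower()
    page_text = re.sub(r'\s+', ' ', page_text)
    for keyword in safety_keywords:
        # Match whole words so "explicit" does not flag "explicitly"
        if re.search(r'\b' + re.escape(keyword) + r'\b', page_text):
            result["brand_safety"] = False
            break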

    return result


def run_verification(urls, output_file="verification_report.json"):
    """Run ad verification across multiple publisher URLs."""
    session = requests.Session()
    session.headers["User-Agent"] = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 Chrome/120.0.0.0"
    )

    results = []
    for i, url in enumerate(urls):
        try:
            result = verify_ad_placement(url, session)
            results.append(result)
            ads = ", ".join(result["ads_found"]) or "None"
            safe = "SAFE" if result["brand_safety"] else "UNSAFE"
            print(f"  [{i+1}/{len(urls)}] {url}: {ads} [{safe}]")
        except Exception as e:
            results.append({
                "url": url,
                "error": str(e),
                "timestamp": datetime.utcnow().isoformat(),
            })
            print(f"  [{i+1}/{len(urls)}] {url}: ERROR - {e}")

        time.sleep(2)

    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)

    # Summary
    total = len(results)
    safe = sum(1 for r in results if r.get("brand_safety"))
    captchas = sum(1 for r in results if r.get("captcha_solved"))
    errors = sum(1 for r in results if "error" in r)

    print(f"\n  Total: {total} | Safe: {safe} | CAPTCHAs solved: {captchas} | Errors: {errors}")

    return results


# Publisher URLs to verify
publisher_urls = [
    "https://publisher1.com/article/tech-news",
    "https://publisher2.com/sports/latest",
    "https://publisher3.com/finance/markets",
]

# Run only when executed as a script, not when imported
if __name__ == "__main__":
    run_verification(publisher_urls)
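
To sanity-check the tag-detection step without touching the network, the pattern loop can be pulled into a small pure function and run against a hand-written HTML snippet. This is a minimal sketch: the helper name `detect_ad_networks` and the sample page are made up for illustration, while the patterns are the ones from the script above.

```python
import re

# Same patterns as the script above; the helper name is hypothetical.
AD_PATTERNS = [
    (r'googletag\.pubads', "Google Ad Manager"),
    (r'doubleclick\.net', "DFP/DoubleClick"),
    (r'ad\.doubleclick', "DoubleClick"),
    (r'amazon-adsystem', "Amazon Ads"),
    (r'criteo\.com/.*\.js', "Criteo"),
]


def detect_ad_networks(html):
    """Return the names of ad networks whose tags appear in the HTML."""
    return [name for pattern, name in AD_PATTERNS if re.search(pattern, html)]


# Made-up sample page, for illustration only
SAMPLE_HTML = """
<html><head>
<script src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
<script>googletag.pubads().enableSingleRequest();</script>
</head><body><p>Market update</p></body></html>
"""

print(detect_ad_networks(SAMPLE_HTML))
# ['Google Ad Manager', 'DFP/DoubleClick']
```

Because the check is a plain substring search over raw HTML, it only shows that a tag is referenced on the page, not that an ad actually rendered.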

Scaling with Cloudflare-Protected Publishers

Many premium publishers use Cloudflare. Handle both Turnstile widgets and full challenge pages:

def handle_cloudflare(url, session):
    """Handle Cloudflare-protected publisher pages."""
    resp = session.get(url, timeout=30)

    if "cf-turnstile" in resp.text:
        match = re.search(r'data-sitekey=["\']([^"\']+)', resp.text)
        if match:
            token = solve_captcha("turnstile", {
                "sitekey": match.group(1),
                "pageurl": url,
            })
            return session.post(url, data={
                "cf-turnstile-response": token,
            }, timeout=30)

    if resp.status_code == 403 and "cf-browser-verification" in resp.text:
        # Full challenge: replace the placeholder proxy with real credentials
        data = solve_captcha("cloudflare_challenge", {
            "pageurl": url,
            "proxy": "user:pass@proxy:port",
            "proxytype": "HTTP",
        })
        # NOTE: this branch returns the raw solver result, not a Response.
        # The caller still has to parse cf_clearance from it and retry the
        # request through the same proxy.
        return data

    return resp
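
Since the handler above returns a Response in some branches and a raw solver result in another, it may help to separate detection from solving. The sketch below only classifies an already-fetched page. The helper name `classify_challenge` is hypothetical, the markers are the same strings the handler checks, and treating any other `data-sitekey` as reCAPTCHA is an assumption carried over from the first script (other widgets use the same attribute).

```python
import re

SITEKEY_RE = re.compile(r'data-sitekey=["\']([^"\']+)')


def classify_challenge(status_code, html):
    """Return (kind, sitekey) for a fetched page.

    kind is "turnstile", "cloudflare_challenge", "recaptcha", or "none".
    """
    match = SITEKEY_RE.search(html)
    if "cf-turnstile" in html and match:
        return "turnstile", match.group(1)
    if status_code == 403 and "cf-browser-verification" in html:
        return "cloudflare_challenge", None
    if match:
        # Assumption: any remaining data-sitekey belongs to reCAPTCHA
        return "recaptcha", match.group(1)
    return "none", None


# Hand-written fragments, for illustration only
print(classify_challenge(200, '<div class="cf-turnstile" data-sitekey="0xAAA"></div>'))
print(classify_challenge(403, '<div id="cf-browser-verification"></div>'))
print(classify_challenge(200, '<div class="g-recaptcha" data-sitekey="6LcXYZ"></div>'))
print(classify_challenge(200, '<p>Article body</p>'))
```

A caller can then pick the matching `solve_captcha` method from the returned kind and keep the return type of its own fetch function consistent.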

FAQ

How many pages can I verify per hour?

With CaptchaAI, you can verify 200-500 pages per hour depending on CAPTCHA frequency and solve times.
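
A rough back-of-the-envelope model shows where a figure in this range could come from, assuming the sequential loop above with its 2-second pause between pages. All inputs below are illustrative assumptions, not measured CaptchaAI solve times, and the function name is hypothetical.

```python
def pages_per_hour(fetch_s, delay_s, captcha_rate, solve_s):
    """Estimate throughput of a sequential verification loop.

    fetch_s: average seconds to download a page
    delay_s: fixed pause between pages
    captcha_rate: fraction of pages that show a CAPTCHA (0.0 to 1.0)
    solve_s: average seconds to solve one CAPTCHA, including polling
    """
    per_page = fetch_s + delay_s + captcha_rate * solve_s
    return 3600 / per_page


# Illustrative inputs: 1 s fetch, 2 s pause, 20 s per solve
print(round(pages_per_hour(1, 2, 0.25, 20)))  # 450
print(round(pages_per_hour(1, 2, 0.5, 20)))   # 277
print(round(pages_per_hour(1, 2, 0.75, 20)))  # 200
```

Under these assumptions the CAPTCHA rate dominates: going from one page in four to three pages in four more than halves the hourly total.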

Does this work for video ad verification?

This approach works for display and native ads. Video ad verification typically requires browser rendering with Selenium or Playwright.

How do I handle different regions?

Use proxies from target geographies. CaptchaAI supports proxy parameters so the solve context matches your geographic targeting.
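
One way to keep page fetches and solves on the same regional proxy is to build both sets of settings from a single lookup. This is a sketch under stated assumptions: the helper name `region_context`, the region codes, and the proxy hostnames are placeholders, and the `proxy` / `proxytype` parameter names are taken from the Cloudflare example above.

```python
def region_context(region, proxy_by_region):
    """Build matching proxy settings for requests and for the solver.

    proxy_by_region maps a region code to "user:pass@host:port".
    """
    proxy = proxy_by_region[region]
    return {
        # Mapping in the shape requests expects for Session.proxies
        "requests_proxies": {
            "http": f"http://{proxy}",
            "https": f"http://{proxy}",
        },
        # Extra solver parameters, named as in the Cloudflare example
        "solver_params": {"proxy": proxy, "proxytype": "HTTP"},
    }


# Placeholder proxy endpoints; replace with real ones
PROXIES = {
    "us": "user:pass@us.proxy.example:8080",
    "de": "user:pass@de.proxy.example:8080",
}

ctx = region_context("de", PROXIES)
print(ctx["requests_proxies"]["https"])
print(ctx["solver_params"])
```

With a requests session, `session.proxies.update(ctx["requests_proxies"])` routes page fetches through the regional proxy, and `ctx["solver_params"]` can be merged into the dict passed to `solve_captcha`.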
