Public records portals (court systems, county assessors, secretary of state filings, vital records databases) rely heavily on image and OCR CAPTCHAs. Many run older CAPTCHA implementations built on distorted text, math problems, or custom image challenges. Here's how to handle them across different record types.
Common CAPTCHA Types by Portal
| Portal category | Typical CAPTCHA | Challenge examples |
|---|---|---|
| Court case search | Custom text CAPTCHA | Distorted 5–6 char alphanumeric |
| County property records | Math CAPTCHA | "What is 4 + 7?" |
| Business entity search | Image text CAPTCHA | Warped letters with line noise |
| Vital records | reCAPTCHA v2 | Image grid selection |
| Building permits | Simple text CAPTCHA | 4-digit numeric code |
| UCC filings | Custom OCR CAPTCHA | Mixed case letters with background noise |
Public Records Search with CAPTCHA Solving
```python
import base64
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

API_BASE = "https://ocr.captchaai.com"


class PublicRecordsSearcher:
    def __init__(self, api_key):
        self.api_key = api_key
        # One Session keeps cookies, so the CAPTCHA image and the search POST
        # stay tied to the same server-side session as the search page.
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })

    def search_court_records(self, portal_url, case_number):
        """Search court records, solving an image CAPTCHA if one is present."""
        # Load the search page first; this is what sets the session cookie
        page = self.session.get(f"{portal_url}/search")
        page.raise_for_status()

        # Resolve the image URL against the page it came from (after redirects)
        captcha_img_url = self._extract_captcha_url(page.text, page.url)
        if not captcha_img_url:
            # No CAPTCHA on this page
            return self._submit_search(portal_url, case_number)

        # Download the image with the same session, then solve it
        img_response = self.session.get(captcha_img_url)
        img_response.raise_for_status()
        captcha_text = self._solve_image_captcha(img_response.content)

        # Submit search with the solved CAPTCHA
        return self._submit_search(portal_url, case_number, captcha_text)

    def _extract_captcha_url(self, html, base_url):
        soup = BeautifulSoup(html, "html.parser")
        # Look for common CAPTCHA image patterns
        captcha_img = (
            soup.find("img", id="captchaImage")
            or soup.find("img", class_="captcha")
            or soup.find("img", src=lambda s: s and "captcha" in s.lower())
        )
        if captcha_img and captcha_img.get("src"):
            return urljoin(base_url, captcha_img["src"])
        return None

    def _solve_image_captcha(self, image_bytes):
        """Submit an image to CaptchaAI and poll until it is solved."""
        img_base64 = base64.b64encode(image_bytes).decode("utf-8")
        resp = requests.post(f"{API_BASE}/in.php", data={
            "key": self.api_key,
            "method": "base64",
            "body": img_base64,
            "json": 1,
        })
        submit = resp.json()
        if submit.get("status") != 1:
            # On failure, "request" carries an error code instead of a task ID
            raise RuntimeError(f"CAPTCHA submit failed: {submit.get('request')}")
        task_id = submit["request"]

        for _ in range(30):
            time.sleep(3)
            result = requests.get(f"{API_BASE}/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1,
            })
            data = result.json()
            if data.get("status") == 1:
                return data["request"]
            if data.get("request") != "CAPCHA_NOT_READY":
                raise RuntimeError(f"CAPTCHA solve failed: {data.get('request')}")
        raise TimeoutError("CAPTCHA solve timed out")

    def _submit_search(self, portal_url, case_number, captcha_text=None):
        # Field names are portal-specific; inspect the search form to confirm them
        form_data = {"caseNumber": case_number}
        if captcha_text:
            form_data["captcha"] = captcha_text
        response = self.session.post(f"{portal_url}/search/results", data=form_data)
        response.raise_for_status()
        return response.text


# Usage
searcher = PublicRecordsSearcher("YOUR_API_KEY")
results = searcher.search_court_records(
    "https://courts.example.gov",
    "2024-CV-12345",
)
```
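Solvers occasionally return a wrong answer, and most portals respond by showing the search form again with a fresh challenge. A minimal retry sketch, assuming two hypothetical callables you supply: `attempt()`, which runs a full load/solve/submit cycle (for example, a lambda around `searcher.search_court_records`), and `is_rejected(html)`, a portal-specific check for the "incorrect code" message.

```python
from typing import Callable


def search_with_retry(
    attempt: Callable[[], str],
    is_rejected: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Repeat a full page-load/solve/submit cycle until the portal accepts the CAPTCHA.

    Each retry must call attempt() again so a *new* CAPTCHA image is fetched;
    resubmitting an old answer against a refreshed challenge cannot succeed.
    """
    for _ in range(max_attempts):
        html = attempt()
        if not is_rejected(html):
            return html
    raise RuntimeError(f"CAPTCHA rejected {max_attempts} times in a row")
```

For example, `search_with_retry(lambda: searcher.search_court_records(url, case), lambda html: "incorrect code" in html.lower())`, where the error text is a placeholder for whatever the portal actually displays.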
Math CAPTCHA Handling
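Some government portals render simple math problems as images. Submit them like any other image CAPTCHA, and use the `textinstructions` parameter to ask for the computed result rather than a transcription of the expression. The method below is meant to be added to `PublicRecordsSearcher`: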
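```python
# Add this method to PublicRecordsSearcher (uses the same imports as above)
def solve_math_captcha(self, image_bytes):
    """Solve image-based math CAPTCHAs like '4 + 7 = ?'."""
    img_base64 = base64.b64encode(image_bytes).decode("utf-8")
    resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": self.api_key,
        "method": "base64",
        "body": img_base64,
        "textinstructions": "solve the math equation and return only the number",
        "json": 1,
    })
    submit = resp.json()
    if submit.get("status") != 1:
        raise RuntimeError(f"CAPTCHA submit failed: {submit.get('request')}")
    task_id = submit["request"]

    # Poll for result
    for _ in range(30):
        time.sleep(3)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": self.api_key,
            "action": "get",
            "id": task_id,
            "json": 1,
        })
        data = result.json()
        if data.get("status") == 1:
            answer = data["request"].strip()
            # Guard against getting the expression back instead of the number
            if not answer.lstrip("-").isdigit():
                raise ValueError(f"Unexpected math CAPTCHA answer: {answer!r}")
            return answer
        if data.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(f"CAPTCHA solve failed: {data.get('request')}")
    raise TimeoutError("Math CAPTCHA solve timed out")
```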
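When the math question appears as plain text in the page HTML rather than as an image (like the "What is 4 + 7?" example in the table above), it may not need a solver call at all. A minimal sketch, assuming the question contains a single `a op b` expression with integer operands; the regex and the set of supported operators are assumptions to adjust per portal.

```python
import operator
import re

# Operators a portal might use, including "x" and the Unicode multiplication sign
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "x": operator.mul,
    "×": operator.mul,
}

# First "<int> <op> <int>" expression in the text
MATH_PATTERN = re.compile(r"(-?\d+)\s*([+\-*x×])\s*(-?\d+)", re.IGNORECASE)


def solve_text_math(question: str) -> str:
    """Compute the answer to a plain-text question like 'What is 4 + 7?'."""
    match = MATH_PATTERN.search(question)
    if not match:
        raise ValueError(f"No arithmetic expression found in {question!r}")
    left, op, right = match.groups()
    result = OPERATORS[op.lower()](int(left), int(right))
    return str(result)
```

Division is deliberately left out because it can produce non-integer answers that a form field may not accept.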
Multi-Portal Search (JavaScript)
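```javascript
// Requires Node.js 20+ (global fetch, Headers.getSetCookie) and an ES module
// context for the top-level await in the usage example.
class RecordsAggregator {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  async searchAcrossPortals(query, portals) {
    const results = [];
    for (const portal of portals) {
      try {
        const data = await this.searchPortal(portal, query);
        results.push({ portal: portal.name, records: data });
      } catch (error) {
        // One failing portal should not abort the whole search
        results.push({ portal: portal.name, error: error.message });
      }
    }
    return results;
  }

  async searchPortal(portal, query) {
    const pageResponse = await fetch(portal.searchUrl);
    if (!pageResponse.ok) throw new Error(`Search page returned ${pageResponse.status}`);
    const html = await pageResponse.text();

    // fetch() has no cookie jar: forward the session cookie manually so the
    // CAPTCHA image and the search POST belong to the same server session
    const cookies = (pageResponse.headers.getSetCookie?.() ?? [])
      .map(c => c.split(';')[0])
      .join('; ');
    const headers = cookies ? { Cookie: cookies } : {};

    // Find an <img> whose src attribute mentions "captcha"
    const captchaMatch = html.match(/<img[^>]+src=["']([^"']*captcha[^"']*)["']/i);
    let captchaAnswer = null;

    if (captchaMatch) {
      const imgUrl = new URL(captchaMatch[1], portal.searchUrl).href;
      const imgResponse = await fetch(imgUrl, { headers });
      if (!imgResponse.ok) throw new Error(`CAPTCHA image returned ${imgResponse.status}`);
      const buffer = await imgResponse.arrayBuffer();
      const base64 = Buffer.from(buffer).toString('base64');
      captchaAnswer = await this.solveImageCaptcha(base64);
    }

    // Submit search (field names "q" and "captcha" are portal-specific)
    const formData = new URLSearchParams({ q: query });
    if (captchaAnswer) formData.append('captcha', captchaAnswer);

    const response = await fetch(portal.searchUrl, {
      method: 'POST',
      headers,
      body: formData
    });
    if (!response.ok) throw new Error(`Search returned ${response.status}`);
    return response.text();
  }

  async solveImageCaptcha(base64Image) {
    const submitResp = await fetch('https://ocr.captchaai.com/in.php', {
      method: 'POST',
      body: new URLSearchParams({
        key: this.apiKey,
        method: 'base64',
        body: base64Image,
        json: '1'
      })
    });
    const submit = await submitResp.json();
    if (submit.status !== 1) throw new Error(`CAPTCHA submit failed: ${submit.request}`);
    const taskId = submit.request;

    // URLSearchParams encodes the key and task ID safely
    const params = new URLSearchParams({ key: this.apiKey, action: 'get', id: taskId, json: '1' });
    for (let i = 0; i < 30; i++) {
      await new Promise(r => setTimeout(r, 3000));
      const result = await fetch(`https://ocr.captchaai.com/res.php?${params}`);
      const data = await result.json();
      if (data.status === 1) return data.request;
      if (data.request !== 'CAPCHA_NOT_READY') throw new Error(`CAPTCHA solve failed: ${data.request}`);
    }
    throw new Error('CAPTCHA solve timed out');
  }
}

// Usage
const aggregator = new RecordsAggregator('YOUR_API_KEY');
const results = await aggregator.searchAcrossPortals('Smith LLC', [
  { name: 'State Business Registry', searchUrl: 'https://sos.example.gov/search' },
  { name: 'County Court Records', searchUrl: 'https://courts.example.gov/search' }
]);
```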
CAPTCHA Parameters for Government Portals
| Parameter | Value | When to use |
|---|---|---|
| `method` | `base64` | Image downloaded as bytes |
| `method` | `post` | Submit image file directly |
| `language` | `0` | English/Latin text CAPTCHAs |
| `numeric` | `1` | Digits-only CAPTCHAs |
| `min_len` / `max_len` | Varies | When character count is predictable |
| `textinstructions` | Custom prompt | Math CAPTCHAs or specific formats |
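These parameters map onto the portal categories in the first table. A sketch of a payload builder, assuming CaptchaAI accepts the parameters alongside `method=base64` exactly as listed above; the profile names and their values are illustrative choices drawn from the example challenges, not part of the API.

```python
import base64

# Hypothetical per-portal profiles based on the example challenges
PROFILES = {
    "building_permit": {"numeric": 1, "min_len": 4, "max_len": 4},  # 4-digit code
    "court_case": {"min_len": 5, "max_len": 6},                      # 5-6 chars
    "county_math": {
        "textinstructions": "solve the math equation and return only the number",
    },
    "default": {},
}


def build_submit_params(api_key: str, image_bytes: bytes, profile: str = "default") -> dict:
    """Return form data for a POST to https://ocr.captchaai.com/in.php."""
    params = {
        "key": api_key,
        "method": "base64",
        "body": base64.b64encode(image_bytes).decode("ascii"),
        "json": 1,
    }
    # An unknown profile raises KeyError instead of silently sending no hints
    params.update(PROFILES[profile])
    return params
```

Pass the result as `data=` to `requests.post`. Length hints only help when the portal's code length is genuinely fixed; a wrong `min_len`/`max_len` can turn a correct read into a rejected one.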
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| CAPTCHA image returns 403 | Session cookie missing | Load the search page first, then fetch the image with the same session |
| Wrong CAPTCHA answer | Low image quality | Preprocess image (increase contrast, remove noise) |
| CAPTCHA refreshes on submit | Hidden form token (e.g., a CSRF token or ASP.NET `__VIEWSTATE`) missing or expired | Extract the hidden form fields from the page that served the CAPTCHA and submit them with the answer |
| Search returns empty after CAPTCHA | Cookies lost across the POST redirect | Reuse one `requests.Session` and keep `allow_redirects=True` (the default) |
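For the "CAPTCHA refreshes on submit" case, copying every hidden input from the search page into the POST usually fixes it. A standard-library-only sketch using `html.parser`; it assumes the page has a single relevant form, since it collects hidden inputs from the whole document.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values are entity-decoded
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            # A missing or valueless value attribute is submitted as ""
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html: str) -> dict:
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields
```

In `_submit_search`, the form data would then become `{**extract_hidden_fields(page_html), "caseNumber": case_number, "captcha": captcha_text}`, with `page_html` being the search page that served the CAPTCHA.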
FAQ
Why do government portals still use image CAPTCHAs?
Many government systems run legacy software that predates reCAPTCHA and Cloudflare Turnstile. Custom image CAPTCHAs were the standard when these systems were built, and updates are slow in government IT.
How accurate is CaptchaAI for distorted text CAPTCHAs?
CaptchaAI supports over 27,500 image CAPTCHA types with high accuracy. For heavily distorted text, use the textinstructions parameter to guide recognition — for example, "letters and numbers, no spaces."
Should I preprocess CAPTCHA images before submitting?
Preprocessing (grayscale, contrast boost, noise removal) can improve accuracy for very low-quality images. See the image preprocessing guide for techniques.
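A minimal preprocessing pass covering those three steps, using Pillow (an added dependency the earlier examples do not need). Whether it helps depends on the CAPTCHA, so compare solve rates with and without it before adopting it.

```python
import io

from PIL import Image, ImageFilter, ImageOps


def preprocess_captcha(image_bytes: bytes) -> bytes:
    """Grayscale, stretch contrast, and median-filter a CAPTCHA; return PNG bytes."""
    image = Image.open(io.BytesIO(image_bytes))
    gray = ImageOps.grayscale(image)                             # drop color noise
    boosted = ImageOps.autocontrast(gray, cutoff=2)              # ignore 2% outliers at each end
    cleaned = boosted.filter(ImageFilter.MedianFilter(size=3))   # remove isolated specks
    out = io.BytesIO()
    cleaned.save(out, format="PNG")
    return out.getvalue()
```

The output can be passed straight to the solver, e.g. `self._solve_image_captcha(preprocess_captcha(img_response.content))`.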
Next Steps
Automate public records searches — get your CaptchaAI API key and handle government portal CAPTCHAs.