Russian, Ukrainian, Bulgarian, and Serbian websites use Cyrillic text CAPTCHAs whose letters look deceptively similar to Latin ones. Characters like А, В, С, Е, Н, О appear identical to their Latin counterparts but are entirely different Unicode codepoints. This creates recognition and submission problems that Latin-only OCR misses.
Cyrillic vs. Latin Confusable Characters
| Looks like | Latin | Cyrillic | Notes |
|---|---|---|---|
| A | A (U+0041) | А (U+0410) | Different codepoints |
| B | B (U+0042) | В (U+0412) | Cyrillic is "Ve" |
| C | C (U+0043) | С (U+0421) | Cyrillic is "Es" |
| E | E (U+0045) | Е (U+0415) | Different codepoints |
| H | H (U+0048) | Н (U+041D) | Cyrillic is "En" |
| O | O (U+004F) | О (U+041E) | Different codepoints |
| P | P (U+0050) | Р (U+0420) | Cyrillic is "Er" |
Submitting the wrong codepoint causes form validation to reject correct-looking text.
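A quick way to see the difference is to print each character's codepoint and Unicode name. The sketch below uses only Python's standard `unicodedata` module; the helper name `describe_chars` is illustrative and not part of any CaptchaAI client.

```python
import unicodedata


def describe_chars(text: str) -> list[str]:
    """Return one 'char U+XXXX NAME' line per character in text."""
    return [
        f"{ch} U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
    ]


latin_b = "\u0042"      # Latin capital B
cyrillic_ve = "\u0412"  # Cyrillic capital Ve, visually identical to B

# Same shape on screen, but the strings do not compare equal.
print(latin_b == cyrillic_ve)
for line in describe_chars(latin_b + cyrillic_ve):
    print(line)
```

Running this against a rejected CAPTCHA answer usually shows at a glance which characters came back in the wrong script.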
Python: Cyrillic Image CAPTCHA
import requests
import base64
import time

API_KEY = "YOUR_API_KEY"
SUBMIT_URL = "https://ocr.captchaai.com/in.php"
RESULT_URL = "https://ocr.captchaai.com/res.php"


def solve_cyrillic_captcha(image_path: str) -> str:
    """Solve a Cyrillic text image CAPTCHA."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Submit the image as base64
    resp = requests.post(SUBMIT_URL, data={
        "key": API_KEY,
        "method": "base64",
        "body": image_b64,
        "language": 2,  # Non-Latin character support
        "json": 1,
    }, timeout=30).json()

    if resp.get("status") != 1:
        raise RuntimeError(f"Submit: {resp.get('request')}")

    task_id = resp["request"]

    # Poll every 5 seconds, up to 24 times (about 2 minutes)
    for _ in range(24):
        time.sleep(5)
        poll = requests.get(RESULT_URL, params={
            "key": API_KEY, "action": "get", "id": task_id, "json": 1,
        }, timeout=15).json()

        if poll.get("request") == "CAPCHA_NOT_READY":
            continue
        if poll.get("status") == 1:
            return poll["request"]
        raise RuntimeError(f"Solve: {poll.get('request')}")

    raise RuntimeError("Timeout")


def solve_cyrillic_from_session(session: requests.Session,
                                captcha_url: str) -> str:
    """Solve a Cyrillic CAPTCHA within a session context."""
    # Fetch the image through the session so it matches the session's cookies
    resp = session.get(captcha_url, timeout=15)
    image_b64 = base64.b64encode(resp.content).decode()

    submit = requests.post(SUBMIT_URL, data={
        "key": API_KEY,
        "method": "base64",
        "body": image_b64,
        "language": 2,
        "json": 1,
    }, timeout=30).json()

    if submit.get("status") != 1:
        raise RuntimeError(f"Submit: {submit.get('request')}")

    task_id = submit["request"]

    for _ in range(24):
        time.sleep(5)
        poll = requests.get(RESULT_URL, params={
            "key": API_KEY, "action": "get", "id": task_id, "json": 1,
        }, timeout=15).json()

        if poll.get("request") == "CAPCHA_NOT_READY":
            continue
        if poll.get("status") == 1:
            return poll["request"]
        raise RuntimeError(f"Solve: {poll.get('request')}")

    raise RuntimeError("Timeout")


def verify_cyrillic(text: str) -> bool:
    """Verify that solved text contains at least one Cyrillic character."""
    return any('\u0400' <= ch <= '\u04FF' for ch in text)


# --- Russian website form flow ---

def solve_russian_form(form_url: str, captcha_url: str,
                       form_data: dict) -> requests.Response:
    """Complete a Russian website form with CAPTCHA."""
    session = requests.Session()
    session.headers.update({
        "Accept-Language": "ru-RU,ru;q=0.9",
    })

    # Establish session
    session.get(form_url, timeout=15)

    # Solve CAPTCHA
    captcha_text = solve_cyrillic_from_session(session, captcha_url)
    print(f"Cyrillic CAPTCHA: {captcha_text}")

    if verify_cyrillic(captcha_text):
        print("Confirmed: contains Cyrillic characters")

    form_data["captcha"] = captcha_text
    return session.post(form_url, data=form_data, timeout=30)


# --- Usage ---
text = solve_cyrillic_captcha("russian_captcha.png")
print(f"Solved: {text}")
print(f"Is Cyrillic: {verify_cyrillic(text)}")
print(f"Unicode codepoints: {[hex(ord(c)) for c in text]}")
JavaScript: Cyrillic CAPTCHA Handling
const fs = require("fs");

const API_KEY = "YOUR_API_KEY";
const SUBMIT_URL = "https://ocr.captchaai.com/in.php";
const RESULT_URL = "https://ocr.captchaai.com/res.php";

async function solveCyrillicCaptcha(imagePath) {
  const imageB64 = fs.readFileSync(imagePath, "base64");

  // Submit the image as base64
  const body = new URLSearchParams({
    key: API_KEY,
    method: "base64",
    body: imageB64,
    language: "2",
    json: "1",
  });

  const resp = await (await fetch(SUBMIT_URL, { method: "POST", body })).json();
  if (resp.status !== 1) throw new Error(`Submit: ${resp.request}`);

  const taskId = resp.request;

  // Poll every 5 seconds, up to 24 times (about 2 minutes)
  for (let i = 0; i < 24; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const query = new URLSearchParams({
      key: API_KEY, action: "get", id: taskId, json: "1",
    });
    const poll = await (await fetch(`${RESULT_URL}?${query}`)).json();

    if (poll.request === "CAPCHA_NOT_READY") continue;
    if (poll.status === 1) return poll.request;
    throw new Error(`Solve: ${poll.request}`);
  }
  throw new Error("Timeout");
}

function isCyrillic(text) {
  return /[\u0400-\u04FF]/.test(text);
}

function showCodepoints(text) {
  return [...text].map((ch) => `${ch}=U+${ch.codePointAt(0).toString(16).padStart(4, "0")}`);
}

// Usage (wrapped in a function because top-level await is not available in CommonJS)
async function main() {
  const text = await solveCyrillicCaptcha("russian_captcha.png");
  console.log(`Solved: ${text}`);
  console.log(`Is Cyrillic: ${isCyrillic(text)}`);
  console.log(`Codepoints: ${showCodepoints(text).join(", ")}`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
Common Cyrillic CAPTCHA Patterns
| Pattern | Description | Example |
|---|---|---|
| Pure Cyrillic word | Random Russian word | ШКАФ, ПИРОГ |
| Mixed Latin + Cyrillic | Both scripts in one image | ABСDе (A,B,D Latin; С,е Cyrillic) |
| Cyrillic digits spelled out | Number words | ПЯТЬ (five), ТРИ (three) |
| Math in Russian | Arithmetic in words | два плюс три = ? |
| Distorted Cyrillic | Warped Russian text | Standard OCR challenge with Cyrillic |
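For the "Math in Russian" pattern, the form may expect the numeric answer rather than the words shown in the image. The sketch below assumes the solver returns the expression text (for example `два плюс три = ?`) and evaluates it locally. The vocabulary tables and the `solve_russian_math` name are illustrative: they cover only 0 to 10 and two operators, and ignore gendered forms such as "одна".

```python
import re

# Illustrative vocabulary; extend it for the sites you target.
NUMBER_WORDS = {
    "ноль": 0, "один": 1, "два": 2, "три": 3, "четыре": 4, "пять": 5,
    "шесть": 6, "семь": 7, "восемь": 8, "девять": 9, "десять": 10,
}
OPERATORS = {
    "плюс": lambda a, b: a + b,
    "минус": lambda a, b: a - b,
}


def solve_russian_math(expression: str) -> int:
    """Evaluate '<number word> <operator word> <number word>'."""
    # Keep only runs of Cyrillic letters, dropping '=', '?' and spacing.
    words = re.findall(r"[а-яё]+", expression.lower())
    if len(words) != 3:
        raise ValueError(f"Unsupported expression: {expression!r}")
    left, op, right = words
    try:
        return OPERATORS[op](NUMBER_WORDS[left], NUMBER_WORDS[right])
    except KeyError as exc:
        raise ValueError(f"Unknown word: {exc.args[0]}") from exc


print(solve_russian_math("два плюс три = ?"))
```

If the solver already returns the numeric answer for a given site, this step is unnecessary; check a few real responses before wiring it in.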
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Form rejects correct-looking text | Latin/Cyrillic homoglyph mismatch | Check Unicode codepoints — А (U+0410) ≠ A (U+0041) |
| Characters garbled on display | Wrong encoding | Use UTF-8 throughout; set response.encoding = 'utf-8' |
| Mixed script text partially wrong | OCR confused Latin and Cyrillic | CaptchaAI with language=2 distinguishes correctly |
| Ukrainian-specific characters missing | ґ, є, і, ї not recognized | These are supported with language=2 |
| CAPTCHA case sensitivity | Uppercase/lowercase matters | Submit exactly as returned by CaptchaAI |
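When a form is known to accept only Cyrillic letters and a solved answer still contains Latin lookalikes, one possible workaround is to map those lookalikes to their Cyrillic twins before submitting. This is a sketch under that assumption, not a CaptchaAI feature: the mapping is not exhaustive, the names `force_cyrillic` and `latin_letters` are hypothetical, and applying it to a genuinely mixed-script CAPTCHA would corrupt a correct answer.

```python
# Latin letters mapped to their Cyrillic lookalikes (illustrative, not exhaustive).
LATIN_TO_CYRILLIC = str.maketrans({
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041D", "K": "\u041A", "M": "\u041C", "O": "\u041E",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043E",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
})


def force_cyrillic(text: str) -> str:
    """Replace Latin lookalike letters with their Cyrillic counterparts."""
    return text.translate(LATIN_TO_CYRILLIC)


def latin_letters(text: str) -> list[str]:
    """Return the ASCII letters still present in text."""
    return [ch for ch in text if ch.isascii() and ch.isalpha()]


answer = "C\u041EH"  # Latin C, Cyrillic О, Latin H: reads like the word СОН
fixed = force_cyrillic(answer)
print([hex(ord(ch)) for ch in answer])
print([hex(ord(ch)) for ch in fixed])
print(latin_letters(fixed))  # empty when every letter could be mapped
```

Letters without a Cyrillic twin (such as D or R) pass through unchanged, so a non-empty `latin_letters` result signals an answer that needs a closer look rather than a blind conversion.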
FAQ
How does CaptchaAI distinguish Cyrillic В from Latin B?
CaptchaAI's OCR models are trained on context and glyph features. When language=2 is set, the solver uses Cyrillic-aware models that return proper Unicode codepoints. The returned text will use Cyrillic characters (U+0400–U+04FF) for Russian text.
Does it handle Ukrainian-specific characters?
Yes. Ukrainian uses characters not present in Russian: ґ (U+0491), є (U+0454), і (U+0456), ї (U+0457). CaptchaAI recognizes these with language=2. The solver handles Cyrillic-script languages including Russian, Ukrainian, Bulgarian, and Serbian.
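To confirm that these letters survived the round trip, you can scan the solved text for them. This is a minimal sketch: the set holds the four letters listed above plus their uppercase forms, and `ukrainian_letters` is an illustrative name.

```python
# ґ є і ї and their uppercase forms, written as escapes to avoid
# confusing і (U+0456) with the Latin letter i.
UKRAINIAN_ONLY = frozenset(
    "\u0491\u0490"  # ґ Ґ
    "\u0454\u0404"  # є Є
    "\u0456\u0406"  # і І
    "\u0457\u0407"  # ї Ї
)


def ukrainian_letters(text: str) -> list[str]:
    """Return the Ukrainian-specific letters found in text, in order."""
    return [ch for ch in text if ch in UKRAINIAN_ONLY]


print(ukrainian_letters("Київ"))
print(ukrainian_letters("Москва"))
```

An empty result for a CAPTCHA that visibly contains one of these letters suggests the character was returned as a lookalike, for example Latin i instead of Cyrillic і.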
What if the CAPTCHA mixes Cyrillic and Latin?
Some CAPTCHAs intentionally mix scripts to create ambiguity. CaptchaAI returns the text with correct Unicode codepoints for each character. Note that verify_cyrillic() only confirms that at least one Cyrillic character is present, so for mixed text inspect the codepoint of each character individually.
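A per-character check can be sketched with the standard `unicodedata` module, classifying each character by the prefix of its Unicode name. The function names here are illustrative, and the sample string reproduces the mixed example from the patterns table using escapes so the script of each letter is unambiguous.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify a character as 'Latin', 'Cyrillic', or 'Other' by Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    return "Other"


def tag_scripts(text: str) -> list[tuple[str, str]]:
    """Pair every character with its script label."""
    return [(ch, script_of(ch)) for ch in text]


def is_mixed_script(text: str) -> bool:
    """True when text contains both Latin and Cyrillic letters."""
    scripts = {script_of(ch) for ch in text}
    return {"Latin", "Cyrillic"} <= scripts


sample = "AB\u0421D\u0435"  # A, B, D are Latin; С and е are Cyrillic
for ch, script in tag_scripts(sample):
    print(f"{ch} U+{ord(ch):04X} {script}")
print(is_mixed_script(sample))
```

Digits and punctuation fall into "Other", so they do not affect the mixed-script result.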
Next Steps
Solve Cyrillic CAPTCHAs on Russian and Slavic websites — get your CaptchaAI API key.