Multi-Language Image CAPTCHA: Character Set Configuration

CaptchaAI's Image/OCR solver supports over 27,500 CAPTCHA types across multiple writing systems. The `language` parameter tells the solver which character set to expect, which directly affects recognition accuracy. With the wrong setting, the solver matches the image against the wrong characters and may return lookalikes from another script or fail outright.

Language Parameter Reference

Value	Character sets	Best for
`0`	Not specified (default)	Latin-only CAPTCHAs — English, Spanish, French, German
`1`	Cyrillic only	Russian, Ukrainian, Bulgarian CAPTCHAs
`2`	Non-Latin characters	Chinese, Japanese, Korean, Arabic, Cyrillic, mixed scripts

When to Use Each Setting

CAPTCHA content	Setting	Why
English letters + numbers	`0` or omit	Default Latin recognition
Russian text	`1` or `2`	`1` selects the Cyrillic-only model; `2` also covers Cyrillic
Chinese characters	`2`	CJK character set required
Japanese hiragana/katakana	`2`	Non-Latin recognition needed
Arabic script	`2`	Non-Latin recognition needed
Korean hangul	`2`	Non-Latin recognition needed
Mixed Latin + Cyrillic	`2`	Handles multiple scripts
Numbers only	`0` or omit	Digits are universal
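
The two tables above reduce to a small lookup. A minimal sketch, assuming you already know which script a site uses; the script labels and the `language_for_script` helper are illustrative names, not part of the CaptchaAI API:

```python
# Hypothetical helper: map a known script label to the `language` value
# from the tables above. The keys are illustrative labels, not API values.
LANGUAGE_BY_SCRIPT = {
    "latin": 0,
    "digits": 0,
    "cyrillic": 1,
    "chinese": 2,
    "japanese": 2,
    "korean": 2,
    "arabic": 2,
    "mixed": 2,
}


def language_for_script(script: str) -> int:
    """Return the language value for a script label; unknown labels get 2."""
    return LANGUAGE_BY_SCRIPT.get(script.strip().lower(), 2)


print(language_for_script("Cyrillic"))  # known label
print(language_for_script("thai"))      # unknown label falls back to 2
```

Defaulting unknown labels to `2` follows the article's advice that `language=2` is the safest choice when the script is not known.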

Python: Language-Aware CAPTCHA Solving

import requests
import base64
import time

API_KEY = "YOUR_API_KEY"
SUBMIT_URL = "https://ocr.captchaai.com/in.php"
RESULT_URL = "https://ocr.captchaai.com/res.php"


def solve_image_captcha(image_path: str, language: int = 0) -> str:
    """Solve an image CAPTCHA with the specified language setting.

    Args:
        image_path: Path to the CAPTCHA image file.
        language: 0=Latin, 1=Cyrillic, 2=non-Latin/mixed.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    params = {
        "key": API_KEY,
        "method": "base64",
        "body": image_b64,
        "json": 1,
    }

    # Only set language if non-default
    if language > 0:
        params["language"] = language

    resp = requests.post(SUBMIT_URL, data=params, timeout=30).json()
    if resp.get("status") != 1:
        raise RuntimeError(f"Submit: {resp.get('request')}")

    task_id = resp["request"]
    for _ in range(24):
        time.sleep(5)
        poll = requests.get(RESULT_URL, params={
            "key": API_KEY, "action": "get", "id": task_id, "json": 1,
        }, timeout=15).json()

        if poll.get("request") == "CAPCHA_NOT_READY":
            continue
        if poll.get("status") == 1:
            return poll["request"]
        raise RuntimeError(f"Solve: {poll.get('request')}")

    raise RuntimeError("Timeout")


def detect_script(text: str) -> str:
    """Return the script of the first recognized non-Latin character, else "latin"."""
    for ch in text:
        cp = ord(ch)
        if 0x0400 <= cp <= 0x04FF:
            return "cyrillic"
        if 0x4E00 <= cp <= 0x9FFF:
            return "cjk"
        if 0x3040 <= cp <= 0x30FF:
            return "japanese"
        if 0xAC00 <= cp <= 0xD7AF:
            return "korean"
        if 0x0600 <= cp <= 0x06FF:
            return "arabic"
        if 0x0590 <= cp <= 0x05FF:
            return "hebrew"
    return "latin"


# --- Usage examples ---

# Latin CAPTCHA (default)
latin_text = solve_image_captcha("english_captcha.png", language=0)
print(f"Latin: {latin_text} (script: {detect_script(latin_text)})")

# Russian CAPTCHA
cyrillic_text = solve_image_captcha("russian_captcha.png", language=1)
print(f"Cyrillic: {cyrillic_text} (script: {detect_script(cyrillic_text)})")

# Chinese CAPTCHA
chinese_text = solve_image_captcha("chinese_captcha.png", language=2)
print(f"CJK: {chinese_text} (script: {detect_script(chinese_text)})")

# Mixed script CAPTCHA
mixed_text = solve_image_captcha("mixed_captcha.png", language=2)
print(f"Mixed: {mixed_text} (script: {detect_script(mixed_text)})")

JavaScript: Multi-Language Solver

const API_KEY = "YOUR_API_KEY";
const SUBMIT_URL = "https://ocr.captchaai.com/in.php";
const RESULT_URL = "https://ocr.captchaai.com/res.php";
const fs = require("fs");

async function solveImageCaptcha(imagePath, language = 0) {
  const imageB64 = fs.readFileSync(imagePath, "base64");

  const params = {
    key: API_KEY,
    method: "base64",
    body: imageB64,
    json: "1",
  };

  if (language > 0) params.language = String(language);

  const body = new URLSearchParams(params);
  const resp = await (await fetch(SUBMIT_URL, { method: "POST", body })).json();
  if (resp.status !== 1) throw new Error(`Submit: ${resp.request}`);

  const taskId = resp.request;
  for (let i = 0; i < 24; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const url = `${RESULT_URL}?key=${API_KEY}&action=get&id=${taskId}&json=1`;
    const poll = await (await fetch(url)).json();
    if (poll.request === "CAPCHA_NOT_READY") continue;
    if (poll.status === 1) return poll.request;
    throw new Error(`Solve: ${poll.request}`);
  }
  throw new Error("Timeout");
}

function detectScript(text) {
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x0400 && cp <= 0x04ff) return "cyrillic";
    if (cp >= 0x4e00 && cp <= 0x9fff) return "cjk";
    if (cp >= 0x3040 && cp <= 0x30ff) return "japanese";
    if (cp >= 0xac00 && cp <= 0xd7af) return "korean";
    if (cp >= 0x0600 && cp <= 0x06ff) return "arabic";
    if (cp >= 0x0590 && cp <= 0x05ff) return "hebrew";
  }
  return "latin";
}

// Map a script hint to a language value, then solve.
// No detection happens before the solve: "auto" simply uses language=2.
async function solveWithAutoLanguage(imagePath, hint = "auto") {
  const languageMap = {
    latin: 0,
    cyrillic: 1,
    russian: 1,
    chinese: 2,
    japanese: 2,
    korean: 2,
    arabic: 2,
    auto: 2, // language=2 handles all scripts
  };

  const language = languageMap[hint] ?? 2;
  return solveImageCaptcha(imagePath, language);
}

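// Usage (wrapped in an async function because top-level await is not
// available in CommonJS modules, which this file uses via require)
async function main() {
  const text = await solveWithAutoLanguage("captcha.png", "auto");
  console.log(`Text: ${text}, Script: ${detectScript(text)}`);
}

main().catch(console.error);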

Common Mistakes

Mistake	Effect	Fix
Using `language=0` for Cyrillic	Returns Latin lookalikes (B instead of В)	Use `language=1` or `language=2`
Using `language=1` for Chinese	Solver expects Cyrillic, gets CJK	Use `language=2` for non-Latin scripts
Omitting language for mixed scripts	May misidentify ambiguous characters	Always use `language=2` for mixed content
Assuming default handles everything	Latin-only model misses non-Latin chars	Set language explicitly for non-English sites
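
The first mistake in the table is easy to catch after the fact. A minimal sketch, assuming the CAPTCHA was expected to be Cyrillic: it classifies the returned text so that an all-Latin or mixed result can be flagged for a retry. The function names are hypothetical, and the Cyrillic range is the same basic block (U+0400 to U+04FF) used by the detection helpers in this article:

```python
def contains_cyrillic(text: str) -> bool:
    """True if any character falls in the basic Cyrillic block."""
    return any(0x0400 <= ord(ch) <= 0x04FF for ch in text)


def contains_ascii_letter(text: str) -> bool:
    """True if any character is an ASCII (basic Latin) letter."""
    return any(ch.isascii() and ch.isalpha() for ch in text)


def classify_expected_cyrillic(text: str) -> str:
    """Classify a result that was expected to contain Cyrillic text."""
    has_cyrillic = contains_cyrillic(text)
    has_latin = contains_ascii_letter(text)
    if has_cyrillic and has_latin:
        return "mixed"      # possible homoglyph substitution
    if has_cyrillic:
        return "cyrillic"   # what we asked for
    if has_latin:
        return "latin"      # likely solved with the wrong language setting
    return "other"          # digits, punctuation, or another script


# "\u0412" is Cyrillic VE, which looks like Latin "B".
print(classify_expected_cyrillic("\u0412\u041e\u0414\u0410"))
print(classify_expected_cyrillic("BOAT"))
```

A "latin" or "mixed" verdict does not prove the solve was wrong (some CAPTCHAs really do mix scripts), so treat it as a signal to re-check the language setting rather than as a hard failure.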

Troubleshooting

Issue	Cause	Fix
Correct-looking text but wrong Unicode	Cyrillic/Latin homoglyph confusion	Check codepoints: `hex(ord(char))`
Empty result for CJK CAPTCHA	Language not set to 2	Set `language=2` for Chinese/Japanese/Korean
Digits mixed with non-Latin characters come back wrong	Digits are recognized in every mode, but the default expects Latin letters around them	Use `language=2` for any non-Latin context
Solve rate drops when switching sites	Different language requirements	Match language param to each site's character set
Result encoding garbled	HTTP response not decoded as UTF-8	Force UTF-8: `response.encoding = 'utf-8'`
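
The code point check from the first row can be wrapped in a small diagnostic. A minimal sketch using only the standard library; `describe_codepoints` is an illustrative name:

```python
import unicodedata


def describe_codepoints(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# Latin "B" followed by Cyrillic "В", written as an escape so the
# difference is visible in the source.
for ch, code_point, name in describe_codepoints("B\u0412"):
    print(code_point, name)
# U+0042 LATIN CAPITAL LETTER B
# U+0412 CYRILLIC CAPITAL LETTER VE
```

Printing the Unicode name alongside the code point makes homoglyph confusion obvious at a glance, which a visual comparison of the two strings cannot do.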

FAQ

What happens if I use the wrong language setting?

The solver attempts to match the image against the wrong character models. Latin characters may be returned for Cyrillic text (B instead of В), or the solve may fail entirely for CJK characters. Setting `language=2` is the safest fallback for unknown scripts.
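
One way to apply that fallback in code is to retry a failed solve with `language=2`. A minimal sketch, assuming a solver callable with the same `(image_path, language)` signature as `solve_image_captcha` from the Python section, which raises `RuntimeError` on failure; `solve_with_fallback` itself is a hypothetical helper:

```python
from typing import Callable


def solve_with_fallback(
    solve: Callable[[str, int], str],
    image_path: str,
    language: int,
) -> str:
    """Try the preferred language first, then retry once with language=2."""
    try:
        return solve(image_path, language)
    except RuntimeError:
        if language == 2:
            raise  # already on the broadest setting; nothing left to try
        return solve(image_path, 2)


# Example call, assuming solve_image_captcha is defined as in this article:
# text = solve_with_fallback(solve_image_captcha, "captcha.png", language=1)
```

The retry submits the image a second time, so it only makes sense when the first attempt actually failed; a solve that succeeds with the wrong characters will not trigger it.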

Can I use `language=2` for everything?

Yes, but it's slightly less optimised for pure Latin CAPTCHAs. For English-only sites, omitting the parameter (default Latin) gives the best accuracy. For any site where you're unsure about the script, `language=2` handles all character sets.
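
In a scraper that targets several sites, this rule can live in a per-site table. A minimal sketch; the hostnames are placeholders and `language_params` is a hypothetical helper that returns only the optional `language` entry to merge into the submit parameters:

```python
from urllib.parse import urlparse

# Hypothetical per-site settings; the hostnames are placeholders.
SITE_LANGUAGE = {
    "example.com": 0,  # English-only site: rely on the Latin default
    "example.ru": 1,   # Cyrillic CAPTCHAs
    "example.cn": 2,   # CJK CAPTCHAs
}


def language_params(page_url: str) -> dict[str, int]:
    """Build the optional `language` entry for a submit request."""
    host = urlparse(page_url).hostname or ""
    language = SITE_LANGUAGE.get(host, 2)  # unknown site: fall back to 2
    # Omit the parameter for Latin-only sites so the default applies.
    return {"language": language} if language > 0 else {}


print(language_params("https://example.com/login"))
print(language_params("https://unknown.example/form"))
```

Keeping the mapping in one place also addresses the "solve rate drops when switching sites" issue from the troubleshooting table, since each site carries its own setting.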

Does the language parameter affect solve speed?

Minimally. The solver selects the appropriate OCR model based on the parameter. More complex character sets (CJK with thousands of characters) may take marginally longer than Latin (26 characters), but the difference is typically under a second.
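
To check the speed difference on your own traffic, you can time each solve. A minimal sketch, assuming a solver callable with the same `(image_path, language)` signature as `solve_image_captcha`; `timed_solve` is an illustrative name:

```python
import time
from typing import Callable


def timed_solve(
    solve: Callable[[str, int], str],
    image_path: str,
    language: int,
) -> tuple[str, float]:
    """Run one solve and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = solve(image_path, language)
    return text, time.perf_counter() - start


# Example call, assuming solve_image_captcha is defined as in this article:
# text, seconds = timed_solve(solve_image_captcha, "captcha.png", 2)
```

Because the polling loop in the Python example sleeps five seconds before each check, wall-clock times measured through it move in five-second steps. A sub-second difference between language settings will not show up unless the poll interval is shortened.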

Next Steps

Solve CAPTCHAs in any language — get your CaptchaAI API key and set the right language parameter.

Related guides:

Solve Image CAPTCHA with Python OCR and CaptchaAI

Image CAPTCHA Confidence Scores: Using CaptchaAI Quality Metrics

CAPTCHA Solving Fallback Chains

CaptchaAI API Latency Optimization: Faster Solves

Python Multiprocessing for Parallel CAPTCHA Solving

Building a Python Wrapper Library for CaptchaAI API
