How Grid Image CAPTCHA Challenges Work

Grid image CAPTCHAs display an image divided into cells and ask users to select specific cells based on visual content. This format is used by reCAPTCHA, custom CAPTCHA systems, and various website protection services.

The core mechanism

1. A large image is split into a grid (3×3, 4×4, or custom).
2. Text instructions describe what to find ("Select all squares with traffic lights").
3. The user clicks the cells containing the target object.
4. The server verifies the selection against known correct answers.
5. If the selection is correct, access is granted; if it is wrong, a new challenge appears.
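
The verification step can be sketched as a set comparison. This is a minimal illustration rather than any provider's actual logic: the function name `verify_selection` and the optional `ambiguous` parameter (cells where either choice is accepted) are assumptions made for the example.

```python
def verify_selection(selected, correct, ambiguous=(), total_cells=9):
    """Return True if the user's cells match the answer key.

    Cells are 1-based numbers. `ambiguous` is a hypothetical allowance for
    cells that only contain a sliver of the object: they are ignored on
    both sides of the comparison.
    """
    valid = set(range(1, total_cells + 1))
    selected = set(selected)
    if not selected <= valid:
        raise ValueError(f"cell numbers must be in 1..{total_cells}")
    ignore = set(ambiguous)
    return selected - ignore == set(correct) - ignore


# Example: traffic lights appear in cells 2, 5, and 8; cell 9 is borderline
answer_key = {2, 5, 8}
print(verify_selection([2, 5, 8], answer_key))                    # exact match
print(verify_selection([2, 5], answer_key))                       # one cell missed
print(verify_selection([2, 5, 8, 9], answer_key, ambiguous={9}))  # borderline cell tolerated
```

A production system would likely combine this check with the behavioral signals covered later in the article rather than rely on correctness alone.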

Grid formats

Standard 3×3 grid (9 cells)

The most common format. One image is divided into 9 equal sections:

┌─────┬─────┬─────┐
│  1  │  2  │  3  │
├─────┼─────┼─────┤
│  4  │  5  │  6  │
├─────┼─────┼─────┤
│  7  │  8  │  9  │
└─────┴─────┴─────┘

4×4 grid (16 cells)

Used for higher security. Smaller cells make object identification harder:

┌────┬────┬────┬────┐
│ 1  │ 2  │ 3  │ 4  │
├────┼────┼────┼────┤
│ 5  │ 6  │ 7  │ 8  │
├────┼────┼────┼────┤
│ 9  │ 10 │ 11 │ 12 │
├────┼────┼────┼────┤
│ 13 │ 14 │ 15 │ 16 │
└────┴────┴────┴────┘
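
Both diagrams number cells left to right, top to bottom. Assuming that row-major numbering and equal-sized cells, a cell number can be converted to a pixel region as in the sketch below. The helper names `cell_box` and `cell_center` are illustrative and do not come from any CAPTCHA library.

```python
def cell_box(cell, rows, cols, width, height):
    """Return (left, top, right, bottom) in pixels for a 1-based cell number."""
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell must be in 1..{rows * cols}")
    row, col = divmod(cell - 1, cols)  # zero-based row and column
    left = col * width // cols
    top = row * height // rows
    right = (col + 1) * width // cols
    bottom = (row + 1) * height // rows
    return left, top, right, bottom


def cell_center(cell, rows, cols, width, height):
    """Return the (x, y) center point of a cell."""
    left, top, right, bottom = cell_box(cell, rows, cols, width, height)
    return (left + right) / 2, (top + bottom) / 2


# Cell 5 of a 3×3 grid over a 300×300 image is the middle square
print(cell_box(5, 3, 3, 300, 300))     # (100, 100, 200, 200)
# Cell 16 of a 4×4 grid over a 400×400 image is the bottom-right square
print(cell_box(16, 4, 4, 400, 400))    # (300, 300, 400, 400)
print(cell_center(5, 3, 3, 300, 300))  # (150.0, 150.0)
```

Using integer division on the scaled index keeps neighboring boxes flush even when the image size is not an exact multiple of the grid size.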

Custom grids

Some systems use irregular layouts: different-sized cells, non-square grids, or overlapping images.

Types of grid challenges

Type	Description	Example
Single object	Select all cells containing one object type	"Select all buses"
Multi-round	New tiles replace selected ones	reCAPTCHA dynamic grids
Ordered selection	Click items in a specific sequence	"Click the cars from left to right"
Negative selection	Identify cells that do NOT contain the object	"Select cells without text"

Who uses grid image CAPTCHAs

Provider	Grid format	Key characteristics
Google reCAPTCHA v2	3×3 and 4×4	Dynamic tiles, behavioral analysis
BLS CAPTCHA	Variable (3-9 separate images)	Custom instructions, visa systems
hCaptcha	3×3 and 4×4	Similar to reCAPTCHA, privacy-focused
Custom implementations	Variable	Site-specific, no standardized API

How grid CAPTCHAs detect bots

Signal	What it reveals
Click accuracy	Bots click exact cell centers; humans are imprecise
Click timing	Bots click too fast or at perfectly regular intervals
Mouse trajectory	Bots move in straight lines; humans curve naturally
Selection correctness	Human mistakes cluster on ambiguous edge cells; bot mistakes tend to be all-or-nothing
Challenge completion time	Very fast completion suggests automation; very slow completion can indicate a bot that stalled or gave up
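
Two of the signals in the table, click timing and click accuracy, can be approximated with simple statistics. The sketch below is a rough illustration only: the function names are hypothetical, and real providers keep their features and thresholds private.

```python
from math import hypot
from statistics import mean, pstdev


def timing_regularity(click_times):
    """Coefficient of variation of the gaps between clicks.

    Values near 0 mean metronome-like clicking. Returns None when there
    are too few clicks to judge.
    """
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    return pstdev(gaps) / avg if avg > 0 else 0.0


def mean_center_offset(clicks, centers):
    """Average pixel distance between each click and its cell center."""
    return mean(hypot(x - cx, y - cy)
                for (x, y), (cx, cy) in zip(clicks, centers))


# Perfectly regular, dead-center clicks look automated
print(timing_regularity([0, 1, 2, 3]))                          # 0.0
print(mean_center_offset([(50, 50), (150, 50)],
                         [(50, 50), (150, 50)]))                # 0.0

# Uneven gaps and off-center clicks look more human
print(timing_regularity([0.0, 0.4, 1.3, 1.6]))
print(mean_center_offset([(53, 54)], [(50, 50)]))               # 5.0
```

Any cutoff applied to these numbers would have to be tuned on real traffic; the sketch deliberately leaves thresholds out.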

Solving grid CAPTCHAs with CaptchaAI

For reCAPTCHA grids, use the token method for better reliability:

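import time

import requests

API_KEY = "YOUR_API_KEY"

# Token method: CaptchaAI handles the grid internally
resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "userrecaptcha",
    "googlekey": "SITE_KEY",
    "pageurl": "https://example.com",
    "json": 1
}, timeout=30).json()
# Assumes in.php reports success with status 1, as res.php does below
if resp.get("status") != 1:
    raise RuntimeError(f"Task submission failed: {resp.get('request')}")
task_id = resp["request"]

# Poll every 5 seconds, for up to 30 attempts
for _ in range(30):
    time.sleep(5)
    result = requests.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY, "action": "get", "id": task_id, "json": 1
    }, timeout=30).json()
    if result.get("status") == 1:
        print(f"Token: {result['request'][:50]}...")
        break
else:
    # The loop finished without a break, so no token arrived in time
    raise TimeoutError("No token received after 30 polls")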

For non-reCAPTCHA grids, use the image method:

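import base64

import requests

# Encode the grid screenshot as base64 text
with open("grid.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("https://ocr.captchaai.com/in.php", data={
    "key": "YOUR_API_KEY",
    # Assumption: as in other 2Captcha-style APIs, a base64 "body" is sent
    # with method "base64"; "post" is reserved for multipart file uploads
    "method": "base64",
    "body": img_b64,
    "recaptcha": 1,
    "json": 1
}, timeout=30).json()
task_id = resp["request"]  # poll res.php with this ID, as in the token example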

FAQ

What is the difference between grid CAPTCHA and image CAPTCHA?

Grid CAPTCHA divides an image into cells for selection. Image CAPTCHA (OCR) shows distorted text that the user types. Grid challenges require object recognition; image CAPTCHAs require text recognition.

Can AI solve grid CAPTCHAs without human help?

Modern image classification models can identify objects in grid cells, but success rates vary. CaptchaAI combines AI models with human verification for high accuracy.

Why do some grid CAPTCHAs show new images after clicking?

Dynamic grids (used by reCAPTCHA) replace clicked tiles to prevent screenshot-based solving and to require sustained attention. This increases the difficulty for automated systems.
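
The tile-replacement behavior can be modeled as a loop that keeps clicking until no matching tiles remain. This is a toy simulation, not reCAPTCHA's implementation: `is_target` and `draw_tile` are hypothetical callbacks standing in for an image classifier and the server's supply of replacement tiles.

```python
def solve_dynamic_grid(tiles, is_target, draw_tile, max_rounds=10):
    """Click every target tile, let it be replaced, and repeat.

    Returns the final tiles and the number of click rounds needed.
    """
    tiles = list(tiles)
    for rounds in range(max_rounds + 1):
        hits = [i for i, tile in enumerate(tiles) if is_target(tile)]
        if not hits:
            return tiles, rounds  # nothing left to click
        if rounds == max_rounds:
            break
        for i in hits:
            tiles[i] = draw_tile()  # the clicked tile is swapped for a new one
    raise RuntimeError("grid still contains targets after max_rounds")


# Replacement tiles arrive in a fixed order here to keep the example deterministic
replacements = iter(["bus", "tree", "tree"])
final, rounds = solve_dynamic_grid(
    ["bus", "car", "bus"],
    is_target=lambda tile: tile == "bus",
    draw_tile=lambda: next(replacements),
)
print(final, rounds)  # ['tree', 'car', 'tree'] 2
```

Because a replacement tile can itself contain the target, a single screenshot of the first grid is not enough to finish the challenge, which is the property the answer above describes.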