Price monitoring, stock checks, compliance audits — many workflows need to solve CAPTCHAs on a schedule. Running a script manually every morning isn't sustainable. Cron jobs and task schedulers handle the timing so you can focus on what happens with the data.
When to Schedule CAPTCHA Tasks
| Use Case | Frequency | Best Scheduler |
|---|---|---|
| Price monitoring | Every 15–60 min | Cron / systemd timer |
| Daily report generation | Once/day | Cron / Task Scheduler |
| Availability checks | Every 5 min | In-process scheduler (APScheduler / node-cron) |
| Weekly compliance audits | Weekly | Cron / cloud scheduler |
| Event-driven + periodic | Variable | APScheduler with cron trigger |
Python: APScheduler with CaptchaAI
APScheduler runs inside your Python process — no system-level cron configuration required.
import requests
import time
import json
import logging
from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger
API_KEY = "YOUR_API_KEY"
SUBMIT_URL = "https://ocr.captchaai.com/in.php"
RESULT_URL = "https://ocr.captchaai.com/res.php"
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("captcha-scheduler")
def solve_captcha(sitekey, pageurl, method="userrecaptcha"):
    """Submit and poll a single CAPTCHA task."""
    params = {
        "key": API_KEY,
        "method": method,
        "json": 1,
    }
    if method == "userrecaptcha":
        params["googlekey"] = sitekey
        params["pageurl"] = pageurl
    elif method == "turnstile":
        params["sitekey"] = sitekey
        params["pageurl"] = pageurl
    resp = requests.post(SUBMIT_URL, data=params, timeout=30).json()
    if resp.get("status") != 1:
        raise RuntimeError(f"Submit failed: {resp.get('request')}")
    task_id = resp["request"]
    # Poll every 5 seconds, up to 60 attempts (about 5 minutes)
    for _ in range(60):
        time.sleep(5)
        poll = requests.get(RESULT_URL, params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": 1,
        }, timeout=15).json()
        if poll.get("request") == "CAPCHA_NOT_READY":
            continue
        if poll.get("status") == 1:
            return poll["request"]
        raise RuntimeError(f"Solve failed: {poll.get('request')}")
    raise RuntimeError("Timeout waiting for solution")
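def check_balance():
    """Verify sufficient balance before running tasks."""
    resp = requests.get(RESULT_URL, params={
        "key": API_KEY, "action": "getbalance", "json": 1,
    }, timeout=10).json()
    try:
        balance = float(resp.get("request", 0))
    except (TypeError, ValueError):
        # The API returned an error code instead of a number
        log.error(f"Balance check failed: {resp.get('request')}")
        return False
    if balance < 0.50:
        log.warning(f"Low balance: ${balance:.2f}")
        return False
    log.info(f"Balance: ${balance:.2f}")
    return True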
def price_check_job():
    """Scheduled job: solve CAPTCHA, scrape price, save result."""
    log.info("Starting price check job")
    if not check_balance():
        log.error("Insufficient balance, skipping run")
        return
    targets = [
        {"sitekey": "SITE_KEY_1", "pageurl": "https://example.com/product-a"},
        {"sitekey": "SITE_KEY_2", "pageurl": "https://example.com/product-b"},
    ]
    results = []
    for target in targets:
        try:
            token = solve_captcha(target["sitekey"], target["pageurl"])
            log.info(f"Solved CAPTCHA for {target['pageurl']}")
            # Use token to access the page
            # response = requests.get(target["pageurl"], headers={"captcha-token": token})
            # price = parse_price(response.text)
            results.append({
                "url": target["pageurl"],
                "token_preview": token[:30] + "...",
                "timestamp": datetime.utcnow().isoformat(),
                "status": "success",
            })
        except Exception as e:
            log.error(f"Failed for {target['pageurl']}: {e}")
            results.append({
                "url": target["pageurl"],
                "error": str(e),
                "timestamp": datetime.utcnow().isoformat(),
                "status": "failed",
            })
    # Save results
    output_file = f"results_{datetime.utcnow().strftime('%Y%m%d_%H%M%S')}.json"
    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)
    log.info(f"Saved {len(results)} results to {output_file}")
# Schedule the job
scheduler = BlockingScheduler()
# Every 30 minutes during business hours
scheduler.add_job(
    price_check_job,
    CronTrigger(minute="0,30", hour="8-20", day_of_week="mon-fri"),
    id="price_check",
    max_instances=1,  # Prevent overlap
    misfire_grace_time=300,  # 5 min grace period
)
# Daily balance report at 7 AM
scheduler.add_job(
    check_balance,
    CronTrigger(hour=7, minute=0),
    id="balance_check",
)
log.info("Scheduler started")
scheduler.start()
Install dependencies:
pip install apscheduler requests
JavaScript: node-cron with CaptchaAI
const cron = require("node-cron");
const fs = require("fs");
const API_KEY = "YOUR_API_KEY";
const SUBMIT_URL = "https://ocr.captchaai.com/in.php";
const RESULT_URL = "https://ocr.captchaai.com/res.php";
// Uses the global fetch available in Node.js 18 and later
async function solveCaptcha(sitekey, pageurl) {
  const params = new URLSearchParams({
    key: API_KEY,
    method: "userrecaptcha",
    googlekey: sitekey,
    pageurl,
    json: "1",
  });
  const submit = await (await fetch(SUBMIT_URL, { method: "POST", body: params })).json();
  if (submit.status !== 1) throw new Error(`Submit: ${submit.request}`);
  const taskId = submit.request;
  // Poll every 5 seconds, up to 60 attempts (about 5 minutes)
  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const url = `${RESULT_URL}?key=${API_KEY}&action=get&id=${taskId}&json=1`;
    const poll = await (await fetch(url)).json();
    if (poll.request === "CAPCHA_NOT_READY") continue;
    if (poll.status === 1) return poll.request;
    throw new Error(`Solve: ${poll.request}`);
  }
  throw new Error("Timeout");
}
async function scheduledJob() {
  console.log(`[${new Date().toISOString()}] Running scheduled CAPTCHA job`);
  const targets = [
    { sitekey: "SITE_KEY_1", pageurl: "https://example.com/page-1" },
    { sitekey: "SITE_KEY_2", pageurl: "https://example.com/page-2" },
  ];
  const results = [];
  for (const target of targets) {
    try {
      // Use the returned token to access the page
      const token = await solveCaptcha(target.sitekey, target.pageurl);
      console.log(`  Solved: ${target.pageurl}`);
      results.push({ url: target.pageurl, status: "solved", timestamp: new Date().toISOString() });
    } catch (err) {
      console.error(`  Failed: ${target.pageurl} - ${err.message}`);
      results.push({ url: target.pageurl, status: "failed", error: err.message, timestamp: new Date().toISOString() });
    }
  }
  const filename = `results_${Date.now()}.json`;
  fs.writeFileSync(filename, JSON.stringify(results, null, 2));
  console.log(`  Saved to ${filename}`);
}
// Every 30 minutes
cron.schedule("*/30 * * * *", scheduledJob);
// Daily at 6 AM: balance check
cron.schedule("0 6 * * *", async () => {
  const url = `${RESULT_URL}?key=${API_KEY}&action=getbalance&json=1`;
  const resp = await (await fetch(url)).json();
  const balance = parseFloat(resp.request);
  if (Number.isNaN(balance)) {
    // The API returned an error code instead of a number
    console.error(`[Balance] Check failed: ${resp.request}`);
    return;
  }
  console.log(`[Balance] $${balance.toFixed(2)}`);
  if (balance < 1.0) console.warn("[Balance] LOW: refill recommended");
});
console.log("Scheduler running...");
Install dependencies:
npm install node-cron
System Cron (Linux/macOS)
For scripts that run independently — no long-running process required:
# Edit crontab
crontab -e
# Every 30 minutes
*/30 * * * * cd /opt/captcha-jobs && /usr/bin/python3 price_check.py >> /var/log/captcha-jobs.log 2>&1
# Daily at 7 AM
0 7 * * * cd /opt/captcha-jobs && /usr/bin/python3 daily_report.py >> /var/log/captcha-report.log 2>&1
# Every Monday at 9 AM
0 9 * * 1 cd /opt/captcha-jobs && /usr/bin/python3 weekly_audit.py >> /var/log/captcha-audit.log 2>&1
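A cron-launched script runs once and exits, so its exit code and its log output are the only traces it leaves. The following is a minimal sketch of such an entry point; the `main` wrapper and the logger name are hypothetical and not part of the scripts named in the crontab above.

```python
import logging

log = logging.getLogger("captcha-cron")  # hypothetical logger name


def main(job):
    """Run `job` once and return a process exit code (0 = success, 1 = failure)."""
    try:
        job()
    except Exception:
        # log.exception records the message together with the traceback
        log.exception("Scheduled job failed")
        return 1
    log.info("Scheduled job finished")
    return 0
```

At the bottom of a script such as price_check.py, calling `sys.exit(main(price_check_job))` would hand that code back to cron, assuming the job function raises on failure instead of swallowing every error.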
Preventing Overlap
When a job takes longer than the schedule interval:
| Approach | Implementation |
|---|---|
| Lock file | Check for /tmp/captcha-job.lock before starting; delete on exit |
| APScheduler max_instances=1 | Built-in; a run is skipped while the previous one is still executing |
| node-cron flag | Set isRunning = true at start, check before executing |
| Flock (Linux) | flock -n /tmp/job.lock python3 script.py |
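The lock-file row can also be handled inside the script itself. The sketch below uses an advisory lock from Python's standard `fcntl` module (Unix only) instead of checking for the file and deleting it by hand, so a crashed run cannot leave a stale lock behind. The helper names `single_instance` and `run_job_once` are illustrative assumptions, and the lock path is whatever you pass in.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run acquired the lock, False if another run holds it."""
    handle = open(lock_path, "w")
    try:
        try:
            # Non-blocking exclusive lock: fails immediately if already held
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the file releases the lock if this run held it
        handle.close()


def run_job_once(job, lock_path):
    """Run `job` unless another run currently holds the lock."""
    with single_instance(lock_path) as acquired:
        if not acquired:
            return "skipped"
        job()
        return "ran"
```

Because the operating system releases the lock when the process exits, no cleanup step is needed after a crash. This is the same idea as the `flock -n` command in the table, moved into the script.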
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Job runs but no output | Script errors silently | Redirect stderr to log: 2>&1; add logging |
| Overlapping runs | Job takes longer than interval | Use lock file or max_instances=1 |
| Cron job not firing | Wrong crontab syntax or wrong user | Use crontab -l to verify; check system cron logs |
| Balance depleted overnight | Too many scheduled runs | Add balance check before processing; set up alerts |
| Missed runs on restart | Process-based scheduler dies | Use systemd service with Restart=always for APScheduler |
FAQ
Should I use system cron or an in-process scheduler?
Use system cron for simple, standalone scripts that run and exit. Use an in-process scheduler (APScheduler, node-cron) for applications that need shared state, database connections, or complex scheduling logic.
How do I handle timezone mismatches?
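APScheduler accepts a timezone parameter on triggers. System cron evaluates schedules in the server's timezone; some implementations, such as cronie, accept a CRON_TZ variable in the crontab to change that, while TZ generally affects only the environment the command runs in. Always log timestamps in UTC.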
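For the UTC logging advice, one option is to switch the formatter's time converter to `time.gmtime`. This is a minimal sketch using only the standard library; the helper name `utc_formatter` and the timestamp layout are assumptions, not something the scripts above define.

```python
import logging
import time


def utc_formatter(fmt="%(asctime)sZ %(message)s"):
    """Build a formatter that renders record timestamps in UTC."""
    formatter = logging.Formatter(fmt, datefmt="%Y-%m-%dT%H:%M:%S")
    # Formatter uses time.localtime by default; gmtime yields UTC instead
    formatter.converter = time.gmtime
    return formatter
```

To use it, attach the formatter to a handler, for example `handler = logging.StreamHandler()` followed by `handler.setFormatter(utc_formatter())`, and add that handler to your logger.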
What if a scheduled run fails repeatedly?
Add a failure counter. After 3–5 consecutive failures, send an alert (email, Slack webhook) and optionally pause the schedule until the issue is investigated.
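A failure counter along these lines could look like the sketch below. The `FailureTracker` class, its threshold, and the `alert` callback are hypothetical; in practice the callback might post to a Slack webhook or send an email. It assumes the wrapped job raises an exception when a run fails.

```python
class FailureTracker:
    """Count consecutive failures and pause the job once a threshold is hit."""

    def __init__(self, threshold=3, alert=print):
        self.threshold = threshold
        self.alert = alert  # callable that receives the alert message
        self.consecutive_failures = 0
        self.paused = False

    def run(self, job):
        if self.paused:
            return "paused"
        try:
            job()
        except Exception as exc:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.paused = True
                self.alert(
                    f"Job paused after {self.consecutive_failures} "
                    f"consecutive failures: {exc}"
                )
            return "failed"
        # Any success resets the streak
        self.consecutive_failures = 0
        return "ok"

    def resume(self):
        """Clear the pause after the issue has been investigated."""
        self.paused = False
        self.consecutive_failures = 0
```

Scheduling `lambda: tracker.run(job)` instead of `job` keeps the scheduler configuration unchanged while the tracker decides whether a run actually executes.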
Next Steps
Automate your recurring CAPTCHA workflows with CaptchaAI — get your API key and set up your first scheduled job.