A production-grade web scraping stack has four components: compute, proxies, CAPTCHA solving, and storage. Many developers assume this requires hundreds of dollars monthly. It doesn't.
Here's a complete stack that handles moderate scraping workloads for under $100/month, with CaptchaAI as the CAPTCHA layer.
The $100/month stack breakdown
| Component | Role | Option | Monthly cost |
|---|---|---|---|
| Compute | Run scraper code | VPS (2 vCPU, 4GB RAM) | $6–$12 |
| Proxies | IP rotation, anti-blocking | Datacenter or entry-level residential plan | $15–$40 |
| CAPTCHA solving | Handle CAPTCHA challenges | CaptchaAI BASIC | $15 |
| Storage | Store scraped data | Cloud object storage | $2–$5 |
| Total | | | $38–$72 |
That leaves room to spare under $100, and each component can be scaled independently.
Component 1: Compute ($6–$12/month)
A modest VPS handles most scraping workloads for solo developers and small teams:
- 2 vCPU, 4GB RAM — sufficient for 3–5 concurrent Playwright/Puppeteer sessions
- 20GB SSD — logs, temp files, data before upload to storage
- Providers: DigitalOcean Droplet ($12), Hetzner CX22 (€4.85), Vultr regular ($6)
For pure Python + requests (no headless browser), even a 1 vCPU/2GB VPS ($4–6/month) handles 10+ concurrent scrapers comfortably.
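As a rough illustration of what "10+ concurrent scrapers" can look like without a headless browser, here is a minimal sketch using a bounded thread pool from the standard library. The names `scrape_all` and `fetch` are hypothetical, and the worker count of 10 is an assumption taken from the figure above, not a measured limit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def scrape_all(urls: Iterable[str],
               fetch: Callable[[str], object],
               max_workers: int = 10) -> dict:
    """Run `fetch` over `urls` with at most `max_workers` calls in flight."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be keyed by URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure and keep going; one bad URL
                # should not stop the rest of the batch.
                results[url] = exc
    return results
```

In a real scraper, `fetch` could be something like `lambda u: requests.get(u, timeout=15).status_code`. Threads suit this workload because the time is spent waiting on the network rather than on the CPU.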
Component 2: Proxies ($15–$40/month)
Proxies prevent IP bans on target sites. For most scraping:
- Datacenter proxies: Fast, cheap, ~$15–$25/month for shared rotating pools
- Residential proxies: Higher quality, ~$30–$50/month entry level
- If target sites are not bot-sensitive, datacenter proxies at $15/month work well
Many scraping tasks don't require residential proxies. Start with datacenter and upgrade only if you see blocking.
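One way to make "upgrade only if you see blocking" measurable is to track what share of responses come back blocked. The sketch below is illustrative only: the helper names, the choice of 403 and 429 as "blocked" statuses, and the 10% threshold are all assumptions to tune for your targets. The `proxies` mapping is the format the `requests` library accepts.

```python
from typing import Sequence

# Assumption: these status codes are treated as "the target blocked us".
BLOCK_STATUSES = {403, 429}


def proxy_config(proxy_url: str) -> dict:
    """Build the `proxies` mapping that requests accepts."""
    return {"http": proxy_url, "https": proxy_url}


def block_rate(status_codes: Sequence[int]) -> float:
    """Fraction of responses whose status code looks like a block."""
    if not status_codes:
        return 0.0
    blocked = sum(1 for code in status_codes if code in BLOCK_STATUSES)
    return blocked / len(status_codes)


def should_upgrade(status_codes: Sequence[int], threshold: float = 0.10) -> bool:
    """True when the block rate exceeds the (assumed) threshold."""
    return block_rate(status_codes) > threshold
```

Usage would look like `requests.get(url, proxies=proxy_config("http://user:pass@proxy.example:8000"), timeout=15)`, with the status codes collected over a run and passed to `should_upgrade`.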
Component 3: CaptchaAI ($15/month)
The BASIC plan covers all CAPTCHA types with unlimited solves for $15/month, capped at 5 concurrent threads. For a single scraper or monitoring tool, 5 threads is more than adequate.
import os
import time

import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]  # Store in env, not in code


class CaptchaSolver:
    """CaptchaAI solver for scraping pipelines."""

    SUBMIT_URL = "https://ocr.captchaai.com/in.php"
    POLL_URL = "https://ocr.captchaai.com/res.php"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def solve_recaptcha_v2(self, site_key: str, page_url: str) -> str:
        """Submit a reCAPTCHA v2 task and block until the token is ready."""
        task_id = self._submit({
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": page_url,
        })
        return self._poll(task_id)

    def solve_turnstile(self, site_key: str, page_url: str) -> str:
        """Submit a Cloudflare Turnstile task and block until the token is ready."""
        task_id = self._submit({
            "method": "turnstile",
            "sitekey": site_key,
            "pageurl": page_url,
        })
        return self._poll(task_id)

    def _submit(self, params: dict) -> str:
        """Send the task to the API and return its task ID."""
        resp = requests.post(self.SUBMIT_URL, data={
            "key": self.api_key,
            "json": 1,
            **params,
        }, timeout=30)
        resp.raise_for_status()
        result = resp.json()
        if result["status"] != 1:
            raise RuntimeError(f"Submit error: {result['request']}")
        return result["request"]

    def _poll(self, task_id: str, timeout: int = 120) -> str:
        """Check every 5 seconds until the task is solved or `timeout` elapses."""
        for _ in range(timeout // 5):
            time.sleep(5)
            res = requests.get(self.POLL_URL, params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1,
            }, timeout=30).json()
            if res["status"] == 1:
                return res["request"]
            # Status string kept exactly as spelled in the original code.
            if res["request"] != "CAPCHA_NOT_READY":
                raise RuntimeError(f"Solve error: {res['request']}")
        raise TimeoutError(f"Timeout polling task {task_id}")


solver = CaptchaSolver(API_KEY)
Component 4: Storage ($2–$5/month)
For scraped data:
- AWS S3 / Cloudflare R2: Object storage, pay for what you use. R2 charges $0.015/GB/month for storage and has no egress fees
- PostgreSQL (managed): $5–$15/month for small hosted database instances (Supabase free tier, PlanetScale, Neon)
- SQLite on VPS: Free — works for single-node scrapers writing locally, syncing to S3 periodically
For most budget stacks, R2 or similar object storage for raw data + SQLite on the VPS is the right call at this cost tier.
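For the SQLite half of that setup, a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table name `pages`, its columns, and the file name `scrapes.db` are assumptions for illustration; the periodic upload to object storage is left out.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = "scrapes.db") -> sqlite3.Connection:
    """Open (or create) the local database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url        TEXT NOT NULL,
               status     INTEGER NOT NULL,
               fetched_at TEXT NOT NULL,
               body       TEXT
           )"""
    )
    return conn


def save_page(conn: sqlite3.Connection, url: str, status: int, body: str) -> None:
    """Insert one scraped page, stamped with the current UTC time."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO pages (url, status, fetched_at, body) VALUES (?, ?, ?, ?)",
            (url, status, datetime.now(timezone.utc).isoformat(), body),
        )
```

Because the whole database is a single file on the VPS, syncing it to R2 or S3 amounts to copying that file on a schedule.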
Full working example: price monitor with CAPTCHA handling
import json
import os
from datetime import datetime

import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]  # Read from env, not hardcoded
solver = CaptchaSolver(API_KEY)  # CaptchaSolver class from above


def scrape_protected_product(url: str, site_key: str) -> dict:
    """Scrape a product page protected by reCAPTCHA v2."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    })

    # Get initial page
    resp = session.get(url, timeout=30)

    # Check if CAPTCHA is present (simplified detection)
    if "g-recaptcha" in resp.text:
        print(f"  CAPTCHA detected at {url}, solving...")
        token = solver.solve_recaptcha_v2(site_key, url)

        # Submit form with token
        resp = session.post(url, data={
            "g-recaptcha-response": token,
        }, timeout=30)

    # Parse response (simplified)
    return {
        "url": url,
        "status": resp.status_code,
        "timestamp": datetime.now().isoformat(),
        "content_length": len(resp.text),
    }


# Monitor multiple products
products = [
    {"url": "https://example.com/product/1", "site_key": "6Le-wvkS..."},
    {"url": "https://example.com/product/2", "site_key": "6Le-wvkS..."},
]

results = []
for product in products:
    try:
        data = scrape_protected_product(product["url"], product["site_key"])
        results.append(data)
        print(f"  Scraped: {product['url']} ({data['content_length']} bytes)")
    except Exception as e:
        # One failed product should not stop the rest of the run.
        print(f"  Failed: {product['url']}: {e}")

# Save to local JSON (sync to R2/S3 in production)
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
Stack cost at different scales
| Scale | Compute | Proxies | CaptchaAI | Storage | Total |
|---|---|---|---|---|---|
| Solo dev, light use | $6 | $15 | $15 | $2 | $38 |
| Small team, moderate | $12 | $30 | $30 (STANDARD) | $5 | $77 |
| Production, multiple clients | $24 | $50 | $90 (ADVANCE) | $10 | $174 |
Even at the production level with 50 CAPTCHA threads, the total infrastructure cost stays under $175/month.
When to upgrade each component
Upgrade the component that's actually the bottleneck:
| Symptom | Bottleneck | Upgrade |
|---|---|---|
| ERROR_NO_SLOT_AVAILABLE | CaptchaAI threads | Next plan tier |
| IP bans, 403 errors | Proxy quality | Residential proxies |
| Memory errors, slow parsing | Compute | Larger VPS |
| Storage full | Storage | Larger plan or tiered archival |
Don't upgrade everything at once. Monitor which component constrains throughput.
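To turn that monitoring into something concrete, a scraper can tally its failures by the component they point at. This is a minimal sketch under stated assumptions: apart from `ERROR_NO_SLOT_AVAILABLE`, which comes from the table above, the event labels (`"403"`, `"ip_ban"`, `"MemoryError"`, `"slow_parse"`, `"storage_full"`) are hypothetical names for whatever your own logging records.

```python
from collections import Counter
from typing import Iterable, Optional


def classify(event: str) -> str:
    """Map one logged failure to the stack component it points at."""
    if event == "ERROR_NO_SLOT_AVAILABLE":
        return "captcha"
    if event in {"403", "ip_ban"}:            # hypothetical labels
        return "proxies"
    if event in {"MemoryError", "slow_parse"}:  # hypothetical labels
        return "compute"
    if event == "storage_full":               # hypothetical label
        return "storage"
    return "unknown"


def bottleneck(events: Iterable[str]) -> Optional[str]:
    """Return the component with the most failures, or None if there are none."""
    counts = Counter(classify(event) for event in events)
    counts.pop("unknown", None)  # ignore events we cannot attribute
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Running `bottleneck` over a day's worth of logged failures gives a single component to look at first, which is the point of upgrading one piece at a time.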
FAQ
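Is $15/month enough for CAPTCHA solving if I run scrapers 24/7?
At a 7-second average solve time for Cloudflare Turnstile, one thread running 24/7 can complete roughly 370,000 solves in a 30-day month, so 5 threads give a theoretical ceiling of about 1.85 million solves for $15. Real throughput is lower once polling intervals and idle time are counted, but for most solo developers that's more than sufficient.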
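As a rough check on the throughput arithmetic, the sketch below computes an idealized ceiling. It assumes a 30-day month, that every thread is busy continuously, and that each solve takes exactly the 7-second average; `monthly_solve_ceiling` is a hypothetical helper, and real numbers will come in lower.

```python
def monthly_solve_ceiling(threads: int, avg_solve_seconds: float, days: int = 30) -> int:
    """Upper bound on solves per month if every thread is always busy."""
    seconds_per_month = days * 24 * 60 * 60  # 2,592,000 for 30 days
    return int(threads * seconds_per_month / avg_solve_seconds)


per_thread = monthly_solve_ceiling(threads=1, avg_solve_seconds=7)
all_threads = monthly_solve_ceiling(threads=5, avg_solve_seconds=7)
print(per_thread)   # 370285
print(all_threads)  # 1851428
```

A single thread at this pace works out to roughly 370,000 solves a month, so a scraper rarely needs all five threads saturated to stay within the plan.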
Can CaptchaAI handle multiple CAPTCHA types in the same workflow?
Yes. One API key handles all supported types. Your code just changes the method parameter per CAPTCHA type.
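To illustrate how little changes between types, here is a sketch that builds the submit parameters for the two CAPTCHA types used in this article. The `method` values and key names are the ones from the solver class above; the function name and the `captcha_type` labels (`"recaptcha_v2"`, `"turnstile"`) are hypothetical.

```python
def build_submit_params(captcha_type: str, site_key: str, page_url: str) -> dict:
    """Return the type-specific fields to send alongside the API key."""
    if captcha_type == "recaptcha_v2":
        return {"method": "userrecaptcha", "googlekey": site_key, "pageurl": page_url}
    if captcha_type == "turnstile":
        return {"method": "turnstile", "sitekey": site_key, "pageurl": page_url}
    # Other supported types would need their own parameter sets.
    raise ValueError(f"Unsupported CAPTCHA type: {captcha_type}")
```

The resulting dictionary is what gets merged with `key` and `json` in the submit request, so one workflow can handle both types by switching on a single label.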
What's the minimum viable stack for testing?
CaptchaAI trial + Python + your laptop. Add compute and proxies when you move to production.
Start building
CaptchaAI at $15/month is the most cost-effective CAPTCHA layer for budget scraping stacks. Sign up at captchaai.com and have your first CAPTCHA solved today.
Full Working Code
Complete runnable examples for this article in Python, Node.js, PHP, Go, Java, C#, Ruby, Rust, Kotlin & Bash.
View on GitHub →