Building a pharmacy price comparison tool means collecting pricing data from multiple pharmacy portals, drug pricing databases, and insurance formularies. Many of these sources protect their pricing pages with CAPTCHAs — particularly reCAPTCHA v2 on search forms and results pages. Here's how to handle them.
Where CAPTCHAs Appear in Pharmacy Portals
| Page type | Common CAPTCHA | Trigger |
|---|---|---|
| Drug search form | reCAPTCHA v2 | Every search query |
| Price results page | Cloudflare Turnstile | Suspected automated access |
| Pharmacy locator | reCAPTCHA v2 | Location-based queries |
| Coupon lookup | Image CAPTCHA | Before showing discount codes |
| Insurance formulary | reCAPTCHA v2 | Login/search gates |
Basic Workflow
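import re
import time

import requests

API_KEY = "YOUR_API_KEY"

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
})


def solve_recaptcha(site_key, page_url):
    """Submit a reCAPTCHA v2 task and poll until a token is returned."""
    resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1
    }, timeout=30)
    submitted = resp.json()
    if submitted["status"] != 1:
        raise RuntimeError(f"Task submission failed: {submitted['request']}")
    task_id = submitted["request"]

    # Poll every 3 seconds, for up to 3 minutes
    for _ in range(60):
        time.sleep(3)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY,
            "action": "get",
            "id": task_id,
            "json": 1
        }, timeout=30)
        data = result.json()
        if data["status"] == 1:
            return data["request"]
        # Assumption: a pending task is reported as "CAPCHA_NOT_READY";
        # any other non-success value is treated as a permanent error
        if data["request"] != "CAPCHA_NOT_READY":
            raise RuntimeError(f"Solve failed: {data['request']}")
    raise TimeoutError("Solve timed out")


def search_drug_prices(drug_name, zipcode):
    search_url = "https://pharmacy.example.com/search"

    # Load the search page and extract the reCAPTCHA site key
    page = session.get(search_url, timeout=30)
    match = re.search(r'data-sitekey="([^"]+)"', page.text)
    if match is None:
        raise ValueError("No reCAPTCHA site key found on the search page")

    # Solve the CAPTCHA
    token = solve_recaptcha(match.group(1), search_url)

    # Submit the search with the token
    results = session.post(search_url, data={
        "drug": drug_name,
        "zip": zipcode,
        "g-recaptcha-response": token
    }, timeout=30)
    results.raise_for_status()
    # parse_prices is a site-specific parser you supply
    return parse_prices(results.text)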
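The workflow above calls `parse_prices` without defining it, because the parsing logic depends on each portal's markup. As a rough sketch only: assume a hypothetical results page where each offer is an `<li class="offer">` element carrying `data-pharmacy` and `data-price` attributes. Those element and attribute names are invented for illustration and will differ on real sites.

```python
from html.parser import HTMLParser


class _OfferParser(HTMLParser):
    """Collects offers from hypothetical <li class="offer"> elements."""

    def __init__(self):
        super().__init__()
        self.offers = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag != "li" or "offer" not in classes:
            return
        try:
            price = float(attr.get("data-price") or "")
        except ValueError:
            return  # skip offers without a usable price
        self.offers.append({"pharmacy": attr.get("data-pharmacy"), "price": price})


def parse_prices(html):
    """Return offers sorted by price, plus a "lowest" key when any were found."""
    parser = _OfferParser()
    parser.feed(html)
    offers = sorted(parser.offers, key=lambda offer: offer["price"])
    result = {"offers": offers}
    if offers:
        result["lowest"] = offers[0]["price"]
    return result
```

Leaving out `"lowest"` when nothing was parsed keeps the result compatible with a sort key that falls back to infinity for missing prices.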
Multi-Pharmacy Comparison
Compare prices across multiple sources:
PHARMACY_SOURCES = [
    {
        "name": "PharmacyA",
        "url": "https://pharmacya.example.com/pricing",
        "captcha_type": "recaptcha_v2"
    },
    {
        "name": "PharmacyB",
        "url": "https://pharmacyb.example.com/drugs",
        "captcha_type": "turnstile"
    },
    {
        "name": "PharmacyC",
        "url": "https://pharmacyc.example.com/search",
        "captcha_type": "image"
    }
]


def lowest_price(result):
    """Sort key: sources that failed or returned no price go last."""
    prices = result.get("prices")
    # Assumes successful results carry a dict with a "lowest" key
    lowest = prices.get("lowest") if isinstance(prices, dict) else None
    return lowest if lowest is not None else float("inf")


def compare_drug_price(drug_name, zipcode):
    results = []
    for source in PHARMACY_SOURCES:
        try:
            prices = fetch_prices(
                source["url"],
                source["captcha_type"],
                drug_name,
                zipcode
            )
            results.append({
                "source": source["name"],
                "prices": prices,
                "status": "success"
            })
        except Exception as e:
            # Record the failure and keep going with the remaining sources
            results.append({
                "source": source["name"],
                "error": str(e),
                "status": "failed"
            })
    return sorted(results, key=lowest_price)
Handling Different CAPTCHA Types
Pharmacy portals use various CAPTCHA providers:
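import re


def find_site_key(html, pattern=r'data-sitekey="([^"]+)"'):
    match = re.search(pattern, html)
    if match is None:
        raise ValueError("No CAPTCHA site key found in the page")
    return match.group(1)


def fetch_prices(url, captcha_type, drug_name, zipcode):
    # solve_turnstile, extract_captcha_image, and solve_image_captcha are
    # helpers you supply, analogous to solve_recaptcha
    page = session.get(url)

    if captcha_type == "recaptcha_v2":
        token = solve_recaptcha(find_site_key(page.text), url)
        field_name = "g-recaptcha-response"
    elif captcha_type == "turnstile":
        # Turnstile site keys start with "0x"
        token = solve_turnstile(find_site_key(page.text, r'data-sitekey="(0x[^"]+)"'), url)
        field_name = "cf-turnstile-response"
    elif captcha_type == "image":
        img_data = extract_captcha_image(page.text)
        token = solve_image_captcha(img_data)
        field_name = "captcha"
    else:
        raise ValueError(f"Unsupported CAPTCHA type: {captcha_type}")

    response = session.post(url, data={
        "drug": drug_name,
        "zip": zipcode,
        field_name: token
    })
    response.raise_for_status()
    # Return parsed prices (not the raw response) so callers can sort on them
    return parse_prices(response.text)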
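`fetch_prices` relies on `solve_turnstile` and `solve_image_captcha`, which this article does not define. One way to approach them is to share a single payload builder and reuse the submit-and-poll loop from `solve_recaptcha`. The sketch below is built on an assumption: that the solver accepts `method=turnstile` with a `sitekey` field and `method=base64` with a `body` field, following the same 2Captcha-style convention as the `userrecaptcha` request shown earlier. Check those field names against the current API documentation before relying on them. `build_solver_payload` is a hypothetical helper name.

```python
import base64


def build_solver_payload(api_key, captcha_type, page_url=None, site_key=None, image_bytes=None):
    """Build the form data for a solve request (field names are assumptions)."""
    payload = {"key": api_key, "json": 1}
    if captcha_type == "recaptcha_v2":
        payload.update(method="userrecaptcha", googlekey=site_key, pageurl=page_url)
    elif captcha_type == "turnstile":
        payload.update(method="turnstile", sitekey=site_key, pageurl=page_url)
    elif captcha_type == "image":
        # Image CAPTCHAs are sent as base64 text rather than as a site key
        encoded = base64.b64encode(image_bytes).decode("ascii")
        payload.update(method="base64", body=encoded)
    else:
        raise ValueError(f"Unsupported CAPTCHA type: {captcha_type}")
    return payload
```

A `solve_turnstile(site_key, page_url)` function could then post `build_solver_payload(API_KEY, "turnstile", page_url, site_key)` to the submit endpoint and poll for the result exactly as `solve_recaptcha` does.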
Session Management for Repeated Searches
Drug price comparison requires multiple searches. Manage sessions to minimize CAPTCHA encounters:
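import re


class PharmacySession:
    def __init__(self, base_url):
        self.session = requests.Session()
        self.base_url = base_url
        self.search_url = f"{base_url}/search"
        # Tracks how many searches a solved session gets before the next CAPTCHA
        self.searches_since_captcha = 0

    def search(self, drug_name, zipcode):
        form = {"drug": drug_name, "zip": zipcode}
        result = self.session.post(self.search_url, data=form)

        if self.is_captcha_page(result.text):
            # Solve the challenge, then resubmit the same search with the token
            field_name, token = self.solve_page_captcha(result.text)
            result = self.session.post(self.search_url, data={**form, field_name: token})
            self.searches_since_captcha = 0
        else:
            self.searches_since_captcha += 1

        return result

    def is_captcha_page(self, html):
        return "g-recaptcha" in html or "cf-turnstile" in html

    def solve_page_captcha(self, html):
        """Return (form field name, token) for the CAPTCHA found in the page."""
        match = re.search(r'data-sitekey="([^"]+)"', html)
        if match is None:
            raise ValueError("CAPTCHA detected but no site key found")
        if "cf-turnstile" in html:
            # solve_turnstile is the Turnstile counterpart of solve_recaptcha
            return "cf-turnstile-response", solve_turnstile(match.group(1), self.search_url)
        return "g-recaptcha-response", solve_recaptcha(match.group(1), self.search_url)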
JavaScript Implementation
// solveCaptcha and parsePrice are helpers you supply for your solver and sources
async function comparePharmacyPrices(drugName, zipCode, sources) {
  const results = await Promise.all(
    sources.map(async (source) => {
      try {
        const price = await fetchDrugPrice(source, drugName, zipCode);
        return { source: source.name, price, status: 'success' };
      } catch (error) {
        return { source: source.name, error: error.message, status: 'failed' };
      }
    })
  );

  // Keep only the successful lookups, cheapest first
  return results
    .filter(r => r.status === 'success')
    .sort((a, b) => a.price - b.price);
}

async function fetchDrugPrice(source, drugName, zipCode) {
  // Solve CAPTCHA for this source
  const token = await solveCaptcha(source.siteKey, source.url);

  const response = await fetch(source.url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      drug: drugName,
      zip: zipCode,
      'g-recaptcha-response': token
    })
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  return parsePrice(await response.text());
}
Rate Limiting Considerations
Pharmacy portals monitor access patterns closely:
| Strategy | Implementation |
|---|---|
| Space requests | 5–10 seconds between searches |
| Rotate sessions | New session every 20–30 queries |
| Vary search patterns | Don't search the same drug repeatedly |
| Use residential proxies | Datacenter IPs trigger more CAPTCHAs |
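The first two strategies in the table can be combined in a small wrapper. This is a minimal sketch, not a tuned implementation: `ThrottledSessions` is a hypothetical class, the 5 to 10 second delay and the 20 to 30 query budget are taken from the table, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time


class ThrottledSessions:
    """Spaces out requests and swaps in a fresh session after a query budget."""

    def __init__(self, session_factory, min_delay=5.0, max_delay=10.0,
                 rotate_range=(20, 30), sleep=time.sleep):
        self.session_factory = session_factory
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.rotate_range = rotate_range
        self.sleep = sleep
        self.total = 0
        self.rotations = 0
        self._start_session()

    def _start_session(self):
        self.session = self.session_factory()
        self.queries = 0
        # Pick a different budget per session so rotation is not periodic
        self.budget = random.randint(*self.rotate_range)

    def next_session(self):
        """Wait a random interval (except before the first query) and return a session."""
        if self.total > 0:
            self.sleep(random.uniform(self.min_delay, self.max_delay))
        if self.queries >= self.budget:
            self._start_session()
            self.rotations += 1
        self.queries += 1
        self.total += 1
        return self.session
```

With `requests`, it could be created as `ThrottledSessions(requests.Session)` and each search would call `next_session().post(...)`. A new session starts with empty cookies, so expect a CAPTCHA on the first query after each rotation.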
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| CAPTCHA on every search | Session not persisting cookies | Use requests.Session() and maintain cookies |
| Prices not loading after CAPTCHA | Missing CSRF token | Extract and include hidden form fields |
| Site blocks after many searches | Rate limiting | Add delays between requests |
| Different prices than browser | Missing session cookies or headers | Match browser headers exactly |
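For the missing CSRF token case, the hidden form fields can be collected from the page before submitting the search. The sketch below uses only the standard library. `extract_hidden_fields` is a hypothetical helper, and it gathers hidden inputs from every form on the page, so pages with several forms may need extra filtering.

```python
from html.parser import HTMLParser


class _HiddenInputParser(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields found in the HTML."""
    parser = _HiddenInputParser()
    parser.feed(html)
    return parser.fields
```

The result can be merged into the search payload, for example `data={**extract_hidden_fields(page.text), "drug": drug_name, "zip": zipcode, "g-recaptcha-response": token}`.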
FAQ
How many pharmacy sources can I compare simultaneously?
CaptchaAI handles parallel solving. You can solve CAPTCHAs for multiple pharmacy portals concurrently. The practical limit depends on your CaptchaAI balance and each portal's rate limits.
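One way to query several sources at once is a thread pool, since most of the time is spent waiting on network calls. This is a sketch under assumptions: `compare_in_parallel` is a hypothetical function, and `fetch` stands for any callable with the same signature as `fetch_prices`.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_in_parallel(sources, drug_name, zipcode, fetch, max_workers=4):
    """Query every source concurrently and collect one result per source."""

    def run(source):
        try:
            prices = fetch(source["url"], source["captcha_type"], drug_name, zipcode)
            return {"source": source["name"], "prices": prices, "status": "success"}
        except Exception as exc:
            # One failing portal should not abort the whole comparison
            return {"source": source["name"], "error": str(exc), "status": "failed"}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as `sources`
        return list(pool.map(run, sources))
```

The `requests` documentation does not promise that a `Session` is safe to share between threads, so giving each source its own session is the more cautious choice. Keep `max_workers` low enough that no single portal sees a burst of requests.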
Do pharmacy portals change their CAPTCHA types frequently?
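Some portals update their CAPTCHA implementation periodically. Monitor for changes by checking whether your extraction still finds the site key and whether the widget type still matches what you configured. CaptchaAI supports all major CAPTCHA types, so when a portal switches CAPTCHA vendors you can keep the same solving service and only update the solving method and the response field name for that source.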
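A simple monitor can compare the widget found in the page against the type configured for that source. The sketch below reuses the `g-recaptcha` and `cf-turnstile` markers and the `data-sitekey` pattern from the earlier examples. `detect_captcha` and `captcha_changed` are hypothetical helpers, and they only cover these two widget types, not image CAPTCHAs.

```python
import re

_SITEKEY_RE = re.compile(r'data-sitekey="([^"]+)"')


def detect_captcha(html):
    """Return (captcha_type, site_key); either value is None if not detected."""
    if "cf-turnstile" in html:
        captcha_type = "turnstile"
    elif "g-recaptcha" in html:
        captcha_type = "recaptcha_v2"
    else:
        captcha_type = None
    match = _SITEKEY_RE.search(html)
    return captcha_type, (match.group(1) if match else None)


def captcha_changed(expected_type, html):
    """True when the page no longer matches the configured widget type."""
    detected, site_key = detect_captcha(html)
    return detected != expected_type or site_key is None
```

Running `captcha_changed(source["captcha_type"], page.text)` before each batch gives an early warning that a source's configuration needs updating.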
Is there a difference between brand-name and generic drug pricing portals?
The CAPTCHA handling is similar. Generic drug databases may use simpler Image CAPTCHAs, while brand-name pharmacy portals tend to use reCAPTCHA v2 or Turnstile.
Related Articles
- How to Solve reCAPTCHA v2 Callback Using API
- reCAPTCHA v2 Turnstile Same Site Handling
- reCAPTCHA v2 Callback Mechanism
Next Steps
Build your pharmacy price comparison tool — get your CaptchaAI API key and handle CAPTCHAs across all pharmacy sources.