Anti-bot defences: Cloudflare, DataDome, Akamai explained
What each of the major anti-bot platforms actually does, the signals they read, and which one is hardest to defeat in 2026.
The four big anti-bot platforms in 2026 are Cloudflare Bot Management, DataDome, Akamai Bot Manager, and Kasada. They all do roughly the same thing: fingerprint your browser, check it against known-bot profiles, and either let you through, challenge you, or block you outright. Where they differ is in which signals they weight most heavily and how aggressive their defaults are. Cloudflare is the most permissive and the most common. Kasada is the most aggressive and the hardest to defeat. DataDome and Akamai sit in between, with DataDome having the strongest behavioral analysis. If your scrape target uses any of these, success depends much more on knowing which one you are facing than on which scraping library you picked.
We spend significant time on the anti-bot side because most scraping engagements at YAEL involve at least one protected target. This is the field guide.
Fingerprinting at a glance
Every anti-bot platform combines several signals. The signals matter in roughly this order:
- TLS fingerprint (JA3, JA4, JA4_R). What does your TLS handshake look like compared to a real Chrome's?
- HTTP/2 fingerprint (Akamai's H2 fingerprint). Same idea, at the HTTP/2 layer.
- Browser fingerprint (canvas, WebGL, audio, screen, fonts). What does the rendered browser look like?
- JavaScript challenge execution. Can your "browser" actually run the obfuscated JS the protection serves?
- Behavior (mouse movement, scroll patterns, keypress timing). Are you a human or a machine?
- IP reputation. Is the IP a known datacenter, residential, mobile, or proxy?
- Account / cookie signals. Have you been here before? Did you complete a challenge recently?
Most defenses are cumulative. Failing one signal gives you a soft challenge. Failing three gets you blocked.
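As a rough mental model of that cumulative behavior, the decision can be sketched as a simple count of failed signals. The one-failure and three-failure thresholds come from the text above; treating two failures as a challenge, the `decide` function, and the signal labels are illustrative assumptions, not any vendor's actual logic.

```javascript
// Toy model only: real platforms weight signals and use ML scores,
// not a plain count. Signal names are hypothetical labels.
const SIGNALS = ["tls", "http2", "browser", "jsChallenge", "behavior", "ip", "cookie"];

function decide(results) {
  // results maps a signal name to true (passed) or false (failed).
  // Signals missing from the map are treated as passed.
  const failed = SIGNALS.filter((name) => results[name] === false);
  if (failed.length === 0) return { action: "allow", failed };
  if (failed.length < 3) return { action: "challenge", failed };
  return { action: "block", failed };
}
```

Under this model, fixing a single failing signal can move a scraper from "block" to "challenge", which is why it pays to work through the signals one at a time.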
Cloudflare Bot Management
The most common because Cloudflare is the most common CDN. Cloudflare's signal stack:
- TLS fingerprint via their own classifier (BIC — Bot in Code)
- A turnstile-style JS challenge that runs invisibly
- IP reputation via Cloudflare's threat intelligence
- Behavior tracking via the Cloudflare Web Analytics signal
Cloudflare's defaults are surprisingly permissive. Playwright with a real-looking user-agent, a residential proxy, and the stealth plugin gets through Cloudflare's medium-strength setting on most sites. Their "I'm Under Attack" mode is stricter and triggers Turnstile, which requires either solving it interactively or using a solving service such as 2Captcha.
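// What works for Cloudflare medium setting.
// Assumes playwright-extra and puppeteer-extra-plugin-stealth are installed;
// residentialProxy, realChromeUA, and target are placeholders you define.
import { chromium } from "playwright-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

// Stealth plugin handles navigator.webdriver etc
chromium.use(StealthPlugin());

const browser = await chromium.launch({
  proxy: residentialProxy, // e.g. { server, username, password }
  headless: true,
});
const ctx = await browser.newContext({
  userAgent: realChromeUA,
  viewport: { width: 1440, height: 900 },
  locale: "en-US",
  timezoneId: "America/New_York",
});
const page = await ctx.newPage();
await page.goto(target, { waitUntil: "networkidle" });

The detail that catches teams: Cloudflare's challenge uses non-deterministic delays. A scraper that fires the next request 50ms after the previous one looks robotic. Add 1-3 seconds of jitter between requests to the same domain.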
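A minimal sketch of that jitter, assuming a sequential crawl of one domain. The helper names (`jitterMs`, `sleep`, `visitWithJitter`) are ours, and `visit` stands in for whatever performs the request, for example `(url) => page.goto(url)`.

```javascript
// Returns a random delay in milliseconds within [minMs, maxMs).
// rng is injectable so the function can be tested deterministically.
function jitterMs(minMs = 1000, maxMs = 3000, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Visits URLs one at a time, pausing for a jittered interval between them.
// visit is a caller-supplied async function; pause defaults to sleep.
async function visitWithJitter(urls, visit, pause = sleep) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await pause(jitterMs());
    results.push(await visit(urls[i]));
  }
  return results;
}
```

A uniform random delay is the simplest choice here; whether a different distribution looks more human to a given platform is something to measure rather than assume.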
DataDome
DataDome's reputation is "very behavioral." They are. Their signal stack:
- TLS + HTTP/2 fingerprint, weighted heavily
- Canvas / WebGL fingerprint, with strict consistency checks
- Behavioral mouse-movement tracking
- A more complex JS challenge than Cloudflare's
- Their own CAPTCHA challenge, served on suspicion
DataDome is harder than Cloudflare. Stock Playwright fails. Stealth plugin + residential proxy gets through the simple cases. Their "high" setting requires real browser fingerprints, which is why anti-detect browser farms exist.
A common DataDome failure: the sec-ch-ua client hints don't match the user-agent. DataDome compares them. If your user-agent says Chrome 130 but sec-ch-ua says Chrome 124, you're cooked.
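Before sending anything, it can help to verify that the two values agree. The sketch below is one way to do that check; the function names are hypothetical, and it only compares Chrome major versions, which is the mismatch described above.

```javascript
// Extracts the Chrome major version from a user-agent string, or null.
function uaChromeMajor(userAgent) {
  const m = /Chrome\/(\d+)\./.exec(userAgent);
  return m ? Number(m[1]) : null;
}

// Extracts the major version of a brand from a sec-ch-ua header value, or null.
function hintMajor(secChUa, brand = "Google Chrome") {
  for (const part of secChUa.split(",")) {
    const m = /^\s*"([^"]+)";v="(\d+)"\s*$/.exec(part);
    if (m && m[1] === brand) return Number(m[2]);
  }
  return null;
}

// True only when both values parse and report the same major version.
function hintsMatchUserAgent(userAgent, secChUa) {
  const ua = uaChromeMajor(userAgent);
  return ua !== null && ua === hintMajor(secChUa);
}
```

Running this once at startup against the configured user-agent and headers catches the mismatch before a protected site does.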
// Override client hints to match the user-agent
await ctx.setExtraHTTPHeaders({
  "sec-ch-ua": '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"',
  "sec-ch-ua-mobile": "?0",
  "sec-ch-ua-platform": '"macOS"',
});

Akamai Bot Manager
Akamai is older and used by big enterprises (banks, airlines, Fortune 500). Their signal stack:
- Their proprietary _abck cookie challenge (a JS-generated token that proves you ran their script)
- HTTP/2 fingerprint comparison
- TLS fingerprint
- IP reputation via their honeypot network
Akamai's _abck cookie is the famous gotcha. Get the cookie wrong (or don't have it) and every subsequent request returns a soft block — 200 OK with content that's secretly an error. Catch it by checking the cookie's structure on each request and re-solving the challenge if it expires.
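A cautious sketch of that per-request check follows. It only tests whether the `_abck` cookie is present and whether the body contains something you expect on a real page. It does not validate the cookie's internal format, which we do not document here. `looksSoftBlocked` and `expectedMarker` are hypothetical names.

```javascript
// Parses a Cookie header value ("a=1; b=2") into a plain object.
function parseCookies(cookieHeader) {
  const out = {};
  for (const pair of cookieHeader.split(";")) {
    const idx = pair.indexOf("=");
    if (idx === -1) continue;
    out[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return out;
}

// Flags a 200 response that lacks the _abck cookie or the expected content.
// expectedMarker is a string you know appears on genuine pages,
// for example an element id from the product list.
function looksSoftBlocked({ status, body, cookieHeader }, expectedMarker) {
  const cookies = parseCookies(cookieHeader || "");
  const hasAbck = typeof cookies._abck === "string" && cookies._abck.length > 0;
  const hasExpectedContent = body.includes(expectedMarker);
  return status === 200 && (!hasAbck || !hasExpectedContent);
}
```

When this returns true, the sensible reaction is to discard the response, re-run the challenge in a browser, and retry, rather than storing the error page as data.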
There are open-source reimplementations of Akamai's challenge solver. None are particularly reliable. For Akamai targets, paid services (NetNut, Bright Data Web Unlocker) tend to be more economical than building your own.
Kasada
The newest of the four and the most aggressive. Kasada's pitch is that they detect bots that defeat the others. They do. Their stack:
- Polymorphic JS challenge that's different every page load
- Extremely strict TLS / HTTP/2 fingerprinting
- Behavioral analysis that flags absent mouse movement
- Active honeypots in the page DOM
Kasada is the only one of the four where stealth-plus-residential reliably fails. To scrape a Kasada-protected target, you typically need either a real-browser-farm service (with anti-detect profiles), a paid Kasada-specific solver, or the bypass-by-API trick (find the underlying JSON API the site calls and skip the protected HTML page entirely).
The TLS fingerprint problem
TLS fingerprinting is the most underappreciated signal. Even with a perfect-looking browser, your TLS handshake leaks information.
curl and Node's http module use OpenSSL's TLS, which produces a fingerprint very different from Chrome's. A real headless Chromium produces the right fingerprint because it uses BoringSSL the same way real Chrome does. This is why Playwright and Puppeteer work where fetch doesn't.
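To make the fingerprint concrete, here is a sketch of how a JA3 hash is derived: five fields from the ClientHello are written as decimal values, joined into one string, and hashed with MD5. The two `client` objects hold made-up values for illustration and are not real Chrome or OpenSSL handshakes.

```javascript
const { createHash } = require("node:crypto");

// Builds the JA3 string: five comma-separated fields, each a dash-joined
// list of decimal values in the order they appear in the ClientHello.
function ja3String({ version, ciphers, extensions, curves, pointFormats }) {
  return [
    String(version),
    ciphers.join("-"),
    extensions.join("-"),
    curves.join("-"),
    pointFormats.join("-"),
  ].join(",");
}

// The JA3 fingerprint is the MD5 hex digest of that string.
function ja3Hash(hello) {
  return createHash("md5").update(ja3String(hello)).digest("hex");
}

// Illustrative values only.
const clientA = {
  version: 771,
  ciphers: [4865, 4866, 4867],
  extensions: [0, 23, 65281],
  curves: [29, 23, 24],
  pointFormats: [0],
};
// Same cipher suites in a different order.
const clientB = { ...clientA, ciphers: [4866, 4865, 4867] };
```

Because order is part of the string, `clientA` and `clientB` hash differently even though they offer the same cipher suites. That is the kind of difference that separates one TLS stack from another regardless of the user-agent header.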
If you're scraping via HTTP libraries (not headless), you need a TLS-matching client like curl_cffi (Python) or node-tls-impersonate. These reproduce Chrome's exact TLS handshake.
# Python with curl_cffi
from curl_cffi import requests

r = requests.get("https://example.com", impersonate="chrome120")

IP reputation: the lever you can pull
The single biggest controllable factor. Anti-bot platforms maintain block lists of:
- AWS, GCP, Azure, DigitalOcean, OVH IP ranges (90% of scrape traffic comes from these)
- Known commercial proxy IP ranges
- Tor exit nodes
- IPs flagged by ML models from prior bot traffic
Your options, roughly ranked by cost (ISP proxies are listed last as an emerging middle option):
- Your own datacenter IPs — free, but increasingly useless. Cloudflare blocks them outright on most sites.
- Datacenter proxies — cheap (~$0.50/GB), often blocked.
- Residential proxies — moderate ($5-15/GB), high success rate.
- Mobile proxies — expensive ($15-40/GB), almost never blocked.
- ISP proxies (datacenter IPs that look like residential to anti-bot platforms) — emerging middle option.
The right tier depends on the target. For low-protection sites, datacenter is fine. For Cloudflare + DataDome targets, residential is the floor.
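To compare tiers for a given job, the per-GB prices above can be turned into a quick estimate. This sketch uses only the price ranges quoted in the list; ISP proxies are left out because no price is given for them, and `estimateCost` is a hypothetical helper.

```javascript
// Price ranges in USD per GB, taken from the list above.
const PRICE_PER_GB = {
  datacenter: { low: 0.5, high: 0.5 },
  residential: { low: 5, high: 15 },
  mobile: { low: 15, high: 40 },
};

// Returns the low and high cost estimate in USD for the given traffic volume.
function estimateCost(tier, gigabytes) {
  const price = PRICE_PER_GB[tier];
  if (!price) throw new Error(`Unknown proxy tier: ${tier}`);
  return { low: price.low * gigabytes, high: price.high * gigabytes };
}
```

For example, `estimateCost("residential", 100)` gives a range of 500 to 1500 USD, against 50 USD for the same volume on datacenter proxies. That gap is the reason to confirm a target actually needs residential IPs before paying for them.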
Detection signals you control
A short list of things to fix on your scraper before paying for fancier proxies:
- navigator.webdriver === true → fix with stealth plugin
- Missing chrome object → fix with stealth plugin
- Wrong window size (1024x768 default is suspicious; use 1440x900)
- Missing or wrong client hints
- Wrong language / timezone combinations
- Instantaneous "human" actions (set realistic timings)
- Missing Accept, Accept-Encoding, Accept-Language headers
- Identical request fingerprint on every page (vary timings, vary scroll depth)
If you fail any of these, no amount of residential proxy money saves you.
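Several of these checks are mechanical, so they can run as a preflight step. The sketch below covers the ones that reduce to a simple comparison. The `snapshot` shape is an assumption of this example: a plain object you collect yourself, for instance with `page.evaluate()` in Playwright plus your configured request headers.

```javascript
// Returns a list of human-readable issues found in the snapshot.
// Expected snapshot fields (all assumptions of this sketch):
//   webdriver: boolean, hasChromeObject: boolean,
//   viewport: { width, height }, headers: { [name]: value }
function preflightIssues(snapshot) {
  const issues = [];
  if (snapshot.webdriver === true) issues.push("navigator.webdriver is true");
  if (!snapshot.hasChromeObject) issues.push("window.chrome is missing");

  const { width, height } = snapshot.viewport || {};
  if (width === 1024 && height === 768) issues.push("default-looking 1024x768 viewport");

  // Header names are case-insensitive, so normalize before comparing.
  const headers = Object.keys(snapshot.headers || {}).map((h) => h.toLowerCase());
  for (const required of ["accept", "accept-encoding", "accept-language"]) {
    if (!headers.includes(required)) issues.push(`missing ${required} header`);
  }
  return issues;
}
```

An empty result does not mean the scraper is undetectable, since behavior and fingerprint consistency are not covered here. A non-empty one is a cheap reason to stop before spending proxy budget.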
The bypass-by-API trick
The cheapest scrape is the one that doesn't render HTML. Many sites that look anti-bot-protected at the HTML layer have a JSON API that's barely protected. Open the Network tab in your browser's dev tools, find the XHR, and hit it directly.
// Often the API endpoint is open with a session cookie
const res = await fetch("https://example.com/api/products?page=1", {
  headers: {
    "x-csrf-token": csrfTokenFromCookie,
    "user-agent": realChromeUA,
    accept: "application/json",
  },
});

If you can do this, do this. It's 100x faster than headless and often easier to keep working.
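Once an endpoint like this works for one page, walking the rest is a short loop. The sketch assumes the API returns an array per page and an empty array past the end, which varies by site. `collectAllPages` and `fetchPage` are hypothetical names, with `fetchPage` wrapping whatever request works for your target.

```javascript
// Walks a paginated JSON endpoint until an empty page comes back.
// fetchPage is a caller-supplied async function: page number -> array of items.
// maxPages is a safety cap so a misbehaving API cannot loop forever.
async function collectAllPages(fetchPage, maxPages = 50) {
  const items = [];
  for (let page = 1; page <= maxPages; page++) {
    const batch = await fetchPage(page);
    if (!Array.isArray(batch) || batch.length === 0) break;
    items.push(...batch);
  }
  return items;
}
```

Keeping the request logic behind `fetchPage` means pacing, retries, and header handling stay in one place and the loop itself never needs to change.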
What we recommend by protection level
| Protection | Approach |
|---|---|
| None / robots.txt only | fetch + cheerio |
| Basic JS rendering needed | Playwright stock, datacenter proxies |
| Cloudflare bot detection on | Playwright + stealth + residential |
| DataDome on | Playwright + stealth + residential + matching client hints |
| Akamai on | Paid bypass service (Bright Data Web Unlocker) |
| Kasada on | Anti-detect browser farm (Browserless, Browserbase) or paid solver |
| Captcha gates | 2Captcha or human-in-the-loop |
Need to scrape a protected target?
We've shipped scraping infrastructure against Cloudflare, DataDome, Akamai, and Kasada — and we know which battles are worth picking.
FAQ
Is bypassing anti-bot protection legal?
Depends on jurisdiction and what's behind the wall. Public data with a CDN-level rate limit is usually fine. Data behind a login or paywall — much riskier. Read CFAA jurisprudence (US) and the relevant equivalents in your jurisdiction. We are not your lawyer.
Which is the hardest to bypass?
Kasada in 2026, narrowly. Akamai for sites that have full Bot Manager Premier turned on. Both require paid services for most teams.
Why does Cloudflare let me through some days and block me others?
Cloudflare's bot score is non-deterministic — it shifts as their ML model retrains and as your IP's reputation drifts. A scraper that worked yesterday breaking today is a Tuesday at Cloudflare.
Should I rotate user agents?
Slowly. Rotating UA per request looks bot-like. Rotating once per session (one UA per "user") is more natural. Always make sure UA and client hints match.
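One way to get per-session rotation is to derive the profile from a session identifier, so the same "user" always presents the same user-agent and client hints. In this sketch the two profiles are illustrative and should be replaced with values captured from real browsers; `profileForSession` and `hashString` are hypothetical helpers.

```javascript
// Each profile pairs a user-agent with client hints for the same version.
const PROFILES = [
  {
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
    secChUa: '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"',
    platform: '"macOS"',
  },
  {
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    secChUa: '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    platform: '"Windows"',
  },
];

// Small deterministic FNV-1a-style hash over UTF-16 code units.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// The same sessionId always maps to the same profile.
function profileForSession(sessionId, profiles = PROFILES) {
  return profiles[hashString(sessionId) % profiles.length];
}
```

Because the user-agent and client hints travel together in one object, they cannot drift apart when the profile list is updated.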
What's the deal with TLS impersonation libraries?
They make HTTP libraries (not browsers) produce a TLS fingerprint that matches a real Chrome. Defeats TLS-fingerprint-based detection. Doesn't help with JS challenges.
Can I use my home IP?
For occasional small scrapes, yes. For volume, no — your ISP doesn't like it and you'll get rate-limited by upstream first.
What about reCAPTCHA v3?
It's an invisible behavioral signal that returns a score between 0 (bot) and 1 (human). Sites use the score to decide whether to challenge. Defeating v3 specifically means looking like a normal user across multiple sessions — proxies help, but behavior helps more.
How do anti-bot platforms react to LLM-driven browsers?
They detect them by the same signals: TLS fingerprint, browser fingerprint, behavior. A "Claude-powered" browser still has to render with Chromium or similar, so the same defenses apply. The interesting frontier is whether platforms develop LLM-specific signals; not seeing it widely yet.