Anti-bot defences: Cloudflare, DataDome, Akamai explained

What each of the major anti-bot platforms actually does, the signals they read, and which one is hardest to defeat in 2026.

YAEL Engineering · 05 Feb 2026 · 9 min read · 1,732 words

The four big anti-bot platforms in 2026 are Cloudflare Bot Management, DataDome, Akamai Bot Manager, and Kasada. They all do roughly the same thing: fingerprint your browser, check it against known-bot profiles, and either let you through, challenge you, or block you outright. Where they differ is in which signals they weight most heavily and how aggressive their defaults are. Cloudflare is the most permissive and the most common. Kasada is the most aggressive and the hardest to defeat. DataDome and Akamai sit between, with DataDome having the strongest behavioral analysis. If your scrape target uses any of these, success depends much more on knowing which platform you are facing than on which scraping library you picked.

We spend significant time on the anti-bot side because most scraping engagements at YAEL involve at least one protected target. This is the field guide.

Fingerprinting at a glance

Every anti-bot platform combines several signals. The signals matter in roughly this order:

TLS fingerprint (JA3, JA4, JA4_R). What does your TLS handshake look like compared to a real Chrome's?
HTTP/2 fingerprint (Akamai's H2 fingerprint). Same idea, at the HTTP/2 layer.
Browser fingerprint (canvas, WebGL, audio, screen, fonts). What does the rendered browser look like?
JavaScript challenge execution. Can your "browser" actually run the obfuscated JS the protection serves?
Behavior (mouse movement, scroll patterns, keypress timing). Are you a human or a machine?
IP reputation. Is the IP a known datacenter, residential, mobile, or proxy?
Account / cookie signals. Have you been here before? Did you complete a challenge recently?

Most defenses are cumulative. Failing one signal gives you a soft challenge. Failing three gets you blocked.
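
As a rough illustration of how cumulative scoring plays out, the sketch below counts failed signals and maps the count to a verdict. The thresholds mirror the rule of thumb above; treating exactly two failures as a challenge is our assumption, and the signal names and the `verdict` function are hypothetical, not any vendor's real logic.

```python
# Toy model of cumulative bot scoring. Real platforms weight signals and
# use ML scores; this only illustrates the "failures add up" idea.
SIGNALS = ["tls", "http2", "browser", "js_challenge", "behavior", "ip", "cookies"]


def verdict(failed: set[str]) -> str:
    """Map the set of failed signals to allow / challenge / block."""
    unknown = failed - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    failures = len(failed)
    if failures == 0:
        return "allow"
    if failures < 3:  # assumption: one or two failures earn a challenge
        return "challenge"
    return "block"
```

The practical takeaway is that fixing one failing signal can move a scraper from blocked to challenged even when the others still look off.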

Cloudflare Bot Management

The most common because Cloudflare is the most common CDN. Cloudflare's signal stack:

TLS fingerprint via their own classifier (BIC — Bot in Code)
A Turnstile-style JS challenge that runs invisibly
IP reputation via Cloudflare's threat intelligence
Behavior tracking via the Cloudflare Web Analytics signal

Cloudflare's defaults are surprisingly permissive. Playwright with a real-looking user-agent, a residential proxy, and the stealth plugin gets through Cloudflare's medium-strength setting on most sites. Their "I'm Under Attack" mode is stricter and triggers Turnstile, which you either solve interactively or hand to a solving service such as 2Captcha.

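// What works for Cloudflare medium setting
// Assumes the playwright-extra and puppeteer-extra-plugin-stealth packages
import { chromium } from "playwright-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

// Stealth plugin handles navigator.webdriver etc
chromium.use(StealthPlugin());

const browser = await chromium.launch({
  proxy: residentialProxy, // { server, username, password }
  headless: true,
});
const ctx = await browser.newContext({
  userAgent: realChromeUA, // should match the Chromium build you launch
  viewport: { width: 1440, height: 900 },
  locale: "en-US",
  timezoneId: "America/New_York", // keep consistent with the proxy's location
});
const page = await ctx.newPage();
await page.goto(target, { waitUntil: "networkidle" });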

The detail that catches teams: Cloudflare's challenge uses non-deterministic delays, so request timing matters. A scraper that fires the next request 50ms after the previous one looks robotic. Add 1-3 seconds of random jitter between requests on the same domain.
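
A minimal sketch of per-domain jitter, assuming a sequential crawl. `jittered_delay` and `polite_fetch_all` are hypothetical helper names, and `fetch` stands in for whatever request function your scraper already uses.

```python
import random
import time


def jittered_delay(min_s: float = 1.0, max_s: float = 3.0, rng=random) -> float:
    """Pick a random pause between min_s and max_s seconds."""
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    return rng.uniform(min_s, max_s)


def polite_fetch_all(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random 1-3 s between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:  # no pause before the first request
            sleep(jittered_delay())
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without real waits.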

DataDome

DataDome's reputation is "very behavioral." They are. Their signal stack:

TLS + HTTP/2 fingerprint, weighted heavily
Canvas / WebGL fingerprint, with strict consistency checks
Behavioral mouse-movement tracking
A more complex JS challenge than Cloudflare's
Their own CAPTCHA challenge, served on suspicion

DataDome is harder than Cloudflare. Stock Playwright fails. Stealth plugin + residential proxy gets through the simple cases. Their "high" setting requires real browser fingerprints, which is why anti-detect browser farms exist.

A common DataDome failure: the sec-ch-ua client hints don't match the user-agent. DataDome compares them. If your user-agent says Chrome 130 but sec-ch-ua says Chrome 124, you're cooked.

// Override client hints to match the user-agent (Chrome 130 on macOS here)
await ctx.setExtraHTTPHeaders({
  "sec-ch-ua": '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"',
  "sec-ch-ua-mobile": "?0",
  "sec-ch-ua-platform": '"macOS"',
});
// Note: this changes request headers only, not navigator.userAgentData in page JS
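
Before launching a run, it can help to verify that the two values agree. The check below is a sketch under simple assumptions: it only handles Chrome-style user-agents and only compares major versions. `chrome_major` and `hints_match_ua` are hypothetical helper names.

```python
import re


def chrome_major(user_agent: str) -> str:
    """Extract the Chrome major version from a user-agent string."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    if match is None:
        raise ValueError("not a Chrome user-agent")
    return match.group(1)


def hints_match_ua(user_agent: str, sec_ch_ua: str) -> bool:
    """True if every Chromium / Google Chrome brand in sec-ch-ua
    carries the same major version as the user-agent."""
    major = chrome_major(user_agent)
    versions = re.findall(r'"(?:Chromium|Google Chrome)";v="(\d+)"', sec_ch_ua)
    return bool(versions) and all(v == major for v in versions)


UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36")
GOOD = '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"'
BAD = '"Chromium";v="124", "Google Chrome";v="124", "Not?A_Brand";v="99"'
```

With these sample values, `hints_match_ua(UA, GOOD)` passes and `hints_match_ua(UA, BAD)` flags the Chrome 130 vs 124 mismatch described above.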

Akamai Bot Manager

Akamai is older and used by big enterprises (banks, airlines, Fortune 500). Their signal stack:

Their proprietary _abck cookie challenge (a JS-generated token that proves you ran their script)
HTTP/2 fingerprint comparison
TLS fingerprint
IP reputation via their honeypot network

Akamai's _abck cookie is the famous gotcha. Get the cookie wrong (or don't have it) and every subsequent request returns a soft block — 200 OK with content that's secretly an error. Catch it by checking the cookie's structure on each request and re-solving the challenge if it expires.
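
One way to catch soft blocks is to validate every 200 response instead of trusting the status code. The sketch below is a heuristic, not Akamai's actual validation: it only checks that an `_abck` cookie is present and that the body contains something a real page would have. `required_marker` is a hypothetical per-site string (a CSS class or JSON key) that you pick yourself.

```python
def is_soft_block(status: int, body: str, cookies: dict[str, str],
                  required_marker: str) -> bool:
    """Heuristic soft-block check for a single response.

    A 200 response counts as a soft block when the session holds no
    _abck cookie at all, or when the body lacks the marker that real
    content always contains.
    """
    if status != 200:
        return False  # hard failures are handled elsewhere
    if "_abck" not in cookies:
        return True
    return required_marker not in body
```

When this returns true, re-run the challenge in a real browser session and retry, rather than storing the response as data.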

There are open-source reimplementations of Akamai's challenge solver. None are particularly reliable. For Akamai targets, paid services (NetNut, Bright Data Web Unlocker) tend to be more economical than building your own.

Kasada

The newest of the four and the most aggressive. Kasada's pitch is that they detect bots that defeat the others. They do. Their stack:

Polymorphic JS challenge that's different every page load
Extremely strict TLS / HTTP/2 fingerprinting
Behavioral analysis that flags absent mouse movement
Active honeypots in the page DOM

Kasada is the only one of the four where stealth-plus-residential reliably fails. To scrape a Kasada-protected target, you typically need either a real-browser-farm service (with anti-detect profiles), a paid Kasada-specific solver, or the bypass-by-API trick (find the underlying JSON API the site calls and skip the protected HTML page entirely).

The TLS fingerprint problem

TLS fingerprinting is the most underappreciated signal. Even with a perfect-looking browser, your TLS handshake leaks information.

curl and Node's http module use OpenSSL's TLS, which produces a fingerprint very different from Chrome's. A real headless Chromium produces the right fingerprint because it uses BoringSSL the same way real Chrome does. This is why Playwright and Puppeteer work where fetch doesn't.
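
To make the fingerprint idea concrete, here is a sketch of how a JA3 hash is derived: five ClientHello fields are written as decimal values, joined into one string, and hashed with MD5. The numeric values below are illustrative only, not a real Chrome or OpenSSL handshake, and the function names are our own.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 string: five comma-separated fields, each a
    dash-joined list of decimal values. Real implementations also
    drop GREASE values before this step."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Two hypothetical clients offering the same ciphers in a different order.
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
client_b = ja3_hash(771, [4866, 4865, 4867], [0, 23, 65281], [29, 23, 24], [0])
print(client_a == client_b)  # False: ordering alone changes the fingerprint
```

Because cipher and extension ordering is baked into the TLS library, changing headers or user-agents does nothing to this value.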

If you're scraping via HTTP libraries (not headless), you need a TLS-matching client like curl_cffi (Python) or node-tls-impersonate. These reproduce Chrome's exact TLS handshake.

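# Python with curl_cffi (pip install curl_cffi)
from curl_cffi import requests

# impersonate selects the browser whose TLS and HTTP/2 handshake is reproduced
r = requests.get("https://example.com", impersonate="chrome120")
print(r.status_code)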

IP reputation — the lever you can pull

The single biggest controllable factor. Anti-bot platforms maintain block lists of:

AWS, GCP, Azure, DigitalOcean, OVH IP ranges (90% of scrape traffic comes from these)
Known commercial proxy IP ranges
Tor exit nodes
IPs flagged by ML models from prior bot traffic

Your options, ranked roughly by cost (ISP proxies, listed last, sit in the middle):

Your own datacenter IPs — free, but increasingly useless. Cloudflare blocks them outright on most sites.
Datacenter proxies — cheap (~$0.50/GB), often blocked.
Residential proxies — moderate ($5-15/GB), high success rate.
Mobile proxies — expensive ($15-40/GB), almost never blocked.
ISP proxies (datacenter IPs that look like residential to anti-bot platforms) — emerging middle option.

The right tier depends on the target. For low-protection sites, datacenter is fine. For Cloudflare + DataDome targets, residential is the floor.
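
Since proxies are billed per GB, a back-of-envelope estimate helps when choosing a tier. The sketch below uses the price ranges quoted above; the average page size is an assumption you should measure for your own target, and `estimate_cost` is a hypothetical helper.

```python
# Per-GB prices in USD as (low, high), taken from the ranges quoted above.
PRICE_PER_GB = {
    "datacenter": (0.50, 0.50),
    "residential": (5.0, 15.0),
    "mobile": (15.0, 40.0),
}


def estimate_cost(pages: int, avg_page_kb: float, tier: str) -> tuple[float, float]:
    """Return the (low, high) USD cost of fetching `pages` pages via `tier`."""
    gigabytes = pages * avg_page_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    low, high = PRICE_PER_GB[tier]
    return (round(gigabytes * low, 2), round(gigabytes * high, 2))
```

For example, 100,000 pages at an assumed 500 KB each is 50 GB, which comes to $250 to $750 on residential proxies and about $25 on datacenter proxies at these prices.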

Detection signals you control

A short list of things to fix on your scraper before paying for fancier proxies:

navigator.webdriver === true → fix with stealth plugin
Missing chrome object → fix with stealth plugin
Wrong window size (headless defaults such as Puppeteer's 800x600 or Playwright's 1280x720 are suspicious; use something like 1440x900)
Missing or wrong client hints
Wrong language / timezone combinations
Instantaneous "human" actions (set realistic timings)
Missing Accept, Accept-Encoding, Accept-Language headers
Identical request fingerprint on every page (vary timings, vary scroll depth)

If you fail any of these, no amount of residential proxy money saves you.
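
Several items on this list can be checked mechanically before a run. The preflight below is a sketch over a hypothetical config dictionary; the key names (`headers`, `stealth`, `locale`, `timezone`, `min_action_delay_ms`) and the locale-to-timezone table are assumptions for illustration, not part of any library.

```python
REQUIRED_HEADERS = {"accept", "accept-encoding", "accept-language"}

# Assumed locale -> timezone-prefix pairs; extend for the regions you emulate.
EXPECTED_TZ = {
    "en-US": "America/",
    "en-GB": "Europe/London",
    "de-DE": "Europe/Berlin",
}


def preflight(config: dict) -> list[str]:
    """Return the problems found in a scraper config (empty list = clean)."""
    problems = []
    headers = {name.lower() for name in config.get("headers", {})}
    missing = REQUIRED_HEADERS - headers
    if missing:
        problems.append(f"missing headers: {sorted(missing)}")
    if not config.get("stealth", False):
        problems.append("stealth plugin disabled (navigator.webdriver exposed)")
    prefix = EXPECTED_TZ.get(config.get("locale", ""))
    if prefix and not config.get("timezone", "").startswith(prefix):
        problems.append("locale and timezone disagree")
    if config.get("min_action_delay_ms", 0) <= 0:
        problems.append("actions fire instantaneously")
    return problems
```

Running a check like this in CI is cheaper than discovering a misconfigured context through burned proxy bandwidth.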

The bypass-by-API trick

The cheapest scrape is the one that doesn't render HTML. Many sites that look anti-bot-protected at the HTML layer have a JSON API that's barely protected. Open Network tab, find the XHR, hit it directly.

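// Often the API endpoint is open with a session cookie
// sessionCookie and csrfTokenFromCookie are assumed to come from a prior browser session
const res = await fetch("https://example.com/api/products?page=1", {
  headers: {
    cookie: sessionCookie,
    "x-csrf-token": csrfTokenFromCookie,
    "user-agent": realChromeUA,
    accept: "application/json",
  },
});
if (!res.ok) throw new Error(`API request failed: ${res.status}`);
const products = await res.json();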

If you can do this, do this. It's 100x faster than headless and often easier to keep working.

What we recommend by protection level

| Protection | Approach |
|---|---|
| None / robots.txt only | fetch + cheerio |
| Basic JS rendering needed | Playwright stock, datacenter proxies |
| Cloudflare bot detection on | Playwright + stealth + residential |
| DataDome on | Playwright + stealth + residential + matching client hints |
| Akamai on | Paid bypass service (Bright Data Web Unlocker) |
| Kasada on | Anti-detect browser farm (Browserless, Browserbase) or paid solver |
| Captcha gates | 2Captcha or human-in-the-loop |

Need to scrape a protected target?

We've shipped scraping infrastructure against Cloudflare, DataDome, Akamai, and Kasada — and we know which battles are worth picking.

FAQ

Is bypassing anti-bot protection legal?

Depends on jurisdiction and what's behind the wall. Public data with a CDN-level rate limit is usually fine. Data behind a login or paywall — much riskier. Read CFAA jurisprudence (US) and the relevant equivalents in your jurisdiction. We are not your lawyer.

Which is the hardest to bypass?

Kasada in 2026, narrowly. Akamai for sites that have full Bot Manager Premier turned on. Both require paid services for most teams.

Why does Cloudflare let me through some days and block me others?

Cloudflare's bot score is non-deterministic — it shifts as their ML model retrains and as your IP's reputation drifts. A scraper that worked yesterday breaking today is a Tuesday at Cloudflare.

Should I rotate user agents?

Slowly. Rotating UA per request looks bot-like. Rotating once per session (one UA per "user") is more natural. Always make sure UA and client hints match.
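
One way to get "one UA per user" is to derive the user-agent from the session ID, so that a session always presents the same identity. This is a sketch: the pool below is a hypothetical example that you would keep current and pair with matching client hints, and `ua_for_session` is our own helper name.

```python
import hashlib

# Hypothetical pool of Chrome 130 user-agents across three platforms.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
]


def ua_for_session(session_id: str, pool=UA_POOL) -> str:
    """Deterministically map a session ID to one user-agent from the pool."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]
```

Hashing instead of picking at random means a restarted worker resumes the same session with the same user-agent.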

What's the deal with TLS impersonation libraries?

They make HTTP libraries (not browsers) produce a TLS fingerprint that matches a real Chrome. Defeats TLS-fingerprint-based detection. Doesn't help with JS challenges.

Can I use my home IP?

For occasional small scrapes, yes. For volume, no — your ISP doesn't like it and you'll get rate-limited by upstream first.

What about reCAPTCHA v3?

It's an invisible behavioral signal that returns a score between 0 (bot) and 1 (human). Sites use the score to decide whether to challenge. Defeating v3 specifically means looking like a normal user across multiple sessions — proxies help, but behavior helps more.

How do anti-bot platforms react to LLM-driven browsers?

They detect them by the same signals: TLS fingerprint, browser fingerprint, behavior. A "Claude-powered" browser still has to render with Chromium or similar, so the same defenses apply. The interesting frontier is whether platforms develop LLM-specific signals; we are not seeing that widely yet.
