    Scraping & Automation

    Anti-bot defences: Cloudflare, DataDome, Akamai explained

    What each of the major anti-bot platforms actually does, the signals they read, and which one is hardest to defeat in 2026.

    YAEL Engineering·05 Feb 2026·9 min read·1,732 words
    On this page
    • Fingerprinting at a glance
    • Cloudflare Bot Management
    • DataDome
    • Akamai Bot Manager
    • Kasada
    • The TLS fingerprint problem
    • IP reputation — the lever you can pull
    • Detection signals you control
    • The bypass-by-API trick
    • What we recommend by protection level
    • FAQ
    • Is bypassing anti-bot protection legal?
    • Which is the hardest to bypass?
    • Why does Cloudflare let me through some days and block me others?
    • Should I rotate user agents?
    • What's the deal with TLS impersonation libraries?
    • Can I use my home IP?
    • What about reCAPTCHA v3?
    • How do anti-bot platforms react to LLM-driven browsers?

    The four big anti-bot platforms in 2026 are Cloudflare Bot Management, DataDome, Akamai Bot Manager, and Kasada. They all do roughly the same thing: fingerprint your browser, check it against known-bot profiles, and either let you through, challenge you, or block you outright. Where they differ is in which signals they weight most heavily and how aggressive their defaults are. Cloudflare is the most permissive and the most common. Kasada is the most aggressive and the hardest to defeat. DataDome and Akamai sit in between, with DataDome having the strongest behavioral analysis. If your scrape target uses any of these, success depends far more on knowing which platform you face than on which scraping library you picked.

    We spend significant time on the anti-bot side because most scraping engagements at YAEL involve at least one protected target. This is the field guide.

    Fingerprinting at a glance

    Every anti-bot platform combines several signals. The signals matter in roughly this order:

    1. TLS fingerprint (JA3, JA4, JA4_R). What does your TLS handshake look like compared to a real Chrome's?
    2. HTTP/2 fingerprint (Akamai's H2 fingerprint). Same idea, at the HTTP/2 layer.
    3. Browser fingerprint (canvas, WebGL, audio, screen, fonts). What does the rendered browser look like?
    4. JavaScript challenge execution. Can your "browser" actually run the obfuscated JS the protection serves?
    5. Behavior (mouse movement, scroll patterns, keypress timing). Are you a human or a machine?
    6. IP reputation. Is the IP a known datacenter, residential, mobile, or proxy?
    7. Account / cookie signals. Have you been here before? Did you complete a challenge recently?

    Most defenses are cumulative. Failing one signal gives you a soft challenge. Failing three gets you blocked.
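To make the cumulative idea concrete, here is a toy decision function. It is a sketch under loud assumptions: the signal names are just the seven families listed above, and the thresholds (one failure earns a challenge, three earn a block) are taken from the rule of thumb in the previous paragraph. Treating two failures as a challenge is our own guess, and real platforms weight signals unequally and do not publish their scoring.

```typescript
// The seven signal families listed above.
type Signal =
  | "tls"
  | "http2"
  | "browser"
  | "jsChallenge"
  | "behavior"
  | "ipReputation"
  | "cookies";

type Verdict = "allow" | "challenge" | "block";

// Hypothetical thresholds: 1-2 distinct failed signals => challenge, 3+ => block.
function decide(failed: Signal[]): Verdict {
  const distinct = new Set(failed).size; // the same signal failing twice counts once
  if (distinct >= 3) return "block";
  if (distinct >= 1) return "challenge";
  return "allow";
}
```

The practical reading: fixing one signal rarely flips a block into a pass, but it can move you from "block" to "challenge", which is often solvable.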

    Cloudflare Bot Management

    The most common because Cloudflare is the most common CDN. Cloudflare's signal stack:

    • TLS fingerprint (JA3/JA4), fed into their machine-learning bot score
    • A Turnstile-style JS challenge that runs invisibly
    • IP reputation via Cloudflare's threat intelligence
    • Behavior tracking via the Cloudflare Web Analytics signal

    Cloudflare's defaults are surprisingly permissive. Playwright with a real-looking user-agent, a residential proxy, and the stealth plugin gets through Cloudflare's medium-strength setting on most sites. Their "I'm Under Attack" mode is stricter and triggers Turnstile, which you must either solve interactively or hand to a solving service such as 2Captcha.

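    ```ts
    // What works for Cloudflare medium setting.
    // Assumes the playwright-extra and puppeteer-extra-plugin-stealth packages.
    import { chromium } from "playwright-extra";
    import StealthPlugin from "puppeteer-extra-plugin-stealth";

    // Placeholders: supply your own proxy, user-agent, and target URL.
    const residentialProxy = { server: "http://proxy.example.com:8000" };
    const realChromeUA =
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36";
    const target = "https://example.com";

    // Stealth plugin handles navigator.webdriver etc.
    chromium.use(StealthPlugin());

    const browser = await chromium.launch({ proxy: residentialProxy, headless: true });
    const ctx = await browser.newContext({
      userAgent: realChromeUA,
      viewport: { width: 1440, height: 900 },
      locale: "en-US",
      timezoneId: "America/New_York", // keep this consistent with the proxy's location
    });
    const page = await ctx.newPage();
    await page.goto(target, { waitUntil: "networkidle" });
    await browser.close();
    ```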

    The detail that catches teams is timing: Cloudflare's challenge uses non-deterministic delays, and a scraper that fires the next request 50 ms after the previous one looks robotic. Add 1 to 3 seconds of random jitter between requests on the same domain.
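A minimal sketch of that jitter, assuming requests to one domain are issued sequentially. The helper names (`jitterMs`, `sleep`, `visitWithJitter`) are ours, not part of Playwright, and the `visit` callback stands in for whatever your scraper does per URL.

```typescript
// Random delay in [minMs, maxMs). The rng parameter is injectable for testing.
function jitterMs(
  minMs = 1000,
  maxMs = 3000,
  rng: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Visit URLs on one domain in order, pausing a random interval between requests.
async function visitWithJitter(
  urls: string[],
  visit: (url: string) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < urls.length; i++) {
    await visit(urls[i]);
    // No pause needed after the final URL.
    if (i < urls.length - 1) await sleep(jitterMs());
  }
}
```

With Playwright, the callback would typically be something like `(url) => page.goto(url).then(() => undefined)`.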

    DataDome

    DataDome's reputation is "very behavioral," and it is earned. Their signal stack:

    • TLS + HTTP/2 fingerprint, weighted heavily
    • Canvas / WebGL fingerprint, with strict consistency checks
    • Behavioral mouse-movement tracking
    • A more complex JS challenge than Cloudflare's
    • Their own CAPTCHA challenge, served on suspicion

    DataDome is harder than Cloudflare. Stock Playwright fails. Stealth plugin + residential proxy gets through the simple cases. Their "high" setting requires real browser fingerprints, which is why anti-detect browser farms exist.

    A common DataDome failure: the sec-ch-ua client hints don't match the user-agent. DataDome compares them. If your user-agent says Chrome 130 but sec-ch-ua says Chrome 124, you're cooked.

    ```ts
    // Override client hints to match the user-agent (here: Chrome 130 on macOS).
    // ctx is the Playwright BrowserContext created earlier.
    await ctx.setExtraHTTPHeaders({
      "sec-ch-ua": '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"',
      "sec-ch-ua-mobile": "?0",
      "sec-ch-ua-platform": '"macOS"',
    });
    ```
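Hard-coding the version in two places is how the mismatch creeps in. One way to avoid it is to derive the hints from the user-agent string itself. This is a sketch: `chromeMajor` and `clientHintsFor` are hypothetical helpers, and the `Not?A_Brand` entry is simply copied from the override above. Real Chrome varies that placeholder brand between releases, so compare the output against headers captured from a real browser.

```typescript
// Extract Chrome's major version from a user-agent string, or null if absent.
function chromeMajor(userAgent: string): string | null {
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  return match ? match[1] : null;
}

// Build client hints whose version agrees with the user-agent.
// `platform` must also agree with the OS named in the user-agent.
function clientHintsFor(
  userAgent: string,
  platform: string,
): Record<string, string> {
  const major = chromeMajor(userAgent);
  if (major === null) {
    throw new Error("user-agent does not look like Chrome");
  }
  return {
    "sec-ch-ua": `"Chromium";v="${major}", "Google Chrome";v="${major}", "Not?A_Brand";v="99"`,
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": `"${platform}"`,
  };
}
```

The result can be passed straight to `ctx.setExtraHTTPHeaders(...)`, so the user-agent remains the single source of truth.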

    Akamai Bot Manager

    Akamai is older and used by big enterprises (banks, airlines, Fortune 500). Their signal stack:

    • Their proprietary _abck cookie challenge (a JS-generated token that proves you ran their script)
    • HTTP/2 fingerprint comparison
    • TLS fingerprint
    • IP reputation via their honeypot network

    Akamai's _abck cookie is the famous gotcha. Get the cookie wrong (or don't have it) and every subsequent request returns a soft block — 200 OK with content that's secretly an error. Catch it by checking the cookie's structure on each request and re-solving the challenge if it expires.
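Because a soft block arrives as a 200, status codes alone won't catch it. The sketch below checks the response body instead of the cookie, which is a simpler technique than inspecting `_abck` itself and does not depend on the cookie's undocumented format. Everything here is an assumption for illustration: `expectedMarker` is a string you know appears on every real page of your target, and `fetchPage` / `resolveChallenge` are callbacks you supply.

```typescript
interface FetchedPage {
  status: number;
  body: string;
}

// True when a response claims success but lacks the content we expect.
function looksLikeSoftBlock(page: FetchedPage, expectedMarker: string): boolean {
  const ok = page.status >= 200 && page.status < 300;
  return ok && !page.body.includes(expectedMarker);
}

// Fetch, and if the page is a soft block, re-solve the challenge and retry.
// Non-2xx responses are returned unchanged for the caller to handle.
async function fetchChecked(
  fetchPage: () => Promise<FetchedPage>,
  resolveChallenge: () => Promise<void>,
  expectedMarker: string,
  maxAttempts = 3,
): Promise<FetchedPage> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const page = await fetchPage();
    if (!looksLikeSoftBlock(page, expectedMarker)) return page;
    // e.g. reload the site in a real browser so the cookie is regenerated
    if (attempt < maxAttempts) await resolveChallenge();
  }
  throw new Error(`still soft-blocked after ${maxAttempts} attempts`);
}
```

Pick a marker that is specific to real content (a product-grid class name, a JSON key) rather than something generic like `<html>`.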

    There are open-source reimplementations of Akamai's challenge solver. None are particularly reliable. For Akamai targets, paid services (NetNut, Bright Data Web Unlocker) tend to be more economical than building your own.

    Kasada

    The newest of the four and the most aggressive. Kasada's pitch is that they detect bots that defeat the others. They do. Their stack:

    • Polymorphic JS challenge that's different every page load
    • Extremely strict TLS / HTTP/2 fingerprinting
    • Behavioral analysis that flags absent mouse movement
    • Active honeypots in the page DOM

    Kasada is the only one of the four where stealth-plus-residential reliably fails. To scrape a Kasada-protected target, you typically need either a real-browser-farm service (with anti-detect profiles), a paid Kasada-specific solver, or the bypass-by-API trick (find the underlying JSON API the site calls and skip the protected HTML page entirely).

    The TLS fingerprint problem

    TLS fingerprinting is the most underappreciated signal. Even with a perfect-looking browser, your TLS handshake leaks information.

    curl and Node's https module typically use OpenSSL for TLS, which produces a fingerprint very different from Chrome's. A real headless Chromium produces the right fingerprint because it uses BoringSSL the same way real Chrome does. This is why Playwright and Puppeteer work where a plain fetch doesn't.

    If you're scraping via HTTP libraries (not headless), you need a TLS-matching client like curl_cffi (Python) or node-tls-impersonate. These reproduce Chrome's exact TLS handshake.

    ```python
    # Python with curl_cffi: send the request with Chrome 120's TLS fingerprint
    from curl_cffi import requests

    r = requests.get("https://example.com", impersonate="chrome120")
    print(r.status_code)
    ```

    IP reputation — the lever you can pull

    The single biggest controllable factor. Anti-bot platforms maintain block lists of:

    • AWS, GCP, Azure, DigitalOcean, OVH IP ranges (90% of scrape traffic comes from these)
    • Known commercial proxy IP ranges
    • Tor exit nodes
    • IPs flagged by ML models from prior bot traffic

    Your options, roughly in order of cost:

    1. Your own datacenter IPs: free, but increasingly useless. Cloudflare blocks them outright on most sites.
    2. Datacenter proxies: cheap (~$0.50/GB), often blocked.
    3. Residential proxies: moderate ($5-15/GB), high success rate.
    4. Mobile proxies: expensive ($15-40/GB), almost never blocked.
    5. ISP proxies (datacenter-hosted IPs that look residential to anti-bot platforms): an emerging middle option between datacenter and residential.

    The right tier depends on the target. For low-protection sites, datacenter is fine. For Cloudflare + DataDome targets, residential is the floor.
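Since proxies are billed per GB, it is worth running the numbers before picking a tier. A rough estimator, assuming the per-GB prices quoted above (we take the midpoint of each range, which is our simplification), 1 GB = 1024 MB, and an average page weight that you measure yourself:

```typescript
// USD per GB. Midpoints of the ranges listed above; actual vendor pricing varies.
const pricePerGb = {
  datacenter: 0.5,
  residential: 10, // midpoint of $5-15
  mobile: 27.5, // midpoint of $15-40
} as const;

type Tier = keyof typeof pricePerGb;

// Estimated proxy bandwidth cost for a crawl of `pages` pages.
function estimateCostUsd(pages: number, avgPageMb: number, tier: Tier): number {
  const gigabytes = (pages * avgPageMb) / 1024;
  return gigabytes * pricePerGb[tier];
}
```

Headless browsers download images, fonts, and scripts, so average page weight can be several MB unless you block those resource types. That is usually the first cost lever to pull.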

    Free proxy lists are honeypots

    Don't use them. The IPs are universally pre-blocked, and many "free proxy" services exfiltrate your traffic. Real proxies cost money.

    Detection signals you control

    A short list of things to fix on your scraper before paying for fancier proxies:

    • navigator.webdriver === true → fix with stealth plugin
    • Missing chrome object → fix with stealth plugin
    • Wrong window size (1024x768 default is suspicious — use 1440x900)
    • Missing or wrong client hints
    • Wrong language / timezone combinations
    • Instantaneous "human" actions (set realistic timings)
    • Missing Accept, Accept-Encoding, Accept-Language headers
    • Identical request fingerprint on every page (vary timings, vary scroll depth)

    If you fail any of these, no amount of residential proxy money saves you.
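Several items on that list can be checked statically before a single request goes out. The sketch below covers only three of them (required headers, the suspicious viewport, and client hints that disagree with the user-agent). The `ScraperConfig` shape and `preflight` function are hypothetical, not part of any library.

```typescript
interface ScraperConfig {
  userAgent: string;
  viewport: { width: number; height: number };
  headers: Record<string, string>;
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function preflight(config: ScraperConfig): string[] {
  const problems: string[] = [];

  // Header names are case-insensitive, so normalise before checking.
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(config.headers)) {
    headers[name.toLowerCase()] = value;
  }
  for (const required of ["accept", "accept-encoding", "accept-language"]) {
    if (!(required in headers)) problems.push(`missing header: ${required}`);
  }

  if (config.viewport.width === 1024 && config.viewport.height === 768) {
    problems.push("suspicious viewport: 1024x768");
  }

  // Client hints only apply when the user-agent claims to be Chrome.
  const match = /Chrome\/(\d+)\./.exec(config.userAgent);
  const hint = headers["sec-ch-ua"];
  if (match !== null) {
    if (hint === undefined) {
      problems.push("missing client hints");
    } else if (!hint.includes(`v="${match[1]}"`)) {
      problems.push("client hints do not match user-agent");
    }
  }
  return problems;
}
```

Running this in CI against your scraper's config catches regressions, such as someone bumping the user-agent without touching the hints. The runtime items (`navigator.webdriver`, timings, scroll depth) still need to be verified in a live browser.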

    The bypass-by-API trick

    The cheapest scrape is the one that doesn't render HTML. Many sites that look anti-bot-protected at the HTML layer have a JSON API that's barely protected. Open the browser's Network tab, find the XHR request, and hit the endpoint directly.

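    ```ts
    // Often the API endpoint is open with a session cookie.
    // csrfTokenFromCookie and realChromeUA are values captured from a real browser session.
    const res = await fetch("https://example.com/api/products?page=1", {
      headers: {
        "x-csrf-token": csrfTokenFromCookie,
        "user-agent": realChromeUA,
        accept: "application/json",
      },
    });
    if (!res.ok) throw new Error(`API returned ${res.status}`);
    const products = await res.json();
    ```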

    If you can do this, do this. It's 100x faster than headless and often easier to keep working.
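The snippet above takes `csrfTokenFromCookie` as given. One common way to get it, assuming the site stores the token in a cookie (the name `csrf_token` below is hypothetical, so check your target's actual cookie names), is to parse it out of the cookie string from a browser session:

```typescript
// Read one cookie's value out of a "name=value; name2=value2" cookie string.
// Returns null when the cookie is not present.
function getCookieValue(cookieHeader: string, name: string): string | null {
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed fragments
    if (part.slice(0, eq).trim() === name) {
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return null;
}
```

For example, `getCookieValue(cookies, "csrf_token")` yields the value to send in the `x-csrf-token` header. Some sites embed the token in the HTML or a bootstrap JSON blob instead, in which case one initial page load is still needed.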

    What we recommend by protection level

    | Protection | Approach |
    |---|---|
    | None / robots.txt only | fetch + cheerio |
    | Basic JS rendering needed | Playwright stock, datacenter proxies |
    | Cloudflare bot detection on | Playwright + stealth + residential |
    | DataDome on | Playwright + stealth + residential + matching client hints |
    | Akamai on | Paid bypass service (Bright Data Web Unlocker) |
    | Kasada on | Anti-detect browser farm (Browserless, Browserbase) or paid solver |
    | Captcha gates | 2Captcha or human-in-the-loop |


    FAQ

    Is bypassing anti-bot protection legal?

    Depends on jurisdiction and what's behind the wall. Public data with a CDN-level rate limit is usually fine. Data behind a login or paywall — much riskier. Read CFAA jurisprudence (US) and the relevant equivalents in your jurisdiction. We are not your lawyer.

    Which is the hardest to bypass?

    Kasada in 2026, narrowly, with Akamai a close second on sites that have full Bot Manager Premier turned on. Both require paid services for most teams.

    Why does Cloudflare let me through some days and block me others?

    Cloudflare's bot score is non-deterministic — it shifts as their ML model retrains and as your IP's reputation drifts. A scraper that worked yesterday breaking today is a Tuesday at Cloudflare.

    Should I rotate user agents?

    Slowly. Rotating UA per request looks bot-like. Rotating once per session (one UA per "user") is more natural. Always make sure UA and client hints match.
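One way to get "one UA per user" without storing state is to derive the user-agent deterministically from a session ID. A sketch, assuming you already assign each logical session a string ID. The hash is an FNV-1a-style hash over UTF-16 code units, chosen only because it is short and stable across runs; the function names are ours.

```typescript
// FNV-1a-style 32-bit hash. Stable across runs, so a session keeps its identity.
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return h >>> 0;
}

// The same sessionId always maps to the same user-agent from the pool.
function userAgentForSession(sessionId: string, pool: string[]): string {
  if (pool.length === 0) throw new Error("user-agent pool is empty");
  return pool[hash32(sessionId) % pool.length];
}
```

Keep the pool small and current: a handful of recent Chrome versions is more convincing than a long list of stale ones. Derive the client hints from whichever UA is chosen.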

    What's the deal with TLS impersonation libraries?

    They make HTTP libraries (not browsers) produce a TLS fingerprint that matches a real Chrome. Defeats TLS-fingerprint-based detection. Doesn't help with JS challenges.

    Can I use my home IP?

    For occasional small scrapes, yes. For volume, no — your ISP doesn't like it and you'll get rate-limited by upstream first.

    What about reCAPTCHA v3?

    It's an invisible behavioral signal that returns a score between 0 (bot) and 1 (human). Sites use the score to decide whether to challenge. Defeating v3 specifically means looking like a normal user across multiple sessions — proxies help, but behavior helps more.

    How do anti-bot platforms react to LLM-driven browsers?

    They detect them by the same signals: TLS fingerprint, browser fingerprint, behavior. A "Claude-powered" browser still has to render with Chromium or similar, so the same defenses apply. The interesting frontier is whether platforms develop LLM-specific signals; we are not seeing that widely yet.
