    Scraping & Automation

    Headless browser scraping: Playwright vs Puppeteer in 2026

    An opinionated comparison — Playwright vs Puppeteer vs newer alternatives. When each one wins, the bot-detection gap, and what production scraping infra actually looks like.

    YAEL Engineering · 14 Feb 2026 · 8 min read · 1,563 words
    On this page
    • The honest Playwright vs Puppeteer comparison
    • What Playwright auto-wait actually buys you
    • When stock headless gets caught
    • The stealth stack
    • Network interception — Playwright's quiet win
    • Browser per request vs persistent contexts
    • The cost model
    • What about playwright-recorder / codegen?
    • Newer alternatives worth knowing
    • What we ship by default
    • FAQ
    • Is scraping legal?
    • Can I scrape JavaScript-rendered sites with fetch?
    • What's the cheapest scraper for low volume?
    • Do I need to use a captcha solver?
    • What about Selenium?
    • Can I run Playwright on Vercel / Cloudflare Workers?
    • How do I keep selectors stable when the site redesigns?
    • What about LLM-based scraping?

    In 2026, the right default for headless browser scraping is Playwright. Puppeteer is still excellent and actively maintained, but Playwright's auto-waiting model, broader browser coverage, and stronger network interception API make it the safer choice for new projects. The more interesting question is no longer Playwright vs Puppeteer; it's whether stock headless is enough, or whether you need anti-detect tooling on top. In our experience, stock Playwright is fine for roughly 70% of scraping jobs. The other 30% need stealth plugins, residential proxies, and, increasingly, real browser farms.

    We use Playwright across every scraping engagement at YAEL. This is what we've learned from running it in production against sites that don't want to be scraped.

    The honest Playwright vs Puppeteer comparison

    | | Playwright | Puppeteer |
    |---|---|---|
    | Maintained by | Microsoft | Google (Chrome team) |
    | Browsers | Chromium, Firefox, WebKit | Chrome/Chromium; Firefox via WebDriver BiDi (since v23) |
    | Auto-wait | Built-in, comprehensive | Manual for page.click / page.$eval; opt-in via the newer page.locator() API |
    | Network interception | Mature, full request/response control | Good but less ergonomic |
    | Multiple contexts | Native, first-class | Workable |
    | Selector engine | CSS, XPath, text, role, data-testid | CSS, XPath, plus text and ARIA via ::-p-text / ::-p-aria |
    | Speed | Roughly tied | Roughly tied |
    | Anti-bot detection | Detected by sophisticated sites | Same |
    | Ecosystem | Strong, growing | Mature, stable |

    Playwright wins on auto-wait and browser coverage. Puppeteer wins on the size of its existing community. Both are detected by the same anti-bot platforms.

    What Playwright auto-wait actually buys you

    In Puppeteer, you write:

    ```ts
    await page.click("button.submit");
    await page.waitForSelector(".success-toast");
    const text = await page.$eval(".result", (el) => el.textContent);
    ```

    In Playwright:

    ```ts
    await page.click("button.submit");
    const text = await page.locator(".result").textContent();
    ```

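    Playwright's locators auto-wait: textContent() waits for .result to be attached to the DOM, and actions such as click() additionally wait until the target is visible, stable, enabled, and able to receive events. The Puppeteer version fails intermittently: if .result renders after .success-toast, $eval throws because the element isn't there yet. Playwright handles that case transparently.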

    This single difference cuts our flaky-scraper rate by roughly half. It's the biggest reason we recommend Playwright for new builds.
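Conceptually, auto-waiting is a retry loop: check whether the element is ready, and if not, pause briefly and check again until a timeout expires. The sketch below shows that idea in plain TypeScript so it runs without a browser. The waitUntil helper and its options are hypothetical, and Playwright's real implementation does more (for actions it re-checks visibility, stability, and enabled state on every attempt).

```typescript
// Hypothetical helper illustrating the retry loop behind auto-waiting.
// Not Playwright's implementation; `now` and `sleep` are injectable for tests.
type Clock = () => number;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntil<T>(
  // Returns the value once ready, or null/undefined while still "rendering".
  probe: () => T | null | undefined,
  {
    timeoutMs = 5000,
    intervalMs = 100,
    now = Date.now as Clock,
    sleep = realSleep,
  } = {},
): Promise<T> {
  const deadline = now() + timeoutMs;
  for (;;) {
    const value = probe();
    if (value !== null && value !== undefined) return value;
    if (now() >= deadline) {
      throw new Error(`waitUntil: condition not met within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

A Puppeteer scraper that doesn't use locators needs this kind of loop, or an explicit waitForSelector, around every read; Playwright runs it for you on each locator call.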

    When stock headless gets caught

    Modern anti-bot platforms (Cloudflare Bot Management, DataDome, PerimeterX, Akamai Bot Manager, Kasada) detect headless browsers through:

    • navigator.webdriver === true
    • Missing or unusual plugins
    • Canvas fingerprinting inconsistencies
    • WebGL fingerprint mismatches
    • Suspicious timing (no mouse movement, instantaneous clicks)
    • Suspicious user-agent + IP combinations
    • TLS fingerprint (JA3, JA4) mismatch with claimed browser

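    A stock chromium.launch({ headless: true }) trips several of these out of the box, most obviously navigator.webdriver and, in headless mode, a user-agent containing "HeadlessChrome". We cover the full taxonomy in our post Anti-bot defences: Cloudflare, DataDome, Akamai explained.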

    The stealth stack

    For sites that bot-detect, you escalate:

    1. playwright-extra + puppeteer-extra-plugin-stealth — patches the most obvious detections (navigator.webdriver, plugin list, etc.). Free, fast, and in our experience enough to get past the bottom 60% of detection.
    2. Residential or mobile proxies — IP reputation is a huge signal. Bright Data, Oxylabs, SmartProxy. ~$5-15 per GB. Defeats IP-based blocking.
    3. Real browser farm — services like Browserless, Browserbase, or self-hosted real Chrome with anti-detect profiles. Defeats canvas/WebGL fingerprinting that headless can't fake.
    4. Captcha solvers — last resort. 2Captcha, CapMonster, Anti-Captcha. Priced per 1,000 solves; common types such as reCAPTCHA v2 cost a fraction of a cent each.

    We escalate one rung at a time and stop at whatever works. Most jobs end at level 2.
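The escalation policy fits in a small loop: try the cheapest rung, move up only when the current one gets blocked, and remember which rung worked for each domain so the next job starts there. This is a sketch under assumptions: the tier names, the attempt callback, and the in-memory cache are hypothetical, and in production the "blocked" decision would come from status codes, challenge-page detection, or missing data.

```typescript
// Hypothetical tier names mirroring the four rungs, cheapest first.
const TIERS = ["stealth", "residential", "browserFarm", "captchaSolver"] as const;
type Tier = (typeof TIERS)[number];

type Outcome<T> = { ok: true; data: T } | { ok: false; reason: string };

// Lowest tier that last worked, per domain (in-memory for this sketch).
const lastWorkingTier = new Map<string, Tier>();

async function scrapeWithEscalation<T>(
  domain: string,
  attempt: (tier: Tier) => Promise<Outcome<T>>,
): Promise<{ tier: Tier; data: T }> {
  // Start at the tier that worked last time instead of re-probing cheaper ones.
  const start = TIERS.indexOf(lastWorkingTier.get(domain) ?? TIERS[0]);
  const failures: string[] = [];
  for (const tier of TIERS.slice(start)) {
    const result = await attempt(tier);
    if (result.ok) {
      lastWorkingTier.set(domain, tier);
      return { tier, data: result.data };
    }
    failures.push(`${tier}: ${result.reason}`); // blocked: climb one rung
  }
  throw new Error(`All tiers failed for ${domain}: ${failures.join("; ")}`);
}
```

Caching the working tier avoids paying for a blocked attempt on every job, at the cost of never noticing when a site relaxes its defences; clearing the cache periodically is a simple fix.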

    ```ts
    // Playwright + stealth + residential proxy
    import { chromium } from "playwright-extra";
    import stealth from "puppeteer-extra-plugin-stealth";

    chromium.use(stealth());

    const browser = await chromium.launch({
      proxy: {
        // Placeholder endpoint: use the host/port from your proxy provider's dashboard.
        server: "http://proxy.brightdata.com:22225",
        username: process.env.BRIGHT_USER!,
        password: process.env.BRIGHT_PASS!,
      },
      headless: true,
    });

    // Keep the UA's Chrome major version in step with the bundled Chromium
    // (browser.version()), and pick a locale/timezone that fits the proxy's exit country.
    const ctx = await browser.newContext({
      userAgent:
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
      viewport: { width: 1440, height: 900 },
      locale: "en-GB",
      timezoneId: "Europe/London",
    });
    const page = await ctx.newPage();
    ```

    The user-agent, viewport, locale, and timezone need to match a plausible real browser, and the locale and timezone should agree with the proxy's exit location. Mismatches are signals.
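A cheap safeguard is to lint each context profile before launching it. The sketch below checks two kinds of mismatch: a locale or timezone that does not fit the proxy's exit country, and a user-agent whose Chrome major version differs from the browser actually launched. The Profile shape, the country table, and checkProfile are hypothetical and deliberately tiny; a real table would cover every country your proxies exit from.

```typescript
// Hypothetical profile shape: the newContext() options plus proxy/browser facts.
interface Profile {
  userAgent: string;
  locale: string; // e.g. "en-GB"
  timezoneId: string; // IANA zone, e.g. "Europe/London"
  proxyCountry: string; // ISO 3166-1 alpha-2 code of the proxy exit IP
  browserVersion: string; // e.g. from Playwright's browser.version()
}

// Deliberately tiny lookup table (assumption): plausible zones per country.
const ZONES_BY_COUNTRY: Record<string, string[]> = {
  GB: ["Europe/London"],
  US: ["America/New_York", "America/Chicago", "America/Denver", "America/Los_Angeles"],
  DE: ["Europe/Berlin"],
};

function checkProfile(p: Profile): string[] {
  const problems: string[] = [];
  const zones = ZONES_BY_COUNTRY[p.proxyCountry];
  if (zones && !zones.includes(p.timezoneId)) {
    problems.push(`timezone ${p.timezoneId} does not match proxy country ${p.proxyCountry}`);
  }
  // The locale's region ("GB" in "en-GB") should usually match the exit country.
  const region = p.locale.split("-")[1]?.toUpperCase();
  if (region && region !== p.proxyCountry) {
    problems.push(`locale ${p.locale} does not match proxy country ${p.proxyCountry}`);
  }
  // The UA's Chrome major version should match the real browser binary.
  const uaMajor = /Chrome\/(\d+)/.exec(p.userAgent)?.[1];
  const realMajor = p.browserVersion.split(".")[0];
  if (uaMajor && uaMajor !== realMajor) {
    problems.push(`UA claims Chrome ${uaMajor} but browser is ${realMajor}`);
  }
  return problems;
}
```

Running it once per profile at startup, and refusing to launch when it returns anything, turns a silent detection signal into a loud configuration error.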

    Network interception — Playwright's quiet win

    Playwright's request routing is one of its strongest features. Use it to block heavy resources such as images, media, and fonts, return mock responses for ads, or capture XHR payloads:

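    ```ts
    import { chromium } from "playwright";

    const browser = await chromium.launch();
    const page = await browser.newPage();

    // Block heavy resources to speed up scraping
    await page.route("**/*", (route) => {
      const t = route.request().resourceType();
      if (t === "image" || t === "media" || t === "font") return route.abort();
      return route.continue();
    });

    // Capture API responses while navigating
    const responses: unknown[] = [];
    page.on("response", async (res) => {
      if (!res.url().includes("/api/products") || !res.ok()) return;
      try {
        responses.push(await res.json());
      } catch {
        // Body was not JSON (or is no longer available); skip it.
      }
    });

    await page.goto("https://example.com/products");
    // The listener is async, so give in-flight XHRs a chance to finish.
    await page.waitForLoadState("networkidle");
    await browser.close();
    ```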

    For sites that load data via XHR, intercepting the API response is often faster and more reliable than scraping the rendered HTML.

    Browser per request vs persistent contexts

    Two patterns.

    Pattern A — fresh browser per job. Slowest but cleanest. No state leaks between jobs. Highest detection resistance because every job looks like a fresh user.

    Pattern B — persistent context, reused across jobs. Faster. Useful for site-specific scrapers where you can amortize the cookie/session warm-up. Risk: state leak if you forget to clear cookies between distinct targets.

    For most production scraping we run pattern A with a hot pool of browsers (5-10 warm Playwright instances waiting for jobs). Jobs skip the browser launch latency, and since every job still gets its own browser, there is no state-leak risk.

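    ```ts
    import { chromium, type Browser } from "playwright";

    // Hot-pool pattern: keep `size` browsers warm, use each one for exactly
    // one job (pattern A), then close it and launch a replacement.
    class BrowserPool {
      private pool: Browser[] = [];
      private size = 0;
      private pending = 0; // replacements currently launching

      async init(size: number) {
        this.size = size;
        this.pool = await Promise.all(
          Array.from({ length: size }, () => chromium.launch()),
        );
      }

      async acquire() {
        // Fall back to a cold launch if every warm browser is busy.
        const browser = this.pool.pop() ?? (await chromium.launch());
        return {
          browser,
          release: async () => {
            await browser.close(); // never reused, so no state leaks between jobs
            if (this.pool.length + this.pending < this.size) {
              this.pending++;
              // Refill in the background; log launch failures in real code.
              chromium
                .launch()
                .then((b) => this.pool.push(b))
                .catch(() => {})
                .finally(() => this.pending--);
            }
          },
        };
      }
    }
    ```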

    The cost model

    A short cost-per-scrape comparison:

    | Setup | Cost / 1k scrapes |
    |---|---|
    | Stock Playwright, datacenter proxies | ~$0.20 |
    | Playwright + stealth + residential proxies | ~$2-5 |
    | Browser farm (Browserless / Browserbase) | ~$5-15 |
    | Real browser + manual captcha solving | ~$20+ |

    Plan your job for the lowest tier that works. Don't pay browser farm prices for sites that fall to stealth + residential.
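Once residential proxies are involved, most of the bill is bandwidth, so it is worth estimating from your own page weight. A back-of-the-envelope sketch, assuming traffic is billed per decimal GB (1 GB = 1,000 MB) at the ~$5-15 per GB rate quoted above, with a hypothetical flat overhead term for compute; the page weights in the example are made-up inputs, not measurements.

```typescript
// Back-of-the-envelope proxy bill per 1,000 scrapes.
// Assumptions: bandwidth billed per decimal GB; overheadPer1k is a
// hypothetical flat amount for compute and other non-bandwidth costs.
const PAGES = 1000;
const MB_PER_GB = 1000;

function costPer1k(mbPerPage: number, usdPerGb: number, overheadPer1k = 0): number {
  const gb = (mbPerPage * PAGES) / MB_PER_GB;
  return gb * usdPerGb + overheadPer1k;
}

// 0.5 MB per page with images and fonts blocked, at $10/GB residential:
console.log(costPer1k(0.5, 10)); // 5
// The same page at 2 MB with nothing blocked:
console.log(costPer1k(2, 10)); // 20
```

The fourfold difference between the two calls is one reason to block heavy resources with page.route() whenever you pay per GB.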

    What about playwright-recorder / codegen?

    Codegen is useful for prototyping a scraper interactively: playwright codegen example.com opens a browser and records your clicks as Playwright code. We use it for the initial pass on a new target site, then refactor the generated code.

    ```bash
    pnpm dlx playwright codegen --target javascript https://example.com
    ```

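    Do not ship the codegen output as production code. The generated locators reflect whatever happened to identify the element at recording time (exact button text, or a position in a list when the text is ambiguous), so they break on minor UI changes. Refactor them into named selectors and add explicit waits where the flow needs them.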

    Newer alternatives worth knowing

    A short list:

    • Browserless — managed Chrome with anti-detect, captcha solving built in. Pay per second of browser time.
    • Browserbase — newer entrant, simpler API, well-funded.
    • Camoufox — a fork of Firefox with anti-detect baked in. Open source. Useful when you need to look exactly like Firefox.
    • Apify — fully managed scraping platform with built-in proxy rotation. Good for non-engineering teams.

    For most engineering-led teams, Playwright self-hosted with proxies is the best cost-quality trade. The managed services are right when ops cost matters more than per-scrape cost.

    What we ship by default

    For a typical scraping engagement at YAEL:

    1. Playwright + playwright-extra + stealth plugin
    2. BullMQ queue with rate limits per target domain
    3. Residential proxies for any site that has bot detection
    4. Per-target adapter modules (one folder per site, isolated selectors)
    5. Snapshot tests on the parsing layer (save HTML, parse, assert)
    6. Daily smoke runs that catch site changes before customers notice
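Item 2 implies a per-domain limit. BullMQ's built-in worker limiter applies to a whole queue, so one common approach is a queue per target domain; another is a small check inside the worker. The sketch below illustrates the second option with a hypothetical DomainLimiter class: it enforces a minimum gap between requests to the same hostname while other hostnames proceed independently. The clock is injected so it can be tested without waiting, and the state is in-memory, so it only coordinates jobs within a single worker process.

```typescript
// Hypothetical minimum-interval limiter keyed by hostname (not a BullMQ API).
class DomainLimiter {
  private nextFreeAt = new Map<string, number>();
  private readonly minIntervalMs: number;
  private readonly now: () => number;

  constructor(minIntervalMs: number, now: () => number = Date.now) {
    this.minIntervalMs = minIntervalMs;
    this.now = now;
  }

  /** Books the next free slot for url's host and returns how many ms to wait. */
  reserve(url: string): number {
    const host = new URL(url).hostname;
    const t = this.now();
    const slot = Math.max(t, this.nextFreeAt.get(host) ?? 0);
    this.nextFreeAt.set(host, slot + this.minIntervalMs);
    return slot - t;
  }
}
```

In a worker you would call limiter.reserve(url) and sleep for the returned number of milliseconds before navigating.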

    The scraping code for a typical build is short enough to describe in three pages. Most of the production complexity is in the operational layer (queue, retries, observability), not in the scraping code itself.
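Retries are a typical piece of that operational layer. One common policy is exponential backoff with jitter, so a burst of failures against one site does not turn into synchronized retries. A minimal sketch, assuming a hypothetical withRetry helper and the "full jitter" variant (a random delay between zero and an exponentially growing cap); sleep and random are injectable for testing. Inside a queue, BullMQ's own attempts and backoff job options are usually the first thing to reach for.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical retry helper: exponential backoff with "full jitter".
async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  {
    retries = 3, // retries after the first attempt
    baseMs = 500,
    maxMs = 30_000,
    sleep = realSleep,
    random = Math.random,
  } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the last error
      const cap = Math.min(maxMs, baseMs * 2 ** attempt);
      await sleep(Math.floor(random() * cap)); // wait somewhere in [0, cap)
    }
  }
}
```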

    Need a production scraper that doesn't break weekly?

    We've built scraping infrastructure into competitive intelligence platforms, price tracking products, and AI agent retrieval pipelines.


    FAQ

    Is scraping legal?

    It depends on what you scrape, where you are, and what the site's terms say. As a rule of thumb, public data collected without bypassing technical controls is a grey area; data behind a login or paywall is much riskier. Always read the site's robots.txt and terms of service. We are not your lawyer.

    Can I scrape JavaScript-rendered sites with fetch?

    If you can reverse-engineer the API the page calls, yes, and it's much faster than headless. Always check the Network tab in your browser's DevTools first. Headless is the fallback when the API isn't usable.
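Once you have found the JSON endpoint in the Network tab, the scraper often reduces to a pagination loop over plain HTTP. A sketch under assumptions: the endpoint URL, the page query parameter, and the items field are hypothetical stand-ins for whatever the real API uses, and the page fetcher is injected so the loop can be tested offline. In practice you would also copy any headers the page sends (cookies, auth tokens) from the captured request.

```typescript
// Hypothetical response shape of a paginated JSON API.
interface ProductPage {
  items: { id: string; name: string }[];
}

type FetchPage = (page: number) => Promise<ProductPage>;

// Default fetcher using the global fetch (Node 18+). URL and params are assumptions.
const fetchProductsPage: FetchPage = async (page) => {
  const url = new URL("https://example.com/api/products");
  url.searchParams.set("page", String(page));
  const res = await fetch(url, { headers: { accept: "application/json" } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return (await res.json()) as ProductPage;
};

// Walk pages until the API returns an empty page (or a safety limit is hit).
async function fetchAllProducts(fetchPage: FetchPage = fetchProductsPage, maxPages = 100) {
  const all: ProductPage["items"] = [];
  for (let page = 1; page <= maxPages; page++) {
    const { items } = await fetchPage(page);
    if (items.length === 0) break;
    all.push(...items);
  }
  return all;
}
```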

    What's the cheapest scraper for low volume?

    fetch + cheerio for static HTML. Playwright with off-the-shelf residential proxies for JS-rendered sites. Everything else is overkill until you hit detection.

    Do I need to use a captcha solver?

    Only if your target site presents captchas. Most don't until they detect you. Get caught less and you won't need a solver. If you do, 2Captcha at ~$1 per 1k reCAPTCHA v2 solves is the cheapest production option.

    What about Selenium?

    Don't pick Selenium for a new project. Slower, older, more detectable. Playwright covers everything Selenium does and more.

    Can I run Playwright on Vercel / Cloudflare Workers?

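    Vercel: yes, using a slimmed-down Chromium build such as @sparticuz/chromium, but cold starts make it slow. Cloudflare Workers: not with stock Playwright; use Cloudflare's Browser Rendering service or a remote browser such as Browserless. For sustained scraping workloads, run Playwright on a long-lived VM or container.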

    How do I keep selectors stable when the site redesigns?

    Wherever possible, prefer semantic locators (getByRole, getByText, getByLabel) over CSS class names. Generated class names from CSS Modules or CSS-in-JS can change on every deploy; roles, labels, and visible text rarely do.

    What about LLM-based scraping?

    Useful for one-off extractions on unstructured pages — give Claude the HTML and ask for structured data. Expensive at scale. We use it for the long-tail "we need data from 500 different sites, each with different HTML" case where building 500 adapters isn't economic.


    YOUR APP EXPERT LTD

    71-75 Shelton Street, LONDON WC2H 9JQ, UK

    +44 20 1234 5678



    © 2026 YOUR APP EXPERT LTD. All rights reserved.
