Webhook idempotency: the bug most teams ship

Why webhook handlers double-charge, double-grant, and double-cancel — and the three-line database pattern that fixes all of it.

YAEL Engineering | 18 Mar 2026 | 8 min read | 1,652 words

The webhook handler that processes a Stripe event twice is the most common preventable bug we see in SaaS code reviews. Stripe will retry. So will Paddle, Telegram, Slack, GitHub, Shopify, and every reasonable webhook source. If your handler is not idempotent, retries become double-charges, double-grants, duplicate emails, and customer support tickets you cannot explain. The fix is one database table and one unique constraint. It takes ten minutes to add and it survives every kind of failure your network can produce.

This is the pattern we ship into every webhook handler we write — for CloudChat's Stripe billing, for partner-integration webhooks, for the Telegram bot Stripe integration we wrote about separately.

What actually goes wrong

Stripe (and every other webhook source) sends events with at-least-once delivery. Reasons retries happen:

Your endpoint returned non-2xx
Your endpoint took >10 seconds to respond
The network blipped between Stripe and you
Stripe's delivery infrastructure had a hiccup
You restarted your server mid-handler

In every case Stripe assumes the event was not received and resends. If your handler ran to completion the first time but the response didn't make it back, Stripe will retry — and your handler will run again, on a payload it has already processed.

A non-idempotent customer.subscription.created handler will provision the plan twice. A non-idempotent invoice.payment_succeeded will credit usage twice. A non-idempotent customer.subscription.deleted will downgrade an already-downgraded account (less bad, but still messy in audit logs).
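To make the failure concrete, here is a minimal in-memory sketch of at-least-once delivery hitting a handler that is not idempotent. Everything in it (`WebhookEvent`, `naiveHandler`, `deliverAtLeastOnce`, the `credits` field) is a hypothetical stand-in for illustration, not part of Stripe's SDK or payload shape.

```typescript
// Hypothetical in-memory model; none of these names come from Stripe's SDK.
type WebhookEvent = { id: string; type: string; credits: number };

const balance = new Map<string, number>(); // account -> credited units

// Naive handler: applies the effect every time it is called.
function naiveHandler(account: string, event: WebhookEvent): void {
  balance.set(account, (balance.get(account) ?? 0) + event.credits);
}

// At-least-once delivery: the sender resends whenever it never saw the 2xx,
// even though the handler already ran to completion.
function deliverAtLeastOnce(
  handler: (account: string, event: WebhookEvent) => void,
  account: string,
  event: WebhookEvent,
  responsesLost: number,
): void {
  for (let attempt = 0; attempt <= responsesLost; attempt++) {
    handler(account, event);
  }
}

const invoicePaid: WebhookEvent = {
  id: "evt_demo_1",
  type: "invoice.payment_succeeded",
  credits: 100,
};

// One lost response means two deliveries of the same event.
deliverAtLeastOnce(naiveHandler, "acct_demo", invoicePaid, 1);
const naiveBalance = balance.get("acct_demo"); // 200: one payment, credited twice
```

Nothing in the handler is wrong on any single call. The bug only exists across calls, which is why it tends to survive code review.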

The fix, in full

One table:

sql

create table processed_events (
  id         text primary key,
  source     text not null,
  type       text not null,
  payload    jsonb not null,
  processed_at timestamptz not null default now()
);

One pattern:

// src/app/api/stripe/webhook/route.ts
import { headers } from "next/headers";
import Stripe from "stripe";
import { db } from "@/lib/db";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const sig = (await headers()).get("stripe-signature");
  const body = await req.text();

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      body,
      sig!,
      process.env.STRIPE_WEBHOOK_SECRET!,
    );
  } catch {
    return new Response("invalid signature", { status: 400 });
  }

  const result = await db.transaction(async (tx) => {
    // INSERT ... ON CONFLICT DO NOTHING.
    // If the row already exists, this is a retry — skip the work.
    const inserted = await tx.execute(
      `insert into processed_events (id, source, type, payload)
       values ($1, 'stripe', $2, $3)
       on conflict (id) do nothing
       returning id`,
      [event.id, event.type, event],
    );
    if (inserted.length === 0) return "duplicate";

    await applyEffect(tx, event);
    return "processed";
  });

  return new Response(JSON.stringify({ result }), { status: 200 });
}

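That's it. The unique constraint on id, on conflict do nothing, and the transaction together guarantee that applyEffect's database writes commit exactly once per unique event id, even under concurrent retries, across process restarts, or if the database connection dies mid-handler. The guarantee covers only writes made through tx: side effects outside the transaction (emails, calls to other APIs) can still repeat if the transaction rolls back after they run.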

Why the transaction matters

Two things must happen atomically: (1) record that you processed the event, (2) apply the effect. If they're separate, you have a window where one happens and the other doesn't.

The transaction collapses that window. If applyEffect throws, the insert is rolled back, the event is not marked processed, and the next retry re-runs the whole thing. If applyEffect succeeds, both the insert and the effect commit together.
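The commit-or-rollback behaviour can be sketched without a database. The snippet below is an in-memory model only: `transaction`, `handle`, and the `State` shape are hypothetical names, and copying the state stands in for what Postgres does with a real transaction.

```typescript
// In-memory stand-in for a database transaction (hypothetical, for illustration only).
type State = { processed: Set<string>; subscriptions: Map<string, string> };
type Outcome = "duplicate" | "processed";

const state: State = { processed: new Set(), subscriptions: new Map() };

// Runs `work` against a copy of the state and publishes the copy only on success,
// which mimics commit-or-rollback.
function transaction<T>(work: (draft: State) => T): T {
  const draft: State = {
    processed: new Set(state.processed),
    subscriptions: new Map(state.subscriptions),
  };
  const result = work(draft); // a throw here leaves `state` untouched (rollback)
  state.processed = draft.processed;
  state.subscriptions = draft.subscriptions;
  return result;
}

function handle(eventId: string, effect: (draft: State) => void): Outcome {
  return transaction<Outcome>((draft) => {
    if (draft.processed.has(eventId)) return "duplicate";
    draft.processed.add(eventId); // record the event...
    effect(draft); // ...and apply the effect in the same unit of work
    return "processed";
  });
}

// First delivery: the effect fails, so the "processed" mark must not survive.
let firstAttemptFailed = false;
try {
  handle("evt_demo_2", () => {
    throw new Error("downstream write failed");
  });
} catch {
  firstAttemptFailed = true;
}
const markedAfterFailure = state.processed.has("evt_demo_2"); // false

// Retry: the effect succeeds, and both writes become visible together.
const retryResult = handle("evt_demo_2", (draft) => {
  draft.subscriptions.set("sub_demo", "active");
});

// A later redelivery is recognised and the effect is never invoked.
const thirdResult = handle("evt_demo_2", () => {
  throw new Error("must not run");
});
```

The failed first attempt leaves no trace, so the retry is indistinguishable from a first delivery. That is the property the real transaction buys you.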

Concurrent retries

Stripe will sometimes send the same event in parallel — especially after their delivery infrastructure recovers from a hiccup. Two requests, same event id, both insert at the same time.

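The on conflict do nothing handles this correctly. One insert wins and returns the id. The other waits for the winning transaction to finish: if it commits, the second insert returns no rows and that handler skips applyEffect; if it rolls back, the second insert succeeds and that handler does the work instead. No double-processing.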

For belt-and-suspenders, you can also select ... for update the row first:

const existing = await tx.execute(
  `select id from processed_events where id = $1 for update`,
  [event.id],
);
if (existing.length > 0) return "duplicate";
// ... insert and apply

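But the on conflict approach is simpler, and it is still what provides the safety: select ... for update cannot lock a row that does not exist yet, so on a first delivery two concurrent handlers both pass this check and only the insert's unique constraint separates them.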
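The difference between "check, then insert" and "claim in one step" is easier to see in a toy model. The sketch below is hypothetical and deliberately synchronous: each handler returns the work it would do after its first database round trip, and `runConcurrently` starts every handler before resuming any of them, which is the interleaving that causes trouble in production.

```typescript
// Hypothetical in-memory model of the race; a real handler would rely on the
// database's unique constraint rather than a Set.
type Step = () => void;

const seen = new Set<string>();
let effectsApplied = 0;

// Racy: the check and the insert are separated by a round trip, so every
// concurrent call passes the check before any of them records the id.
function checkThenInsert(eventId: string): Step {
  if (seen.has(eventId)) return () => {};
  return () => {
    seen.add(eventId);
    effectsApplied++;
  };
}

// Atomic claim: test-and-set in one step, like insert ... on conflict do nothing.
function claimThenApply(eventId: string): Step {
  if (seen.has(eventId)) return () => {};
  seen.add(eventId); // nothing can interleave between the check and the insert
  return () => {
    effectsApplied++;
  };
}

// Starts `copies` handlers for the same event, then lets each one finish.
function runConcurrently(
  handler: (eventId: string) => Step,
  eventId: string,
  copies: number,
): number {
  seen.clear();
  effectsApplied = 0;
  const continuations: Step[] = [];
  for (let i = 0; i < copies; i++) continuations.push(handler(eventId));
  continuations.forEach((resume) => resume());
  return effectsApplied;
}

const racyCount = runConcurrently(checkThenInsert, "evt_demo_race", 3); // 3
const atomicCount = runConcurrently(claimThenApply, "evt_demo_race", 3); // 1
```

In the real handler the "one step" is the insert itself, which is why the unique constraint matters more than any check you run before it.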

What "applying the effect" looks like

Each event type has a handler. The handlers can be naive about idempotency because the outer wrapper guarantees their writes commit at most once per event id.

async function applyEffect(tx: DbTx, event: Stripe.Event) {
  switch (event.type) {
    case "customer.subscription.created":
    case "customer.subscription.updated":
      return upsertSubscription(tx, event.data.object as Stripe.Subscription);
    case "customer.subscription.deleted":
      return cancelSubscription(tx, event.data.object as Stripe.Subscription);
    case "invoice.payment_succeeded":
      return creditUsage(tx, event.data.object as Stripe.Invoice);
    case "invoice.payment_failed":
      return flagFailure(tx, event.data.object as Stripe.Invoice);
    default:
      return; // unknown event type, ack and move on
  }
}

Note: even upsertSubscription doing an upsert is fine — the outer wrapper means we won't call it twice for the same event anyway.

The 200 response trick

Stripe (and most webhook sources) treat any non-2xx response as "redeliver this." That includes 500s. So an unhandled exception in your handler triggers a retry — which is what you want, as long as your handler is idempotent.

What if you return a 500 after the database commit? Stripe redelivers, your idempotency check fires, you return 200, and Stripe stops retrying. Good. What about a 500 before the commit? The transaction rolls back, the event isn't marked processed, and the retry runs the handler again. Also good.

What you don't want: returning 200 too eagerly. We've seen code like this in the wild:

// BAD — never do this
export async function POST(req: Request) {
  const event = parse(await req.text());
  setImmediate(() => applyEffect(event)); // fire and forget
  return new Response("ok"); // 200, but the work hasn't happened
}

This is broken. If applyEffect fails, Stripe doesn't know to retry because you already returned 200. Real production bug we've debugged at customer code reviews. Don't do this.
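A small simulation shows what the sender sees in each case. It is a sketch under simplifying assumptions: handlers are modelled as synchronous functions returning a status code, `deliver` is a hypothetical sender that redelivers on any non-2xx, and the work fails exactly once before succeeding.

```typescript
// Hypothetical model: a handler is a function from event id to HTTP status.
type Handler = (eventId: string) => number;

let failuresLeft = 1; // the first attempt at the work fails
const applied = new Set<string>();

function applyEffect(eventId: string): void {
  if (failuresLeft > 0) {
    failuresLeft--;
    throw new Error("transient failure");
  }
  applied.add(eventId);
}

// Acks regardless of outcome: the failure is swallowed and the sender sees 200.
const ackEarly: Handler = (eventId) => {
  try {
    applyEffect(eventId);
  } catch {
    // nobody is listening any more
  }
  return 200;
};

// Works first, acks after: a failure surfaces as a 500 so the sender retries.
const ackAfterWork: Handler = (eventId) => {
  try {
    applyEffect(eventId);
    return 200;
  } catch {
    return 500;
  }
};

// Sender: redeliver on any non-2xx, up to maxAttempts. Returns attempts used.
function deliver(handler: Handler, eventId: string, maxAttempts: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = handler(eventId);
    if (status >= 200 && status < 300) return attempt;
  }
  return maxAttempts;
}

function simulate(handler: Handler): { attempts: number; wasApplied: boolean } {
  failuresLeft = 1;
  applied.clear();
  const attempts = deliver(handler, "evt_demo_3", 3);
  return { attempts, wasApplied: applied.has("evt_demo_3") };
}

const earlyResult = simulate(ackEarly); // one attempt, effect never applied
const lateResult = simulate(ackAfterWork); // two attempts, effect applied
```

The early ack looks healthier on a dashboard (one attempt, always 200) and is the one that loses data.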

What about Stripe's own idempotency keys?

Stripe gives you an Idempotency-Key header you can pass when calling Stripe APIs. It guarantees Stripe won't double-process your request. That's a different problem. Webhook idempotency is about your handler not double-processing Stripe's events.

You need both: idempotency keys on outgoing API calls (so creating a customer doesn't accidentally create two), and processed-event deduplication on incoming webhooks. They're symmetric and they solve different bugs.
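For the outgoing side, the key has to be the same on every retry of the same logical operation, so it should be derived from something stable rather than generated fresh per attempt. A minimal sketch, assuming a hypothetical helper `idempotencyKeyFor` and made-up operation and record names:

```typescript
// Hypothetical helper: derives a stable key from the operation and the local
// record it acts on, so a retried request reuses the same key.
function idempotencyKeyFor(operation: string, localId: string): string {
  return `${operation}:${localId}`;
}

const firstKey = idempotencyKeyFor("create-customer", "user_42");
const retryKey = idempotencyKeyFor("create-customer", "user_42"); // same key as firstKey
const otherKey = idempotencyKeyFor("create-customer", "user_43"); // different record, different key

// Usage with the Stripe Node library, which accepts the key as a per-request option:
// await stripe.customers.create({ email }, { idempotencyKey: firstKey });
```

A random UUID generated inside the retry loop defeats the purpose, because each attempt then looks like a new request to Stripe.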

Testing idempotency

Two tests in CI:

test("processing the same event twice has the same effect as once", async () => {
  const event = makeStripeEvent({ id: "evt_test_1", type: "customer.subscription.created" });
  await handleEvent(event);
  await handleEvent(event); // second call should be a no-op
  const subs = await db.subscriptions.findMany();
  expect(subs.length).toBe(1);
});

test("concurrent processing of the same event still produces one effect", async () => {
  const event = makeStripeEvent({ id: "evt_test_2", type: "customer.subscription.created" });
  await Promise.all([handleEvent(event), handleEvent(event), handleEvent(event)]);
  const subs = await db.subscriptions.findMany();
  expect(subs.length).toBe(1);
});

The second test catches the "I forgot to use a transaction" bug.

Cleaning up processed_events

The table grows forever if you don't prune. Stripe's automatic retries stop after about 3 days, but events can still be resent manually after that, so we keep 30 days of events for debugging and then trim:

sql

delete from processed_events
where processed_at < now() - interval '30 days';

Run nightly. Add a processed_at btree index if you have millions of events.
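On a large table a single unbounded delete can hold locks for a long time, so one option is to prune in batches. The sketch below only builds the statement; `buildPruneStatement` and the `Statement` shape are hypothetical, and it assumes a Postgres client that takes a query text plus positional values.

```typescript
// Hypothetical shape for a parameterised query: text plus positional values.
type Statement = { text: string; values: number[] };

// Builds one batch of the nightly prune. Run it repeatedly until it deletes
// zero rows. Assumes Postgres (make_interval) and the processed_events table above.
function buildPruneStatement(retentionDays: number, batchSize: number): Statement {
  if (!Number.isInteger(retentionDays) || retentionDays < 1) {
    throw new RangeError("retentionDays must be a positive integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return {
    text: `delete from processed_events
           where id in (
             select id from processed_events
             where processed_at < now() - make_interval(days => $1::int)
             limit $2
           )`,
    values: [retentionDays, batchSize],
  };
}

const nightlyPrune = buildPruneStatement(30, 5000);
```

Keeping the retention period as a parameter also makes it hard to accidentally prune inside the sender's retry window, which would let a late retry be processed a second time.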

Other webhook sources, same pattern

This isn't Stripe-specific. The exact same pattern works for:

Telegram bot updates (use update_id)
WhatsApp Business API (use the message id)
GitHub webhooks (use the X-GitHub-Delivery header)
Slack events API (use event_id)
Shopify webhooks (use X-Shopify-Webhook-Id)
Discord interactions (use the interaction id)

Every one of them sends a unique id per event. Every one of them will retry. Every one of them needs this same pattern.
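If several sources share one processed_events table, their ids can collide (a Telegram update_id is just an integer), so it helps to namespace the key by source. A minimal sketch covering a few of the sources above; `dedupKey`, `Source`, and `Incoming` are hypothetical names, and it assumes header names have already been lower-cased and the body parsed as JSON.

```typescript
type Source = "stripe" | "telegram" | "github" | "slack" | "shopify";

type Incoming = {
  headers: Record<string, string | undefined>; // lower-cased header names
  body: Record<string, unknown>; // parsed JSON payload
};

// Picks the per-source unique id and prefixes it with the source name.
function dedupKey(source: Source, req: Incoming): string {
  let id: unknown = undefined;
  switch (source) {
    case "stripe":
      id = req.body["id"];
      break;
    case "telegram":
      id = req.body["update_id"];
      break;
    case "slack":
      id = req.body["event_id"];
      break;
    case "github":
      id = req.headers["x-github-delivery"];
      break;
    case "shopify":
      id = req.headers["x-shopify-webhook-id"];
      break;
  }
  if (typeof id !== "string" && typeof id !== "number") {
    throw new Error(`missing event id for ${source}`);
  }
  return `${source}:${id}`;
}

const telegramKey = dedupKey("telegram", { headers: {}, body: { update_id: 1001 } });
const githubKey = dedupKey("github", {
  headers: { "x-github-delivery": "demo-delivery-id" },
  body: {},
});
```

Throwing on a missing id is deliberate: the resulting non-2xx makes the sender retry, which is safer than inserting a row keyed on `undefined`.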

FAQ

What if my handler is fast — do I still need this?

Yes. "Fast" doesn't prevent retries. Network failures, client-side timeouts, infrastructure restarts — they all cause retries. The idempotency layer is the only correct defense.

Can I use Redis instead of Postgres?

You can, but you'd need to be very careful about the atomicity guarantee. SETNX in Redis works for the lock, but you've now got two systems to coordinate (the lock in Redis, the effect in Postgres). One transaction in Postgres is simpler and safer.

What about Stripe's webhook signing — isn't that enough?

Signing verifies the event came from Stripe. It doesn't deduplicate retries. Those are different layers — you need both.

Does Stripe retry forever?

No. Stripe retries for ~3 days on a backoff schedule. If your endpoint is down for 4 days, you'll miss events. Set up a monitor that alerts when webhook delivery fails and a backfill job that fetches recent events via the API on recovery.

What's the right table column for the event id?

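Use text, not varchar(n). In Postgres the two perform the same, and a length limit only adds a way to fail: Stripe event ids are around 30 characters today, but that length is not guaranteed to stay fixed. Add primary key for the dedup constraint. Add processed_at for the cleanup job.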

How do I handle out-of-order events?

Stripe (and most sources) don't guarantee order. Your handler must tolerate subscription.deleted arriving before subscription.updated. Pattern: always read the current state from Stripe's API at handler time rather than trusting the payload. The payload tells you what changed; the API tells you the truth.

What if my handler depends on a slow downstream API?

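Don't do the slow work in the handler. Insert into processed_events and record a job durably (simplest: a job row written in the same transaction), then return 200. A worker does the slow work, with its own retries. That keeps your handler response time under 2 seconds without repeating the fire-and-forget mistake described earlier: the 200 is only sent once the job is safely recorded.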
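A rough in-memory sketch of that split follows. The names (`receive`, `drain`, `Job`) are hypothetical, and the Set and array stand in for two tables that a real handler would write in one transaction.

```typescript
// Hypothetical in-memory stand-ins for two tables written in one transaction.
type Job = { eventId: string; attempts: number };

const processedEvents = new Set<string>();
const jobs: Job[] = [];

// Webhook side: claim the event and enqueue the job together, then ack.
function receive(eventId: string): "duplicate" | "queued" {
  if (processedEvents.has(eventId)) return "duplicate";
  processedEvents.add(eventId);
  jobs.push({ eventId, attempts: 0 });
  return "queued";
}

// Worker side: the slow call gets its own retry budget, independent of the sender's.
function drain(slowCall: (eventId: string) => void, maxAttempts: number): string[] {
  const done: string[] = [];
  while (jobs.length > 0) {
    const job = jobs[0];
    if (!job) break;
    job.attempts++;
    try {
      slowCall(job.eventId);
      done.push(job.eventId);
      jobs.shift();
    } catch {
      // Give up after maxAttempts; a real queue would move the job to a dead-letter table.
      if (job.attempts >= maxAttempts) jobs.shift();
    }
  }
  return done;
}

const firstDelivery = receive("evt_demo_4"); // "queued"
const redelivery = receive("evt_demo_4"); // "duplicate", no second job
```

The webhook sender only ever waits on `receive`, so a slow or flaky downstream API affects the worker's retry count, not the webhook response time.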

Should I version the event payload?

If you serialize it into your warehouse for analytics, yes — schema changes over time. If you only use it for dedup, no — the unique id is all you need from a re-processing standpoint.

