
p-limit doesn't tell you why it said no

Most concurrency libraries silently queue your work. A bulkhead rejects it, tells you why, and lets your system stay responsive under load.

Every Node.js backend eventually needs a concurrency limit somewhere. Database calls, LLM requests, downstream APIs, CPU-heavy work. The standard answer is p-limit or Bottleneck. They work. But they solve a different problem than the one that wakes you up at 2am.

The problem they solve: "run at most N things at a time." The problem they don't solve: "what happens to thing N+1?"

With p-limit, thing N+1 waits. Silently. Indefinitely. If 1,000 requests arrive and your limit is 10, you get 10 running and 990 waiting in an unbounded internal queue. Memory climbs. Latency climbs. The caller has no idea whether their request will run in 50ms or 50 seconds.
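
A minimal sketch of that failure mode, assuming p-limit's usual API (a limiter function plus activeCount/pendingCount counters); callDownstream stands in for any slow async call:

import pLimit from 'p-limit';

const limit = pLimit(10);

// 1,000 requests arrive in a burst. All 1,000 calls are accepted: 10 run,
// 990 sit in p-limit's internal queue with no bound, no timeout, and no way
// for an individual caller to learn that it is queued rather than running.
const work = Array.from({ length: 1000 }, () => limit(() => callDownstream()));

console.log(limit.activeCount);  // 10
console.log(limit.pendingCount); // 990 — visible only in aggregate

await Promise.all(work);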

That's not concurrency limiting. That's hiding overload.

What a bulkhead does differently

async-bulkhead-ts is a concurrency primitive that treats rejection as a feature, not a failure.

import { createBulkhead } from 'async-bulkhead-ts';

const bulkhead = createBulkhead({
  maxConcurrent: 10,
});

const result = await bulkhead.acquire();

if (!result.ok) {
  // result.reason: 'concurrency_limit' | 'queue_limit'
  //              | 'timeout' | 'aborted' | 'shutdown'
  return res.status(503).json({ error: result.reason });
}

try {
  await doExpensiveWork();
} finally {
  result.token.release();
}

When all 10 slots are occupied, the 11th call is rejected immediately. Not queued. Not delayed. Rejected with a typed reason that tells you exactly what happened.

The caller gets a fast answer. Your service stays responsive. Your dashboard shows the rejection. You can act on it.

Why typed rejections matter

p-limit gives you a promise that eventually resolves. If it takes too long, you don't know whether the delay is from the work itself or from waiting in a hidden queue. There's no signal to distinguish "the downstream is slow" from "200 requests are ahead of you."

async-bulkhead-ts gives you a reason with every rejection:

  • concurrency_limit: all slots are full, no queue configured
  • queue_limit: all slots full and the queue is also full
  • timeout: the request waited in the queue too long
  • aborted: the caller cancelled via AbortSignal
  • shutdown: the bulkhead was closed for graceful termination

Each reason maps to a different operational response. A concurrency_limit spike means you need more capacity or less traffic. A timeout spike means your queue is too deep or your work is too slow. A shutdown rejection is expected during deploys.
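
As an illustrative sketch of that mapping in an HTTP handler (the status codes and Retry-After choice are illustrative, not part of the library):

const result = await bulkhead.acquire();

if (!result.ok) {
  switch (result.reason) {
    case 'concurrency_limit':
    case 'queue_limit':
      // Overloaded right now: shed the request and hint the client to back off.
      return res.status(503).set('Retry-After', '1').json({ error: result.reason });
    case 'timeout':
      // Waited too long in the queue: still overload, but watch queue depth too.
      return res.status(503).json({ error: result.reason });
    case 'aborted':
      // The caller cancelled; there is no one left to answer.
      return;
    case 'shutdown':
      // Expected during deploys; the load balancer can retry elsewhere.
      return res.status(503).json({ error: result.reason });
  }
}

// result.ok: proceed as in the acquire example above, releasing in finally.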

Opting into a queue

Waiting is opt-in, bounded, and comes with an escape hatch.

const bulkhead = createBulkhead({
  maxConcurrent: 10,
  maxQueue: 20,
});

await bulkhead.run(async () => doWork(), { timeoutMs: 50 });

At most 20 requests wait. Each one waits at most 50ms. If the queue is full or the timeout fires, the request is rejected with a reason, not left hanging.

This is the tradeoff LoadLens visualizes: a bounded queue absorbs short bursts, but under sustained load it adds latency without reducing the rejection rate. The timeout keeps the queue from becoming a latency trap.
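
To put rough numbers on that: suppose each admitted task holds its slot for about 25ms. With maxConcurrent: 10, a slot frees up every ~2.5ms on average, so a request at queue position k waits roughly k × 2.5ms before it runs. Under that assumption, anything queued deeper than about position 20 would blow past the 50ms timeout anyway, which is one way to read the maxQueue: 20 / timeoutMs: 50 pairing above: queue depth and timeout are chosen together, not independently.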

Synchronous fail-fast

When you can't afford even the overhead of a promise, tryAcquire gives a synchronous answer.

const result = bulkhead.tryAcquire();

if (!result.ok) {
  return res.status(503).send('busy');
}

try {
  await handleRequest();
} finally {
  result.token.release();
}

No async. No queue check. Either a slot is free right now or the request is rejected. This is the path you want for hot routes where every microsecond of admission overhead matters.

Observability built in

Hooks fire synchronously on admission, rejection, release, and close. They're designed for metrics export, not control flow.

const bulkhead = createBulkhead({
  name: 'search',
  maxConcurrent: 20,
  hooks: {
    onReject(event) {
      metrics.increment('bulkhead.reject', {
        bulkhead: event.name,
        reason: event.reason,
      });
    },
  },
});

Hook exceptions are swallowed and counted in stats().hookErrors. Observability code cannot break request handling.

stats() gives you the full picture at any point: inFlight, pending, totalAdmitted, totalReleased, rejected, rejectedByReason. All pure reads, no side effects.
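
A periodic export loop might look like this, reusing the illustrative metrics client from the hooks example (a gauge(name, value, tags) method is assumed); the exact shape of rejectedByReason, assumed here to be a per-reason counter map, is the only other guess:

// Snapshot every 10 seconds; stats() is a pure read, so polling it has no
// effect on admission.
setInterval(() => {
  const s = bulkhead.stats();
  metrics.gauge('bulkhead.in_flight', s.inFlight, { bulkhead: 'search' });
  metrics.gauge('bulkhead.pending', s.pending, { bulkhead: 'search' });
  metrics.gauge('bulkhead.rejected_total', s.rejected, { bulkhead: 'search' });
  // Assumed shape: a count keyed by rejection reason.
  for (const [reason, count] of Object.entries(s.rejectedByReason)) {
    metrics.gauge('bulkhead.rejected_by_reason', count, { bulkhead: 'search', reason });
  }
}, 10_000).unref();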

Graceful shutdown

When your process receives SIGTERM, you want to stop admitting new work while letting in-flight work finish.

bulkhead.close();
await bulkhead.drain();
process.exit(0);

close() is synchronous and immediate. It rejects everyone waiting in the queue with reason shutdown and blocks all future admission. drain() resolves when every in-flight token has been released.
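
Wired into a typical Node server, the sequence might look like the sketch below; server and the 10-second grace window are illustrative, and close() and drain() are the only library calls:

process.once('SIGTERM', async () => {
  // Stop accepting new connections, then stop admitting new work.
  server.close();
  bulkhead.close();

  // Give in-flight work a bounded window to finish, then exit either way.
  await Promise.race([
    bulkhead.drain(),
    new Promise((resolve) => setTimeout(resolve, 10_000)),
  ]);

  process.exit(0);
});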

What it doesn't do

No retries. No circuit breaking. No scheduling. No persistence. No distributed coordination.

Those are real concerns, but they belong in separate layers. A bulkhead answers one question: should this work start right now? Everything else composes around it.
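
For instance, a caller-side admission retry is a few lines composed on top of acquire(), not something the bulkhead itself needs to know about; withAdmissionRetry and its backoff numbers are purely illustrative:

// Illustrative: retry admission a few times with a small backoff, but never
// retry once the work itself has started. The bulkhead knows nothing about it.
async function withAdmissionRetry<T>(work: () => Promise<T>): Promise<T> {
  for (let attempt = 0; attempt < 3; attempt++) {
    const result = await bulkhead.acquire();
    if (result.ok) {
      try {
        return await work();
      } finally {
        result.token.release();
      }
    }
    if (result.reason !== 'concurrency_limit') break; // shutdown, abort, etc.
    await new Promise((resolve) => setTimeout(resolve, 25 * (attempt + 1)));
  }
  throw new Error('bulkhead admission failed');
}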

npm install async-bulkhead-ts