Your fetch calls don't know when to stop

Outbound HTTP calls have no built-in concurrency limit. That's fine until a downstream dependency slows down and your service drowns in open connections.

fetch will happily open as many connections as you ask for. If 500 requests arrive and each one calls a downstream API, you get 500 outbound connections, whether the downstream can handle it or not.

This works fine when the downstream responds quickly. It falls apart the moment it slows down.

When response times go from 50ms to 5 seconds, those 500 connections don't close. New ones keep opening. Memory climbs. The connection pool fills. Event loop lag spikes. Your service becomes unresponsive, not because of anything it did, but because it didn't stop sending requests to a service that stopped answering them.
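
The pattern that gets you there looks harmless. A sketch, assuming an Express-style app object and a made-up payments endpoint:

// Nothing here bounds concurrency: 500 concurrent checkouts means 500
// simultaneous outbound connections to the payments API.
app.post('/checkout', async (req, res) => {
  const response = await fetch('https://payments.example.com/charge', {
    method: 'POST',
    body: JSON.stringify(req.body),
    headers: { 'content-type': 'application/json' },
  });
  res.status(response.status).json(await response.json());
});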

Adding a concurrency ceiling

async-bulkhead-fetch wraps the standard fetch API with a concurrency limit. It's a drop-in replacement with one difference: when capacity is full, it rejects the call before any network request is made.

import { createFetchBulkhead } from 'async-bulkhead-fetch';

const guardedFetch = createFetchBulkhead({
  name: 'payments-api',
  maxConcurrent: 10,
  maxQueue: 0,
});

const response = await guardedFetch('https://payments.example.com/charge', {
  method: 'POST',
  body: JSON.stringify(payload),
  headers: { 'content-type': 'application/json' },
});

Same signature as fetch. Same response object. The only thing that changes is that the 11th concurrent call throws instead of adding to the pile.
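
To make that concrete, here's what an eleven-call burst against the configuration above looks like (the health endpoint is made up):

// Ten calls are admitted; the eleventh rejects immediately,
// before any socket is opened.
const results = await Promise.allSettled(
  Array.from({ length: 11 }, () =>
    guardedFetch('https://payments.example.com/health'),
  ),
);
// results: 10 settled by the network, 1 rejected by the bulkhead.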

Why per-dependency, not global

A global concurrency limit is a blunt instrument. If Stripe is slow and your search API is fast, a global limit punishes search calls for Stripe's problem.

Each dependency gets its own bulkhead. Each bulkhead has its own capacity. One dependency slowing down cannot starve another.

const stripe = createFetchBulkhead({
  name: 'stripe',
  maxConcurrent: 8,
  maxQueue: 0,
});

const search = createFetchBulkhead({
  name: 'search-api',
  maxConcurrent: 20,
  maxQueue: 0,
});

Stripe filling up its 8 slots has zero effect on search. That's the isolation property.
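
A hypothetical request path that touches both dependencies (endpoints made up, payload and q from the surrounding handler):

// If Stripe's 8 slots are full, the charge call rejects instantly;
// the search call proceeds against its own, separate capacity.
const [charge, hits] = await Promise.allSettled([
  stripe('https://api.stripe.com/v1/charges', {
    method: 'POST',
    body: JSON.stringify(payload),
  }),
  search(`https://search.internal.example.com/query?q=${encodeURIComponent(q)}`),
]);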

Body-aware release

Most concurrency wrappers release the slot when fetch() resolves. But fetch() resolves when headers arrive. The response body might still be streaming.

If you release on headers and the body takes 3 seconds to stream, your concurrency counter says "7 in flight" while there are actually 8 open connections consuming resources.

async-bulkhead-fetch holds the slot until the body is consumed by default.

const response = await guardedFetch('https://api.example.com/data');
// Slot still held.
const data = await response.json();
// Slot released.

For calls where you don't need the body, you can release early:

await guardedFetch(url, init, { releaseOn: 'headers' });
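
The mechanics aren't magic. Here's a rough sketch of how body-aware release can be implemented with the platform's TransformStream. This is illustrative only, not async-bulkhead-fetch's actual internals, and cancellation handling is omitted:

// Illustrative sketch of body-aware release (not the library's internals).
let inFlight = 0;
const MAX_CONCURRENT = 10;

async function bodyAwareFetch(url, init) {
  if (inFlight >= MAX_CONCURRENT) throw new Error('bulkhead full');
  inFlight++;
  let response;
  try {
    response = await fetch(url, init);
  } catch (err) {
    inFlight--; // request never produced a response
    throw err;
  }
  if (!response.body) {
    inFlight--; // e.g. a 204: nothing left to stream
    return response;
  }
  // Pipe the body through a pass-through stream whose flush hook fires
  // once the entire body has been read.
  const monitored = response.body.pipeThrough(
    new TransformStream({ flush: () => { inFlight--; } }),
  );
  return new Response(monitored, response);
}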

Absorbing short bursts

Fail-fast is the right default. But some traffic arrives in bursts, and a short wait for a slot can avoid an unnecessary rejection.

const analytics = createFetchBulkhead({
  name: 'analytics-api',
  maxConcurrent: 4,
  maxQueue: 8,
  queueWaitTimeoutMs: 250,
});

The queue holds up to 8 calls. Each one waits a maximum of 250ms for a slot. If the burst resolves within that window, no rejections. If it doesn't, the timeout fires and the call is rejected before it becomes a slow "no."
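
A twelve-call burst against this config, to make the arithmetic visible (the endpoint is made up):

// 4 calls run immediately, 8 wait in the queue, and any queued call that
// doesn't get a slot within 250ms is rejected.
const results = await Promise.allSettled(
  Array.from({ length: 12 }, () =>
    analytics('https://analytics.example.com/events', { method: 'POST' }),
  ),
);
const rejected = results.filter((r) => r.status === 'rejected').length;
console.log(`rejected: ${rejected}`); // 0 if the burst drains inside the window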

This is the tradeoff LoadLens makes visible: bounded queues absorb spikes but can't absorb sustained overload. The queueWaitTimeoutMs is the boundary between "helpful buffer" and "latency trap."

Catching rejections

Rejected calls throw a typed error with a reason you can use for logging, metrics, or fallback logic.

import { FetchBulkheadRejectedError } from 'async-bulkhead-fetch';

try {
  return await guardedFetch(url);
} catch (err) {
  if (err instanceof FetchBulkheadRejectedError) {
    logger.warn('downstream busy', {
      dependency: 'stripe',
      reason: err.reason,
    });
    return fallback();
  }
  throw err;
}

Five possible reasons: concurrency_limit, queue_limit, timeout, aborted, shutdown. Each one tells you something different about why the call didn't go out.
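
One way to route them, sketched below. The metric names and the metrics client are placeholders, not part of the library:

function recordRejection(err) {
  switch (err.reason) {
    case 'concurrency_limit': // all slots busy and no queue configured
    case 'queue_limit': // queue already full
      metrics.increment('bulkhead.rejected.saturated', { reason: err.reason });
      break;
    case 'timeout': // waited queueWaitTimeoutMs without getting a slot
      metrics.increment('bulkhead.rejected.timeout');
      break;
    case 'aborted': // the caller's AbortSignal fired first
      metrics.increment('bulkhead.rejected.aborted');
      break;
    case 'shutdown': // the bulkhead was closed
      metrics.increment('bulkhead.rejected.shutdown');
      break;
  }
}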

Shutting down cleanly

When the process exits, you want in-flight calls to finish without new ones being admitted.

process.on('SIGTERM', async () => {
  stripe.close();
  await stripe.drain();
  process.exit(0);
});

close() is synchronous and immediate. It blocks all future admission and rejects anyone waiting in the queue. drain() resolves when every in-flight call has completed.
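
If you need shutdown itself to be bounded, a common pattern is to race drain() against a timer. The 10-second cap here is arbitrary:

process.on('SIGTERM', async () => {
  stripe.close();
  // Wait for in-flight calls, but never longer than 10 seconds.
  await Promise.race([
    stripe.drain(),
    new Promise((resolve) => setTimeout(resolve, 10_000)),
  ]);
  process.exit(0);
});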

Starting point

Pick the dependency that has caused the most pain. Wrap its fetch calls. Set maxConcurrent to something conservative. Set maxQueue: 0. Deploy. Watch whether rejections spike or stay near zero. Tune from there.

npm install async-bulkhead-fetch