Return 503 before your Express route melts down

Route-level bulkheads let expensive Express handlers reject overload before downstream work begins.

Most Express apps have one or two routes that are expensive: a search endpoint that hits Elasticsearch, a report generator that runs heavy queries, a payment flow that calls three external services. These routes are fine under normal traffic. Under load, they drag down everything else.

The typical approach is to add more capacity. But overload doesn't always mean "not enough servers." It often means one hot route is consuming all available connections, memory, or downstream capacity while healthcheck and login routes starve.

The simpler fix: stop admitting requests to the expensive route before it saturates.

Route-level bulkheads

async-bulkhead-express wraps a concurrency limit in Express middleware. When a protected route is full, new requests get a 503 Service Unavailable before the handler ever runs.

import { createBulkheadMiddleware } from 'async-bulkhead-express';

const searchBulkhead = createBulkheadMiddleware({
  name: 'search',
  maxConcurrent: 20,
  maxQueue: 0,
});

app.get('/search', searchBulkhead, async (_req, res) => {
  const results = await search();
  res.json(results);
});

If 20 search requests are already in flight, the 21st gets a 503 immediately. The handler never fires. Database connections aren't consumed. The rest of the app stays healthy.
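Under the hood, the fail-fast path needs little more than a counter and a cap. Here is a minimal sketch of the idea, not the library's actual internals; `createSimpleBulkhead`, `tryAcquire`, and `bulkheadMiddleware` are made-up names for illustration:

```javascript
// Fail-fast admission: a counter, a cap, and a release callback.
function createSimpleBulkhead(maxConcurrent) {
  let inFlight = 0;
  return {
    // Returns a release function if admitted, or null if at capacity.
    tryAcquire() {
      if (inFlight >= maxConcurrent) return null;
      inFlight += 1;
      let released = false;
      return () => {
        if (!released) { released = true; inFlight -= 1; }
      };
    },
    get inFlight() { return inFlight; },
  };
}

// Express wiring: 503 when full, release when the response ends.
function bulkheadMiddleware(bulkhead) {
  return (req, res, next) => {
    const release = bulkhead.tryAcquire();
    if (!release) {
      res.status(503).json({ error: 'Service Unavailable' });
      return;
    }
    res.once('finish', release);
    res.once('close', release); // also covers client disconnects
    next();
  };
}
```

The idempotent release matters: both `finish` and `close` can fire for one response, and decrementing twice would leak capacity in the other direction.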

Some routes should compete for the same capacity. Charge and refund both hit the payment provider: if the provider can handle 10 concurrent calls, both routes should share that limit.

import { createExpressBulkhead } from 'async-bulkhead-express';

const payments = createExpressBulkhead({
  name: 'payments',
  maxConcurrent: 10,
  maxQueue: 0,
});

app.post('/charge', payments.middleware(), chargeHandler);
app.post('/refund', payments.middleware(), refundHandler);

Ten total across both routes, not ten each.

When waiting is worth it

Fail-fast (maxQueue: 0) is the right default for user-facing routes. But some endpoints handle work where a brief wait is better than a rejection: report generation, file processing, batch operations.

const reports = createExpressBulkhead({
  name: 'reports',
  maxConcurrent: 4,
  maxQueue: 8,
  queueWaitTimeoutMs: 250,
});

This allows up to 8 requests to wait for a slot, but no longer than 250ms. After that, they get a 503 instead of waiting indefinitely.
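The bounded-queue behavior can be sketched as a small async limiter: callers past the concurrency cap wait in a queue up to a maximum depth, and each waiter carries its own timeout. These are assumed semantics with made-up names (`createQueuedLimiter`), not the library's implementation:

```javascript
// Bounded queue with a wait timeout: overflow and timed-out waiters both
// fail, which the middleware layer would translate into a 503.
function createQueuedLimiter(maxConcurrent, maxQueue, timeoutMs) {
  let inFlight = 0;
  const waiters = [];

  function release() {
    inFlight -= 1;
    const next = waiters.shift();
    if (next) next.admit(); // hand the freed slot to the oldest waiter
  }

  return async function acquire() {
    if (inFlight < maxConcurrent) {
      inFlight += 1;
      return release;
    }
    if (waiters.length >= maxQueue) {
      throw new Error('bulkhead full'); // queue overflow: reject immediately
    }
    return new Promise((resolve, reject) => {
      const waiter = {
        admit() {
          clearTimeout(timer);
          inFlight += 1;
          resolve(release);
        },
      };
      const timer = setTimeout(() => {
        waiters.splice(waiters.indexOf(waiter), 1);
        reject(new Error('queue wait timeout')); // waited too long
      }, timeoutMs);
      waiters.push(waiter);
    });
  };
}
```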

This is the same tradeoff LoadLens visualizes: a bounded queue absorbs short bursts, but under sustained pressure it trades immediate rejection for rising latency. queueWaitTimeoutMs is your escape valve. It caps how much latency the queue can add.

Why 503 and not 429

429 Too Many Requests means the client is sending too much. 503 Service Unavailable means the server is busy. A bulkhead models server capacity, not client quotas. The distinction matters for clients that handle these codes differently: retry logic, load balancers, and monitoring all treat 429 and 503 as separate signals.
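A sensible client reacts to the two codes differently. As a hypothetical illustration (this helper is not part of the library), a retry policy might back off briefly on 503, since server capacity may free up in milliseconds, but wait much longer on 429, since the quota is the client's own:

```javascript
// Returns how long to wait before retrying, or null for "don't retry".
// Illustrative policy only; tune the constants to your traffic.
function retryDelayMs(status, attempt) {
  if (status === 503) {
    // Server busy: short exponential backoff, capped at 2 seconds.
    return Math.min(100 * 2 ** attempt, 2000);
  }
  if (status === 429) {
    // Client over quota: retrying quickly only digs the hole deeper.
    return 30_000;
  }
  return null; // other statuses aren't capacity signals
}
```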

Skipping cheap routes

If you mount a bulkhead at the router level, you probably don't want it guarding healthchecks or CORS preflight.

const apiBulkhead = createExpressBulkhead({
  name: 'api',
  maxConcurrent: 50,
  maxQueue: 0,
  skip: (req) => req.path === '/healthz' || req.method === 'OPTIONS',
});

app.use('/api', apiBulkhead.middleware(), apiRouter);

Observability

Hook into admission, rejection, and release events for metrics without touching request flow.

const users = createExpressBulkhead({
  name: 'users',
  maxConcurrent: 15,
  maxQueue: 5,
  routeLabel: 'GET /users/:id',
  onReject(event) {
    metrics.increment('bulkhead.reject', {
      bulkhead: event.name,
      route: event.route,
      reason: event.reason,
    });
  },
});

Hooks are fire-and-forget: exceptions are swallowed so observability code can't break request handling. Alert on sustained rejection, not isolated bursts.
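The fire-and-forget contract amounts to wrapping each hook call in a try/catch that deliberately discards errors. A sketch of that shape, assuming a `safeInvoke` helper name that is not from the library:

```javascript
// Invoke an optional hook without letting it affect request handling.
function safeInvoke(hook, event) {
  if (typeof hook !== 'function') return; // hooks are optional
  try {
    hook(event);
  } catch {
    // Deliberately swallowed: a broken metrics call must never 500 a request.
  }
}
```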

Graceful shutdown

When the process is stopping, you want to finish in-flight requests without admitting new ones.

const bulkhead = createExpressBulkhead({
  name: 'search',
  maxConcurrent: 20,
});

process.on('SIGTERM', async () => {
  bulkhead.close();
  await bulkhead.drain();
  server.close();
});
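The close/drain pair is easy to reason about as two small pieces of state: a closed flag that stops new admissions, and a promise that resolves once the in-flight count reaches zero. A sketch of those assumed semantics (`createClosableBulkhead` is a made-up name, not the library's code):

```javascript
// close() stops admitting; drain() resolves when in-flight work hits zero.
function createClosableBulkhead(maxConcurrent) {
  let inFlight = 0;
  let closed = false;
  const drainWaiters = [];

  function maybeDrain() {
    if (closed && inFlight === 0) {
      while (drainWaiters.length) drainWaiters.shift()();
    }
  }

  return {
    tryAcquire() {
      if (closed || inFlight >= maxConcurrent) return null;
      inFlight += 1;
      return () => { inFlight -= 1; maybeDrain(); };
    },
    close() { closed = true; maybeDrain(); },
    drain() {
      return new Promise((resolve) => {
        drainWaiters.push(resolve);
        maybeDrain(); // resolve immediately if already drained
      });
    },
  };
}
```

Note the ordering in the SIGTERM handler above: close first so rejected requests can be retried against another instance, then drain, then stop the listener.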

Where to start

Pick your most expensive route. Add a bulkhead with maxQueue: 0 and a maxConcurrent based on the downstream resource it depends on: database connection pool size, external API concurrency limit, or just a conservative guess. Watch the rejection rate. Adjust from there.

npm install async-bulkhead-express