Troubleshooting: Random 502 Errors Behind a Load Balancer

If you are getting intermittent 502 Bad Gateway responses from your load balancer while your application appears healthy, the most common cause is a keep-alive timeout mismatch between your app and the load balancer.

Why This Happens

HTTP/1.1 connections are kept open (“keep-alive”) so multiple requests can reuse the same TCP socket. Both sides of the connection — your app server and the load balancer — have an idle timeout that controls how long an unused connection stays open.

The problem occurs when your app closes the connection before the load balancer does. The load balancer thinks the socket is still usable and sends a request down it, but the app has already torn it down. The load balancer receives a connection reset and returns a 502 to the client.

AWS ALB has a default idle timeout of 60 seconds. Framework defaults vary widely (some below 60, some above):

Framework / Runtime   | Default Keep-Alive Timeout
Node.js HTTP server   | 5 secs
Next.js               | 5 secs (inherits Node default)
Fastify               | 72 secs
Puma (Rails)          | 20 secs (persistent_timeout)
Gunicorn (Django)     | 2 secs (keepalive)

Any value lower than the ALB idle timeout can trigger 502s.

The Fix

Set your app’s keep-alive timeout higher than the load balancer idle timeout.

The simplest rule: if ALB idle timeout is 60 secs, set your app keep-alive to 65 secs (65000 ms) or more. This ensures the load balancer always closes idle connections before your app does.
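The rule is mechanical enough to compute at startup. A sketch, assuming a hypothetical ALB_IDLE_TIMEOUT_SECS environment variable (not an AWS setting) and the conventional 5-second and 1-second margins:

```javascript
// Derive app-side timeouts from the load balancer's idle timeout.
// The 5-second and 1-second margins are conventional, not mandated.
const albIdleTimeoutSecs = Number(process.env.ALB_IDLE_TIMEOUT_SECS) || 60

const keepAliveTimeoutMs = (albIdleTimeoutSecs + 5) * 1000 // 65000 with the 60-sec default
const headersTimeoutMs = keepAliveTimeoutMs + 1000         // 66000: must exceed keepAliveTimeout

console.log(keepAliveTimeoutMs, headersTimeoutMs)
```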

Next.js

Next.js accepts a --keepAliveTimeout flag:

package.json
{
  "scripts": {
    "start": "next start --keepAliveTimeout 65000"
  }
}

Node.js (Express, custom HTTP server)

Set server.keepAliveTimeout after creating the HTTP server. You should also set server.headersTimeout to a value higher than keepAliveTimeout (Node.js defaults headersTimeout to 60000 ms, which would otherwise undercut the longer keep-alive window):

const express = require("express")
const app = express()

const server = app.listen(process.env.PORT || 3000, () => {
  console.log("listening")
})

// Must be higher than ALB idle timeout (default 60 secs)
server.keepAliveTimeout = 65000
// Must be higher than keepAliveTimeout
server.headersTimeout = 66000

Fastify

Fastify already defaults to 72 secs, which is above ALB’s 60 secs. If you customize it, keep it above ALB idle timeout:

const fastify = require("fastify")({
  keepAliveTimeout: 65000,
})

Rails (Puma)

Puma’s default persistent_timeout is 20 secs, which is below ALB’s default. Set it above 60 when behind ALB:

# config/puma.rb
persistent_timeout 65

Django (Gunicorn)

Gunicorn defaults to 2 secs. Raise it:

gunicorn myapp.wsgi --keep-alive 65

Other Frameworks

The general principle applies to any framework: find the keep-alive / persistent-connection timeout setting, and make sure it is above your load balancer’s idle timeout.

Infrastructure Settings

Beyond the app-level fix, these ECS and ALB settings are a good baseline to avoid related issues during deployments and task shutdowns:

Setting | Default | Recommended | Why
ALB idle timeout | 60 secs | 60 secs (keep default) | No change needed if app timeout is above this
ECS stopTimeout | 30 secs | 90 secs | Gives containers time to drain in-flight requests during rolling deploys
ALB target group deregistration_delay | 300 secs | 60 secs | Allows graceful drain without holding old tasks for 5 minutes

Adjust these values for your actual request durations and shutdown behavior.
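These settings map directly to infrastructure code. A sketch in Terraform, assuming the AWS provider; the resource and variable names are hypothetical:

```hcl
# Hypothetical resource names; values mirror the table above.

resource "aws_lb" "app" {
  name         = "app-lb"
  idle_timeout = 60 # ALB default; fine once the app's keep-alive exceeds it
  # ... subnets, security groups, etc.
}

resource "aws_lb_target_group" "app" {
  name                 = "app-tg"
  port                 = 3000
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  deregistration_delay = 60 # down from the 300-second default
}

# ECS stopTimeout lives in the task definition's container definitions:
#   "stopTimeout": 90
```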