Troubleshooting: Random 502 Errors Behind a Load Balancer
If you are getting intermittent 502 Bad Gateway responses from your load balancer while your application appears healthy, the cause is almost always a keep-alive timeout mismatch between your app and the load balancer.
Why This Happens
HTTP/1.1 connections are kept open (“keep-alive”) so multiple requests can reuse the same TCP socket. Both sides of the connection — your app server and the load balancer — have an idle timeout that controls how long an unused connection stays open.
The problem occurs when your app closes the connection before the load balancer does. The load balancer thinks the socket is still usable and sends a request down it, but the app has already torn it down. The load balancer receives a connection reset and returns a 502 to the client.
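To watch the mismatch happen in isolation, here is a minimal Node.js sketch (no real load balancer involved; the 2-second timeout is an artificially short value chosen so the demo finishes quickly). The server closes idle keep-alive connections after 2 seconds, and the raw TCP client stands in for a load balancer that still believes the socket is reusable:

```js
const http = require("http")
const net = require("net")

const server = http.createServer((req, res) => res.end("ok"))
// Artificially short so the demo finishes quickly; a real app would use 65000 ms.
server.keepAliveTimeout = 2000

server.listen(0, () => {
  const { port } = server.address()
  const start = Date.now()

  // Raw TCP client standing in for the load balancer's side of the connection.
  const socket = net.connect(port, "127.0.0.1", () => {
    socket.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
  })

  socket.on("data", (chunk) => {
    // First request succeeds and the connection stays open for reuse.
    console.log(chunk.toString().split("\r\n")[0]) // HTTP/1.1 200 OK
  })

  socket.on("end", () => {
    // Roughly 2 secs later the app closes the idle socket. A load balancer with
    // a 60-sec idle timeout would still treat it as reusable, and a request
    // routed onto it in that window comes back to the client as a 502.
    console.log(`app closed the idle connection after ${Date.now() - start} ms`)
    server.close()
  })
})
```

Behind a real ALB, the window between the app's close and the ALB's next reuse of the socket is exactly where the intermittent 502s come from.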
AWS ALB has a default idle timeout of 60 seconds. Framework defaults vary widely (some below 60, some above):
| Framework / Runtime | Default Keep-Alive Timeout |
|---|---|
| Node.js HTTP server | 5 secs |
| Next.js | 5 secs (inherits Node default) |
| Fastify | 72 secs |
| Puma (Rails) | 20 secs (persistent_timeout) |
| Gunicorn (Django) | 2 secs (keepalive) |
Any value lower than the ALB idle timeout can trigger 502s.
The Fix
Set your app’s keep-alive timeout higher than the load balancer idle timeout.
The simplest rule: if ALB idle timeout is 60 secs, set your app keep-alive to 65 secs (65000 ms) or more. This ensures the load balancer always closes idle connections before your app does.
Next.js
Next.js accepts a --keepAliveTimeout flag:
```json
{
  "scripts": {
    "start": "next start --keepAliveTimeout 65000"
  }
}
```

Node.js (Express, custom HTTP server)
Set server.keepAliveTimeout after creating the HTTP server. You should also set server.headersTimeout to a value higher than keepAliveTimeout (Node.js defaults headersTimeout to 60000 ms):
```js
const server = app.listen(process.env.PORT || 3000, () => {
  console.log("listening")
})

// Must be higher than ALB idle timeout (default 60 secs)
server.keepAliveTimeout = 65000
// Must be higher than keepAliveTimeout
server.headersTimeout = 66000
```

Fastify
Fastify already defaults to 72 secs, which is above ALB’s 60 secs. If you customize it, keep it above the ALB idle timeout:
```js
const fastify = require("fastify")({
  keepAliveTimeout: 65000,
})
```

Rails (Puma)
Puma’s default persistent_timeout is 20 secs, which is below ALB’s default. Set it above 60 when behind an ALB:
```ruby
# config/puma.rb
persistent_timeout 65
```

Django (Gunicorn)
Gunicorn defaults to 2 secs. Raise it:
```bash
gunicorn myapp.wsgi --keep-alive 65
```

Other Frameworks
The general principle applies to any framework: find the keep-alive / persistent-connection timeout setting, and make sure it is above your load balancer’s idle timeout.
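One way to confirm the effective timeout, whatever the framework, is to measure it from the outside: open a keep-alive connection, send a single request, and time how long the server keeps the idle socket open. A rough Node.js sketch, where HOST, PORT, and the request path are placeholders for your own service:

```js
const net = require("net")

const HOST = "127.0.0.1" // placeholder: your app's host
const PORT = 3000        // placeholder: your app's port

const start = Date.now()
const socket = net.connect(PORT, HOST, () => {
  // One request over a keep-alive connection, then leave the socket idle.
  socket.write(`GET / HTTP/1.1\r\nHost: ${HOST}\r\nConnection: keep-alive\r\n\r\n`)
})

socket.on("data", () => { /* response received; now just wait */ })

socket.on("end", () => {
  // Elapsed time approximates the app's keep-alive / idle timeout.
  // Behind an ALB it should come out above 60 secs.
  const secs = Math.round((Date.now() - start) / 1000)
  console.log(`server closed the idle connection after ~${secs} secs`)
})

socket.on("error", (err) => console.error("connection error:", err.message))
```

If the number comes back below your load balancer’s idle timeout, the setting did not take effect.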
Infrastructure Settings
Beyond the app-level fix, these ECS and ALB settings are a good baseline to avoid related issues during deployments and task shutdowns:
| Setting | Default | Recommended | Why |
|---|---|---|---|
| ALB idle timeout | 60 secs | 60 secs (keep default) | No change needed if app timeout is above this |
| ECS stopTimeout | 30 secs | 90 secs | Gives containers time to drain in-flight requests during rolling deploys |
| ALB target group deregistration_delay | 300 secs | 60 secs | Allows graceful drain without holding old tasks for 5 minutes |
Adjust these values for your actual request durations and shutdown behavior.
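The extra stopTimeout headroom only pays off if the application drains cleanly when ECS sends SIGTERM. As a sketch of that pattern in Node.js (the 85-second safety timer is an assumption tied to the 90-second stopTimeout above):

```js
const http = require("http")

const server = http.createServer((req, res) => res.end("ok"))
// Keep-alive settings from the fix above.
server.keepAliveTimeout = 65000
server.headersTimeout = 66000

server.listen(process.env.PORT || 3000)

process.on("SIGTERM", () => {
  // ECS sends SIGTERM first, then SIGKILL once stopTimeout (90 secs above) expires.
  // close() stops accepting new connections and waits for in-flight requests.
  server.close(() => process.exit(0))
  // Drop idle keep-alive connections held open by the load balancer (Node 18.2+).
  server.closeIdleConnections()
  // Safety net: exit on our own shortly before SIGKILL would arrive.
  setTimeout(() => process.exit(1), 85000).unref()
})
```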