Resolving 502 Bad Gateway Errors Blocking Your CI/CD Pipeline Deployments
Quick Fix Summary
TL;DR: Restart the upstream service or proxy (e.g., nginx, HAProxy) and verify backend health.
Diagnosis & Causes
A 502 Bad Gateway error indicates that a reverse proxy or load balancer (the gateway) received an invalid response from an upstream server (your application). During a deployment, this typically surfaces as failed health checks that block the rollout.
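A quick first check is to reproduce the request your pipeline's health check makes, but through the gateway, so you can confirm the 502 is coming from the proxy layer rather than the CI tooling. The host and path below are placeholders for your own endpoint.
# Reproduce the failing health check through the proxy/load balancer
# (-i prints the status line and headers; a 502 here confirms the gateway-side symptom)
curl -i --connect-timeout 5 https://<gateway-or-lb-host>/health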
Recovery Steps
Step 1: Verify Upstream Service Health
Check if your application pods/containers are running and ready. A failed liveness/readiness probe is a common culprit.
# For Kubernetes:
kubectl get pods -n <namespace> --selector=app=<your-app>
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Events"
kubectl logs <pod-name> -n <namespace> --tail=50
# For Docker/Systemd:
docker ps | grep <your-app>
docker logs <container-id> --tail=50
sudo systemctl status <your-app-service>
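On Kubernetes, also confirm the Service actually has ready endpoints: if pods are Running but not Ready, the proxy has nothing to route to and returns 502. A minimal check, assuming a standard Service sits in front of the app:
# An empty ENDPOINTS column means no pod currently passes its readiness probe
kubectl get endpoints <your-app-service> -n <namespace>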
Step 2: Test Direct Connectivity to Upstream
Bypass the proxy/gateway to confirm the application itself is reachable and responding correctly on its internal port.
# Get the application's ClusterIP and Port (Kubernetes)
kubectl get svc -n <namespace> <your-app-service>
# Curl the service from within the cluster or its node
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://<service-cluster-ip>:<port>/health
# For a known host:port (e.g., on a VM)
curl -v --connect-timeout 5 http://<backend-host>:<app-port>/health
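If the Service responds but the gateway still returns 502, it can also help to test a single pod directly, bypassing both the proxy and the Service. This is a sketch; the local port, container port, and health path are placeholders.
# Forward a local port straight to one pod, then curl it from another shell
kubectl port-forward pod/<pod-name> -n <namespace> 8080:<container-port>
curl -v http://127.0.0.1:8080/health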
Step 3: Inspect and Restart the Gateway/Proxy
Examine proxy logs for connection errors (refused, timeout, reset). A restart can clear transient issues.
# For nginx (common ingress):
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=100 | grep -i "502\|upstream"
# For HAProxy or other proxies on host:
sudo journalctl -u haproxy --since "5 minutes ago" -f
sudo tail -f /var/log/nginx/error.log
# Restart proxy (example for nginx on host):
sudo systemctl restart nginx
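Before restarting, it is worth validating the proxy configuration so a bad config file does not take the gateway down entirely; where it is sufficient, a reload is also less disruptive than a full restart. A sketch for host-level nginx:
# Validate the configuration first, then reload without dropping existing connections
sudo nginx -t && sudo systemctl reload nginx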
Step 4: Check Resource Constraints and Timeouts
Insufficient CPU/memory or proxy timeouts set too low can cause 502s during deployment spikes.
# Check for OOMKilled pods or high resource usage
kubectl describe pod <pod-name> -n <namespace> | grep -i "state\|oom\|limit"
kubectl top pods -n <namespace>
# Review proxy timeout configuration (example nginx ingress annotation):
kubectl get ingress <ingress-name> -n <namespace> -o yaml | grep -A 2 -B 2 "timeout"
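If short timeouts turn out to be the cause and you are running the community ingress-nginx controller, one option is to raise the proxy timeouts via annotations. The 120-second value below is illustrative, not a recommendation.
# Raise proxy read/send timeouts on the Ingress (values are seconds, passed as strings)
kubectl annotate ingress <ingress-name> -n <namespace> \
  nginx.ingress.kubernetes.io/proxy-read-timeout="120" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="120" \
  --overwrite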
Step 5: Validate Network Policies and Security Groups
A new deployment might be blocked by network policies (K8s) or cloud security groups denying traffic from the proxy.
# List NetworkPolicies affecting your app namespace
kubectl get networkpolicy -n <namespace>
# Describe a specific policy
kubectl describe networkpolicy <policy-name> -n <namespace>
# For AWS, check Security Group of backend instances/ENIs:
aws ec2 describe-security-groups --group-ids <sg-id> --query 'SecurityGroups[0].IpPermissions'
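To confirm whether a policy is actually dropping traffic, you can run a throwaway curl pod in the proxy's namespace and hit the app's cluster DNS name. The ingress-nginx namespace and the service name below are assumptions; adjust them to your environment.
# Test connectivity from the proxy's namespace to the backend Service
kubectl run -it --rm netpol-test -n ingress-nginx --image=curlimages/curl --restart=Never -- \
  curl -v --connect-timeout 5 http://<your-app-service>.<namespace>.svc.cluster.local:<port>/health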
Step 6: Rollback to Last Known Good Deployment
If the 502 started with the latest deployment, perform an immediate rollback to restore service while you debug.
# Kubernetes rollout undo (Deployment)
kubectl rollout undo deployment/<deployment-name> -n <namespace>
# For Helm releases:
helm rollback <release-name> <previous-revision-number>
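If you are unsure which revision to roll back to, list the previous revisions first:
# Show prior revisions before rolling back
kubectl rollout history deployment/<deployment-name> -n <namespace>
helm history <release-name>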
"This often happens when a new deployment passes liveness probes but fails readiness probes due to slow startup (e.g., waiting for a database). The proxy routes traffic before the app is truly ready. Increase initialDelaySeconds for readiness probes."
Frequently Asked Questions
My app is healthy when I curl it directly, but the proxy returns 502. What now?
This points to a proxy configuration issue. Check: 1) The proxy's upstream definition points to the correct service/port. 2) The proxy's resolver can resolve the upstream service name (in K8s, use the internal DNS name). 3) The proxy's upstream timeouts (e.g., connect/read timeouts) are not shorter than your app's response time.
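Two quick checks along those lines, assuming an nginx-based proxy; the service name below is a placeholder:
# Dump the full rendered nginx config and inspect upstream/proxy_pass targets
sudo nginx -T | grep -B 2 -A 5 "proxy_pass\|upstream"
# Confirm the upstream service name resolves from inside the cluster
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup <your-app-service>.<namespace>.svc.cluster.local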
The 502 error is intermittent. How do I troubleshoot?
Intermittent 502s suggest resource exhaustion or network flakiness. Monitor: 1) Backend application connection pool saturation. 2) Node/VM network bandwidth and error counters. 3) DNS resolution failures. Increase logging verbosity on the proxy and tail logs during the next occurrence.
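A lightweight way to catch the next occurrence is to poll the endpoint through the gateway and log status codes with timestamps while you watch the proxy logs; the host and path are placeholders.
# Log one timestamped line per request: HTTP status and total response time
while true; do
  printf '%s %s\n' "$(date -u +%H:%M:%S)" \
    "$(curl -s -o /dev/null -w '%{http_code} %{time_total}s' http://<gateway-host>/health)"
  sleep 2
done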