Severity: CRITICAL

Resolving 502 Bad Gateway Errors Blocking Your CI/CD Pipeline Deployments

Quick Fix Summary

TL;DR

Restart the upstream service or proxy (e.g., nginx, HAProxy) and verify backend health.

A 502 Bad Gateway error means a reverse proxy or load balancer (the gateway) received an invalid response, or no response at all, from an upstream server (your application). In a CI/CD pipeline this typically blocks deployments because post-deploy health checks or smoke tests fail.
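
If you only have a minute, here is a minimal sketch of those quick checks, assuming a systemd-managed app and an nginx proxy with a /health endpoint (adjust service names and ports to your stack).

bash
# Check the backend first; only restart the proxy if the backend itself is healthy
curl -v --connect-timeout 5 http://<backend-host>:<app-port>/health   # does the app answer directly?
sudo systemctl status <your-app-service>                              # is the app service running?
sudo nginx -t && sudo systemctl restart nginx                         # validate config, then restart the proxy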

Diagnosis & Causes

  • Upstream application crashed or is not listening.
  • Network connectivity or firewall issue between proxy and app.
  • Upstream server timed out or returned a malformed response.

Recovery Steps

    Step 1: Verify Upstream Service Health

    Check if your application pods/containers are running and ready. A failed liveness/readiness probe is a common culprit.

    bash
    # For Kubernetes:
    kubectl get pods -n <namespace> --selector=app=<your-app>
    kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Events"
    kubectl logs <pod-name> -n <namespace> --tail=50
    # For Docker/Systemd:
    docker ps | grep <your-app>
    docker logs <container-id> --tail=50
    sudo systemctl status <your-app-service>
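
    If the pods belong to a Deployment, it can also help to confirm whether the latest rollout ever completed. A quick check, assuming a standard Deployment object:

    bash
    # Hangs (then times out) if the new ReplicaSet's pods never become Ready
    kubectl rollout status deployment/<deployment-name> -n <namespace> --timeout=60s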

    Step 2: Test Direct Connectivity to Upstream

    Bypass the proxy/gateway to confirm the application itself is reachable and responding correctly on its internal port.

    bash
    # Get the application's ClusterIP and Port (Kubernetes)
    kubectl get svc -n <namespace> <your-app-service>
    # Curl the service from within the cluster or its node
    kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://<service-cluster-ip>:<port>/health
    # For a known host:port (e.g., on a VM)
    curl -v --connect-timeout 5 http://<backend-host>:<app-port>/health
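
    If you cannot run a debug pod, kubectl port-forward is another way to reach the service while bypassing the proxy entirely (the local port 8080 below is arbitrary):

    bash
    # Forward a local port to the service, curl it, then stop the forward
    kubectl port-forward -n <namespace> svc/<your-app-service> 8080:<port> &
    curl -v http://localhost:8080/health
    kill %1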

    Step 3: Inspect and Restart the Gateway/Proxy

    Examine proxy logs for connection errors (refused, timeout, reset). A restart can clear transient issues.

    bash
    # For nginx (common ingress):
    kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=100 | grep -i "502\|upstream"
    # For HAProxy or other proxies on host:
    sudo journalctl -u haproxy --since "5 minutes ago" -f
    sudo tail -f /var/log/nginx/error.log
    # Restart proxy (example for nginx on host):
    sudo systemctl restart nginx
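
    Before restarting a host-level proxy, it is safer to validate its configuration and prefer a graceful reload so in-flight requests are not dropped. A sketch for nginx and HAProxy on a systemd host:

    bash
    # Validate config first; a reload keeps existing connections alive, unlike a full restart
    sudo nginx -t && sudo systemctl reload nginx
    sudo haproxy -c -f /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy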

    Step 4: Check Resource Constraints and Timeouts

    Insufficient CPU/Memory or proxy timeouts set too low can cause 502s during deployment spikes.

    bash
    # Check for OOMKilled pods or high resource usage
    kubectl describe pod <pod-name> -n <namespace> | grep -i "state\|oom\|limit"
    kubectl top pods -n <namespace>
    # Review proxy timeout configuration (example nginx ingress annotation):
    kubectl get ingress <ingress-name> -n <namespace> -o yaml | grep -A 2 -B 2 "timeout"
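
    If the upstream legitimately needs more time (for example, a slow warm-up right after a deploy), the ingress-nginx timeouts can be raised via annotations. The 30s/120s values below are illustrative; tune them to your app:

    bash
    # Raise connect/read timeouts (in seconds) on the Ingress via ingress-nginx annotations
    kubectl annotate ingress <ingress-name> -n <namespace> --overwrite \
      nginx.ingress.kubernetes.io/proxy-connect-timeout="30" \
      nginx.ingress.kubernetes.io/proxy-read-timeout="120"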

    Step 5: Validate Network Policies and Security Groups

    A new deployment might be blocked by network policies (K8s) or cloud security groups denying traffic from the proxy.

    bash
    # List NetworkPolicies affecting your app namespace
    kubectl get networkpolicy -n <namespace>
    # Describe a specific policy
    kubectl describe networkpolicy <policy-name> -n <namespace>
    # For AWS, check Security Group of backend instances/ENIs:
    aws ec2 describe-security-groups --group-ids <sg-id> --query 'SecurityGroups[0].IpPermissions'
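
    To confirm whether a NetworkPolicy is the blocker, run a throwaway curl pod from the proxy's namespace (ingress-nginx is assumed here; use whatever namespace your gateway runs in) and hit the backend service directly:

    bash
    # If this fails while the Step 2 check succeeded, traffic from the proxy is being dropped
    kubectl run -it --rm np-test -n ingress-nginx --image=curlimages/curl --restart=Never -- \
      curl -v --connect-timeout 5 http://<your-app-service>.<namespace>.svc.cluster.local:<port>/health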

    Step 6: Rollback to Last Known Good Deployment

    If the 502 started with the latest deployment, perform an immediate rollback to restore service while you debug.

    bash
    # Kubernetes rollout undo (Deployment)
    kubectl rollout undo deployment/<deployment-name> -n <namespace>
    # For Helm releases:
    helm rollback <release-name> <previous-revision-number>
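
    If you are unsure which revision to target, list the revision history first (standard kubectl/Helm commands; revision numbers come from your own history):

    bash
    # Show available revisions before rolling back
    kubectl rollout history deployment/<deployment-name> -n <namespace>
    helm history <release-name>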

    Architect's Pro Tip

    "This often happens when a new deployment passes liveness probes but fails readiness probes due to slow startup (e.g., waiting for a database). The proxy routes traffic before the app is truly ready. Increase initialDelaySeconds for readiness probes."

    Frequently Asked Questions

    My app is healthy when I curl it directly, but the proxy returns 502. What now?

    This points to a proxy configuration issue. Check that: 1) the proxy's upstream definition points to the correct service and port; 2) the proxy can resolve the upstream service name (in Kubernetes, use the internal DNS name, e.g. <service>.<namespace>.svc.cluster.local); 3) the proxy's upstream connect/read timeouts are not shorter than your app's response time.
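
    Two quick checks along those lines, sketched for the ingress-nginx controller (adjust names for other proxies):

    bash
    # Can the cluster resolve the upstream service name? (busybox ships nslookup)
    kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
      nslookup <your-app-service>.<namespace>.svc.cluster.local
    # Dump the rendered proxy config and check its timeout directives
    kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- nginx -T 2>/dev/null | grep -i "proxy_connect_timeout\|proxy_read_timeout"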

    The 502 error is intermittent. How do I troubleshoot?

    Intermittent 502s suggest resource exhaustion or network flakiness. Monitor: 1) Backend application connection pool saturation. 2) Node/VM network bandwidth and error counters. 3) DNS resolution failures. Increase logging verbosity on the proxy and tail logs during the next occurrence.
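
    To catch the next occurrence, a simple polling loop (the endpoint URL and 2-second interval are placeholders) that timestamps every non-200 response can narrow down when and how often it happens:

    bash
    # Log every non-200 response with a UTC timestamp; stop with Ctrl-C
    while true; do
      code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://<your-endpoint>/health)
      if [ "$code" != "200" ]; then
        echo "$(date -u +%FT%TZ) got HTTP $code"
      fi
      sleep 2
    done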
