
How to Fix Kubernetes 502 Bad Gateway with Istio Service Mesh (2025)

Quick Fix Summary

TL;DR

Check Istio Gateway and VirtualService configuration, then verify upstream pod health and Envoy proxy logs.

A 502 Bad Gateway in an Istio-meshed environment means the Envoy proxy acting as the gateway received an invalid response (or a reset or refused connection) from the upstream service it tried to contact. The failure sits on the hop between the proxy and your application backend, not in the client's request.

Diagnosis & Causes

  • Upstream pod is not ready or has crashed.
  • The Istio VirtualService destination points to the wrong Service or subset.
  • The Gateway selector does not match the Istio ingress gateway pod.
  • A NetworkPolicy or Istio AuthorizationPolicy is blocking traffic.
  • The upstream service times out or refuses connections.

Recovery Steps

    Step 1: Verify Gateway and VirtualService Configuration

    Ensure your Gateway is bound to the correct ingress pod and your VirtualService routes to a valid Kubernetes Service and port.

    bash
    kubectl get gateway -n <your-namespace>
    kubectl describe virtualservice <vs-name> -n <your-namespace>
    kubectl get svc <target-service> -n <your-namespace>
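
    For reference, a minimal sketch of how these two resources typically fit together. Every name, host, and port below is a placeholder, not a value from your cluster:

    yaml
    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
      name: my-gateway                # placeholder name
    spec:
      selector:
        istio: ingressgateway         # must match the labels on the ingress gateway pods
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - "myapp.example.com"
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: my-virtualservice         # placeholder name
    spec:
      hosts:
      - "myapp.example.com"
      gateways:
      - my-gateway                    # must reference the Gateway above
      http:
      - route:
        - destination:
            host: my-service          # must be an existing Kubernetes Service in this namespace
            port:
              number: 8080            # must be a port that Service actually exposes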

    Step 2: Check Upstream Pod Health and Readiness

    The 502 often originates from Envoy being unable to reach a healthy pod. Verify pods are Running, Ready, and have passed their readiness probes.

    bash
    kubectl get pods -n <your-namespace> -l app=<your-app-label>
    kubectl describe pod <pod-name> -n <your-namespace>
    kubectl logs <pod-name> -n <your-namespace> -c <container-name>
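
    If a pod is Running but never becomes Ready, its readiness probe is the usual suspect. As a hedged sketch, a readiness probe in the Deployment's pod template typically looks like this (the path and port are placeholders for your application's health endpoint):

    yaml
    # Excerpt from a Deployment pod template; /healthz and 8080 are placeholders.
    containers:
    - name: my-app
      image: my-app:1.0
      ports:
      - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10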

    Step 3: Inspect Istio Ingress Gateway Pod Logs

    The Istio ingress gateway's Envoy access logs show the exact upstream failure. Look for 502 response codes together with response flags such as UF (upstream connection failure) or UC (upstream connection termination) and response-code details like upstream_reset_before_response_started.

    bash
    INGRESS_POD=$(kubectl get pod -n istio-system -l app=istio-ingressgateway -o jsonpath='{.items[0].metadata.name}')
    kubectl logs $INGRESS_POD -n istio-system --tail=50 | grep -A 5 -B 5 "502"
    kubectl logs $INGRESS_POD -n istio-system --tail=100 | grep -i "upstream_reset\|no_healthy_upstream"

    Step 4: Check DestinationRule and Subset Configuration

    A misconfigured DestinationRule (e.g., wrong port, missing subset) or a VirtualService referencing a non-existent subset will cause 502s.

    bash
    kubectl get destinationrule -n <your-namespace>
    kubectl describe destinationrule <dr-name> -n <your-namespace>
    # Cross-reference the subset name in your VirtualService with the subsets defined here.
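
    As a reference point, here is a hypothetical pairing of a subset definition and the VirtualService route that consumes it; every name and label is a placeholder to adapt:

    yaml
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: my-destinationrule        # placeholder name
    spec:
      host: my-service                # must match the VirtualService destination host
      subsets:
      - name: v1                      # the VirtualService must reference this exact name
        labels:
          version: v1                 # pods must carry this label, or the subset has no endpoints
    ---
    # Matching route in the VirtualService:
    # http:
    # - route:
    #   - destination:
    #       host: my-service
    #       subset: v1                # a typo here, or a missing subset, surfaces as gateway errors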

    Step 5: Validate Network Policies and Istio Authorization

    Ensure no NetworkPolicy or Istio AuthorizationPolicy is denying traffic from the istio-ingressgateway or istio-proxy sidecars to your application pods.

    bash
    kubectl get networkpolicy -n <your-namespace>
    kubectl get authorizationpolicy -n <your-namespace>
    kubectl describe authorizationpolicy <policy-name> -n <your-namespace>
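
    As an illustration, a sketch of an AuthorizationPolicy that explicitly allows traffic from the ingress gateway's service account to the application pods; the namespace, labels, and service account name are assumptions based on a default Istio install:

    yaml
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: allow-from-ingressgateway   # placeholder name
      namespace: my-namespace           # placeholder namespace
    spec:
      selector:
        matchLabels:
          app: my-app                   # placeholder; must match your application pods
      action: ALLOW
      rules:
      - from:
        - source:
            # Principal-based rules require mTLS between the gateway and the workload.
            principals:
            - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account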

    Step 6: Enable Envoy Access Logs and Debug (Advanced)

    If the cause is elusive, increase the logging level on the ingress gateway to capture detailed connection handshake and routing info.

    bash
    kubectl exec -n istio-system $INGRESS_POD -c istio-proxy -- pilot-agent request POST '/logging?level=debug'
    # Or, equivalently: istioctl proxy-config log $INGRESS_POD -n istio-system --level debug
    # After reproducing the error, check the logs again (and reset the level to 'info' when done).
    kubectl logs $INGRESS_POD -n istio-system -c istio-proxy --tail=100
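
    If access logs are not enabled on the mesh at all, here is a minimal sketch of turning them on, assuming you manage the installation with an IstioOperator resource (the Telemetry API is an alternative):

    yaml
    apiVersion: install.istio.io/v1alpha1
    kind: IstioOperator
    spec:
      meshConfig:
        accessLogFile: /dev/stdout      # write Envoy access logs to the proxy's stdout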

    Architect's Pro Tip

    "For intermittent 502s, check if your application's readiness probe passes *after* the Istio sidecar is fully ready. Use `holdApplicationUntilProxyStarts: true` in the Istio mesh config to prevent race conditions."

    Frequently Asked Questions

    My pods are healthy, but I still get a 502. What's next?

    Use `istioctl proxy-config endpoints <ingress-pod>.<namespace>` against your ingress gateway to verify Envoy's view of the upstream endpoints (and `istioctl proxy-config clusters` to confirm the cluster exists at all). If the target service's endpoints are missing or marked UNHEALTHY, the issue is at the Envoy-to-pod network layer or the Service selector does not match any ready pods.

    Can a Pod Disruption Budget (PDB) cause a 502?

    Indirectly, yes. If a PDB blocks eviction during a node drain and the node then becomes unreachable, the pod can get stuck in a 'Terminating' state. Envoy may keep routing to its stale endpoint, producing 502s until the pod is forcefully deleted and the endpoint is removed.
