CRITICAL

How to Fix GCP RESOURCE_EXHAUSTED: Quota Limits

Quick Fix Summary

TL;DR

Immediately request a quota increase via the GCP Console and scale down non-critical workloads.

The RESOURCE_EXHAUSTED error occurs when a Google Cloud service hits a quota limit, blocking new resource creation or API calls. For allocation quotas, the block persists until usage drops or the limit is raised; for rate quotas, requests succeed again once the rate window resets.

Diagnosis & Causes

  • Exceeding regional CPU, IP, or disk quotas.
  • Hitting API request rate limits (e.g., on the Google Kubernetes Engine API).
  • Surpassing project-level global resource quotas.
  • Spike in traffic or autoscaling without headroom.
  • New deployment in a region with existing high usage.

Recovery Steps

    Step 1: Identify the Exhausted Quota

    Use the Cloud Console or gcloud to pinpoint the exact quota that is exhausted. This is critical for a targeted fix.

    bash
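    # List the configured limits for one quota metric (alpha command; flags may change between releases)
    gcloud alpha services quota list --service=compute.googleapis.com --consumer=projects/YOUR_PROJECT_ID --filter="metric:compute.googleapis.com/cpus"
    # Show the limit and current usage of every quota in the region
    gcloud compute regions describe us-central1 --flatten="quotas[]" --format="table(quotas.metric,quotas.limit,quotas.usage)"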
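
To make the exhausted quota stand out, the regional quota rows can be ranked by utilization. The following is a minimal sketch rather than an official tool: `rank_quota_usage` is a hypothetical helper that reads `metric,limit,usage` CSV rows on stdin, and the commented gcloud pipeline assumes the `--flatten` and `csv[no-heading]` options behave as in recent gcloud releases.

```shell
#!/usr/bin/env bash
# rank_quota_usage: hypothetical helper (not part of gcloud).
# Reads "metric,limit,usage" CSV rows on stdin and prints
# "percent metric usage/limit", highest utilization first.
rank_quota_usage() {
  awk -F',' '
    $2 + 0 > 0 {   # skip header rows and rows with a zero or missing limit
      printf "%.1f %s %s/%s\n", ($3 / $2) * 100, $1, $3, $2
    }' | sort -rn
}

# Assumed usage against a real project (requires gcloud and read access):
# gcloud compute regions describe us-central1 \
#   --flatten="quotas[]" \
#   --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)" \
#   | rank_quota_usage
```

Any row at or near 100 percent is a likely source of the RESOURCE_EXHAUSTED error and is the metric to name in the increase request.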

    Step 2: Request an Immediate Quota Increase

    Submit a quota increase request for the specific metric and region. For a critical-severity outage, describe the production impact in the request justification and open a support case in parallel.

    bash
    # Navigate to: IAM & Admin -> Quotas in the Cloud Console.
    # Filter by region and metric, select the quota, and click 'EDIT QUOTAS'.
    # For programmatic requests (alpha command, so availability and flags may change):
    gcloud alpha services quota update --service=compute.googleapis.com --consumer=projects/YOUR_PROJECT_ID --metric=compute.googleapis.com/cpus --unit='1/{project}/{region}' --value=NEW_VALUE --dimensions=region=us-central1

    Step 3: Implement Immediate Mitigation

    While waiting for the quota increase, reduce demand to restore service. This is a production triage step.

    bash
    # Scale down non-production GKE node pools (for a regional cluster, --num-nodes is the count per zone).
    gcloud container clusters resize CLUSTER_NAME --node-pool=NON_PROD_POOL --num-nodes=1 --region=us-central1
    # Delete unused persistent disks or VM instances.
    gcloud compute instances delete old-instance-1 --zone=us-central1-a --quiet
    # Temporarily disable non-essential cron jobs or batch processes.
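
Because deletions during an incident are hard to undo, it can help to generate the cleanup commands first and review them before running anything. This is a sketch under stated assumptions: `plan_disk_deletes` is a hypothetical helper that only prints commands, and the commented `gcloud compute disks list` filter for unattached disks is assumed to work as in recent gcloud releases.

```shell
#!/usr/bin/env bash
# plan_disk_deletes: hypothetical helper that prints, but does not run,
# one delete command per "name,zone" CSV row read from stdin.
plan_disk_deletes() {
  while IFS=',' read -r name zone; do
    # Ignore blank or partial rows (for example, regional disks with no zone).
    [ -n "$name" ] && [ -n "$zone" ] || continue
    printf 'gcloud compute disks delete %s --zone=%s --quiet\n' "$name" "$zone"
  done
}

# Assumed usage: list disks that no instance is using, then review the plan.
# gcloud compute disks list --filter="-users:*" \
#   --format="csv[no-heading](name,zone.basename())" | plan_disk_deletes
```

Printing the plan instead of executing it keeps a human in the loop; the output can be saved to a file and run only after each line has been checked.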

    Step 4: Configure Proactive Quota Monitoring & Alerts

    Prevent future outages by creating Cloud Monitoring alerts for quota usage.

    bash
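    # Create a Monitoring Policy via Console or Terraform. Example MQL (alerts when CPU quota usage exceeds 80% of the limit):
    fetch consumer_quota
    | filter resource.service == 'compute.googleapis.com' && resource.location == 'us-central1'
    | { metric serviceruntime.googleapis.com/quota/allocation/usage
        | filter metric.quota_metric == 'compute.googleapis.com/cpus'
        | align next_older(1d)
        | group_by [metric.quota_metric, resource.location], .max()
      ; metric serviceruntime.googleapis.com/quota/limit
        | filter metric.quota_metric == 'compute.googleapis.com/cpus'
        | align next_older(1d)
        | group_by [metric.quota_metric, resource.location], .min()
      }
    | ratio | every 1m
    | condition gt(val(), 0.8 '1')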

    Step 5: Architect for Quota Resilience

    Design your infrastructure to be tolerant of regional quota limits.

    bash
    # Use multi-region deployments (GKE clusters, Cloud SQL replicas).
    # Implement circuit breakers & graceful degradation in application code.
    # Use Resource Quotas in GKE to prevent namespace overconsumption.
    kubectl create quota prod-quota --hard=cpu=20,memory=64Gi,pods=50 --namespace=production
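
One way to act on the multi-region advice above is to check quota headroom before choosing where to deploy. The sketch below is illustrative only: `pick_region` is a hypothetical helper that reads `region,limit,usage` rows in priority order, and the commented `gcloud compute regions list` pipeline assumes the `CPUS` metric name and the `--flatten`/`--filter` combination behave as in recent gcloud releases.

```shell
#!/usr/bin/env bash
# pick_region: hypothetical helper. Reads "region,limit,usage" CSV rows on
# stdin (in priority order) and prints the first region whose free quota
# (limit - usage) covers the requested amount. Returns 1 if none qualifies.
pick_region() {
  local needed="$1"
  awk -F',' -v need="$needed" '
    ($2 - $3) >= need { print $1; found = 1; exit }
    END { exit !found }'
}

# Assumed usage: find a region with room for 16 more CPUs.
# gcloud compute regions list --flatten="quotas[]" \
#   --filter="quotas.metric=CPUS" \
#   --format="csv[no-heading](name,quotas.limit,quotas.usage)" \
#   | pick_region 16
```

A non-zero exit status means no listed region has enough headroom, which is the signal to request an increase before the deployment rather than during it.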

    Architect's Pro Tip

    "For 'global' quotas (e.g., Global Static External IPs), increases can take 48+ hours. Always pre-request quotas for planned scaling events via the Quotas API."

    Frequently Asked Questions

    How long does a GCP quota increase take?

    Requests that need manual review typically take 24-48 hours. For a production outage, describe the business impact in the request and simultaneously open a Priority P1 support case to expedite.

    Can I automate quota increase requests?

    Partially. The `gcloud alpha services quota` commands allow programmatic management, but final approval often requires manual review by Google.

    What's the difference between rate quotas and allocation quotas?

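    Rate quotas (e.g., API calls per minute) reset after a fixed interval. Allocation quotas (e.g., number of CPUs) are a hard cap until usage is released or the limit is increased. RESOURCE_EXHAUSTED can be returned for either type, so check which kind of quota was hit before choosing a fix.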
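
For rate quotas, retrying with increasing delays is usually enough, because the limit resets on its own. The following is a minimal sketch of exponential backoff in shell: `retry_with_backoff` is a hypothetical helper, and the attempt count and base delay are illustrative values, not recommendations from Google.

```shell
#!/usr/bin/env bash
# retry_with_backoff: hypothetical helper. Runs a command up to MAX_ATTEMPTS
# times, doubling the wait between attempts (BASE_DELAY, 2x, 4x, ...).
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Assumed usage: retry a rate-limited API call up to 5 times, starting at 2s.
# retry_with_backoff 5 2 gcloud container clusters list --region=us-central1
```

This helper retries on any failure, so it does not help with allocation quotas: those keep failing until resources are freed or the limit is raised.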
