How to Fix GCP RESOURCE_EXHAUSTED: Quota Limits
Quick Fix Summary
TL;DR: Immediately request a quota increase via the GCP Console and scale down non-critical workloads.
The RESOURCE_EXHAUSTED error occurs when a Google Cloud service hits its quota limit, preventing new resource creation or API calls. This is a hard stop that will cause service disruption until resolved.
Diagnosis & Causes
Recovery Steps
Step 1: Identify the Exhausted Quota
Use the Cloud Console or gcloud to pinpoint the exact quota that is exhausted. This is critical for a targeted fix.
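When a region has dozens of quotas, it can help to filter for the ones that are at or near their limit. The following is a minimal sketch, not an official tool: `hot_quotas` is a hypothetical helper, and it assumes its input is CSV rows in the order metric, limit, usage.

```shell
# Hypothetical helper: print quotas whose usage/limit ratio is at or above
# a threshold (default 0.8). Reads CSV rows "metric,limit,usage" on stdin;
# that column order is an assumption and must match the --format used below.
hot_quotas() {
  local threshold="${1:-0.8}"
  awk -F, -v t="$threshold" '
    $2 > 0 && ($3 / $2) >= t {
      printf "%s %.0f%% (%s of %s)\n", $1, 100 * $3 / $2, $3, $2
    }'
}

# Example with live data (requires an authenticated gcloud):
# gcloud compute regions describe us-central1 \
#   --flatten="quotas[]" \
#   --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)" \
#   | hot_quotas 0.8
```

Rows with a limit of zero are skipped to avoid a division by zero, so a quota that is disabled outright will not appear in the output.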
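# List the project's quota limits for the Compute Engine CPU metric
gcloud alpha services quota list --service=compute.googleapis.com --consumer=projects/YOUR_PROJECT_ID --filter="metric=compute.googleapis.com/cpus"
# Show limit and usage for every quota in the region
gcloud compute regions describe us-central1 --flatten="quotas[]" --format="table(quotas.metric,quotas.limit,quotas.usage)"
Step 2: Request an Immediate Quota Increase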
Submit a quota increase request for the specific metric and region. For Critical Severity, check the 'Emergency' or 'Support Case' box.
# Navigate to: IAM & Admin -> Quotas in the Cloud Console.
# Filter by region and metric, select the quota, and click 'EDIT QUOTAS'.
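# For programmatic requests (where available). The alpha quota commands change
# between gcloud releases; confirm the flags with `gcloud alpha services quota update --help`.
gcloud alpha services quota update --service=compute.googleapis.com --consumer=projects/YOUR_PROJECT_ID --metric=compute.googleapis.com/cpus --unit='1/{project}/{region}' --value=NEW_VALUE --dimensions=region=us-central1
Step 3: Implement Immediate Mitigation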
While waiting for the quota increase, reduce demand to restore service. This is a production triage step.
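The commands in this step resize and delete live resources, so during triage it may be worth previewing them before they run. A minimal sketch of a dry-run guard follows; the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here, not part of gcloud.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# and execute it only when DRY_RUN is set to anything else.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'DRY-RUN: %s\n' "$*"
  else
    "$@"
  fi
}

# Preview first, then repeat with DRY_RUN=0 once the output looks right:
# DRY_RUN=1 run gcloud compute instances delete old-instance-1 --zone=us-central1-a --quiet
# DRY_RUN=0 run gcloud compute instances delete old-instance-1 --zone=us-central1-a --quiet
```

Defaulting to dry-run means a forgotten variable results in a printed command rather than a deleted instance.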
# Scale down non-production GKE node pools.
gcloud container clusters resize CLUSTER_NAME --node-pool=NON_PROD_POOL --num-nodes=1 --region=us-central1
# Delete unused persistent disks or VM instances.
gcloud compute instances delete old-instance-1 --zone=us-central1-a --quiet
# Temporarily disable non-essential cron jobs or batch processes.
Step 4: Configure Proactive Quota Monitoring & Alerts
Prevent future outages by creating Cloud Monitoring alerts for quota usage.
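# Create an alerting policy via the Console or Terraform. Example MQL (usage / limit above 80%):
fetch consumer_quota
| filter resource.service == 'compute.googleapis.com' && resource.location == 'us-central1'
| { metric serviceruntime.googleapis.com/quota/allocation/usage
    | filter metric.quota_metric == 'compute.googleapis.com/cpus'
    | align next_older(1d)
    | group_by [metric.quota_metric, resource.location], .max()
  ; metric serviceruntime.googleapis.com/quota/limit
    | filter metric.quota_metric == 'compute.googleapis.com/cpus'
    | align next_older(1d)
    | group_by [metric.quota_metric, resource.location], .min()
  }
| ratio | every 1m
| condition gt(val(), 0.8 '1')
Step 5: Architect for Quota Resilience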
Design your infrastructure to be tolerant of regional quota limits.
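Of the measures in this step, the per-namespace GKE Resource Quota is the easiest to keep in version control as a manifest. The sketch below shows one way to generate it; `render_quota` is a hypothetical helper, and its output is meant to be the declarative equivalent of a `kubectl create quota` call with `cpu`, `memory`, and `pods` limits.

```shell
# Hypothetical helper: print a ResourceQuota manifest.
# Arguments: name, namespace, cpu, memory, pods.
render_quota() {
  local name="$1" namespace="$2" cpu="$3" memory="$4" pods="$5"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  hard:
    cpu: "${cpu}"
    memory: ${memory}
    pods: "${pods}"
EOF
}

# Review the output, then apply it:
# render_quota prod-quota production 20 64Gi 50 | kubectl apply -f -
```

Keeping the manifest in the repository makes a change to the namespace cap reviewable like any other change.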
# Use multi-region deployments (GKE clusters, Cloud SQL replicas).
# Implement circuit breakers & graceful degradation in application code.
# Use Resource Quotas in GKE to prevent namespace overconsumption.
kubectl create quota prod-quota --hard=cpu=20,memory=64Gi,pods=50 --namespace=production
Architect's Pro Tip
"For 'global' quotas (e.g., Global Static External IPs), increases can take 48+ hours. Always pre-request quotas for planned scaling events via the Quotas API."
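To decide what value to pre-request, a rough calculation is usually enough: planned node count times vCPUs per node, plus some headroom. The sketch below illustrates that arithmetic; `required_cpu_quota` and the 20% default headroom are assumptions chosen for illustration, not a Google recommendation.

```shell
# Hypothetical helper: CPU quota to request for a planned scale-up.
# Arguments: node count, vCPUs per node, headroom percent (default 20).
required_cpu_quota() {
  local nodes="$1" vcpus_per_node="$2" headroom_pct="${3:-20}"
  local base=$(( nodes * vcpus_per_node ))
  # Integer ceiling of base * (100 + headroom_pct) / 100.
  echo $(( (base * (100 + headroom_pct) + 99) / 100 ))
}

# 40 nodes x 8 vCPUs with 25% headroom:
# required_cpu_quota 40 8 25   # prints 400
```

The result rounds up, so a fractional requirement never yields a request that is one CPU short.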
Frequently Asked Questions
How long does a GCP quota increase take?
Standard requests take 24-48 hours. For a production outage, select 'Emergency' when submitting and simultaneously open a Priority P1 support case to expedite.
Can I automate quota increase requests?
Partially. The `gcloud alpha services quota` commands allow programmatic management, but final approval often requires manual review by Google.
What's the difference between rate quotas and allocation quotas?
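Rate quotas (e.g., API calls/minute) reset over time. Allocation quotas (e.g., number of CPUs) are a hard cap until resources are released or the quota is increased. RESOURCE_EXHAUSTED can be returned for either kind: a rate-quota error clears once the window resets, while an allocation-quota error persists until you free capacity or the increase is granted.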
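Because rate quotas reset over time, a command that fails against a rate limit is often worth retrying after a pause. The following is a minimal sketch of exponential backoff; `retry_backoff` is a hypothetical helper, and it retries on any non-zero exit status rather than inspecting the error for RESOURCE_EXHAUSTED specifically.

```shell
# Hypothetical helper: retry a command, doubling the delay after each failure.
# Usage: retry_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while :; do
    # Run the wrapped command; stop as soon as it succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Up to 5 attempts, waiting 2s, 4s, 8s, 16s between them:
# retry_backoff 5 2 gcloud compute instances list --filter="zone:us-central1-a"
```

Backoff does not help with an exhausted allocation quota, since that error persists no matter how long the caller waits.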