Severity: CRITICAL

GCP Cloud SQL Instance Disk Full: Troubleshooting Guide

Quick Fix Summary

TL;DR

Immediately increase disk size via GCP Console or gcloud, then identify and clean up space-consuming data.

The Cloud SQL instance has exhausted its allocated storage capacity, preventing new writes and potentially causing application failures. This is a critical operational state requiring immediate remediation to restore database functionality.

Diagnosis & Causes

  • Unbounded table growth without data lifecycle management.
  • Excessive binary logging or transaction log retention.
  • Lack of monitoring alerts for disk utilization trends.
  • Bulk data imports or ETL jobs without cleanup.
  • Temporary files or orphaned data from failed operations.
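
Before changing anything, confirm how much storage is provisioned and whether automatic storage increase is already enabled. A minimal check via gcloud ([INSTANCE_NAME] is a placeholder):

bash
gcloud sql instances describe [INSTANCE_NAME] \
  --format="value(settings.dataDiskSizeGb,settings.storageAutoResize)"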

Recovery Steps

    Step 1: Immediate Disk Expansion (Stopgap)

    Quickly provision additional storage to restore write operations while you diagnose the root cause. This is a non-destructive, online operation.

    bash
    gcloud sql instances patch [INSTANCE_NAME] --storage-size=[NEW_SIZE_GB]
    # Verify the resize operation is in progress
    gcloud sql operations list --instance=[INSTANCE_NAME] --filter="status!=DONE"
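
    Optionally, block until the resize operation finishes and confirm the new allocation. [OPERATION_ID] comes from the operations listing above; both names are placeholders.

    bash
    gcloud sql operations wait [OPERATION_ID] --timeout=600
    gcloud sql instances describe [INSTANCE_NAME] --format="value(settings.dataDiskSizeGb)"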

    Step 2: Diagnose Largest Tables & Objects

    Connect to the database and identify which tables and indexes are consuming the most space (binary logs and WAL are covered in Step 3). Use the PostgreSQL- or MySQL-specific queries below.

    sql
    -- For PostgreSQL (largest tables incl. indexes and TOAST, user schemas only)
    SELECT schemaname, tablename,
           pg_size_pretty(pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename))) AS total_size
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename)) DESC
    LIMIT 10;

    -- For MySQL (approximate sizes; see the pro tip on ANALYZE TABLE below)
    SELECT table_schema AS 'Database', table_name AS 'Table',
           ROUND(((data_length + index_length) / 1024 / 1024), 2) AS 'Size (MB)'
    FROM information_schema.TABLES
    WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    ORDER BY (data_length + index_length) DESC
    LIMIT 10;
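
    It can also help to compare the combined size of all databases with the provisioned disk: a large gap points at space held by WAL, binary logs, or temporary files rather than table data. A PostgreSQL-oriented sketch, assuming a connection through the Cloud SQL Auth Proxy on 127.0.0.1 (connection details and names are placeholders):

    bash
    # Provisioned disk size (GB) for the instance.
    gcloud sql instances describe [INSTANCE_NAME] --format="value(settings.dataDiskSizeGb)"
    # Combined size of all databases on the instance.
    psql "host=127.0.0.1 dbname=postgres user=postgres" -tAc \
      "SELECT pg_size_pretty(sum(pg_database_size(datname))) FROM pg_database;"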

    Step 3: Analyze and Clean Binary Logs/Transaction Logs

    For MySQL instances, binary logs can accumulate quickly; for PostgreSQL, write-ahead log (WAL) files and temporary files can consume space in the same way. Adjust retention settings if they keep more than you need.

    bash
    # Check binary log size and count (MySQL via gcloud)
    gcloud sql instances describe [INSTANCE_NAME] --format="value(settings.backupConfiguration.binaryLogEnabled,settings.backupConfiguration.transactionLogRetentionDays)"
    # To reduce retention (example: set to 3 days) - CAUTION: shortens the point-in-time recovery window
    gcloud sql instances patch [INSTANCE_NAME] --backup-start-time=HH:MM --retained-transaction-log-days=3
    # For PostgreSQL, check pg_wal directory size via Cloud SQL Insights or logs.
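
    For PostgreSQL instances, the closest equivalent check is whether point-in-time recovery (which drives WAL retention) is enabled and how many days of logs are kept. A quick look via gcloud ([INSTANCE_NAME] is a placeholder):

    bash
    gcloud sql instances describe [INSTANCE_NAME] \
      --format="value(settings.backupConfiguration.pointInTimeRecoveryEnabled,settings.backupConfiguration.transactionLogRetentionDays)"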

    Step 4: Implement Data Archiving or Purging

    Safely delete or archive old data based on your diagnosis. Always back up before major deletions, and use batched DELETE statements to avoid long-running transactions.

    sql
    -- PostgreSQL example: archive rows older than 90 days, then delete them in batches.
    CREATE TABLE audit_log_archive AS
      SELECT * FROM audit_log WHERE created_at < NOW() - INTERVAL '90 days';
    -- Repeat this DELETE until it affects 0 rows (a loop sketch follows below).
    DELETE FROM audit_log
    WHERE id IN (SELECT id FROM audit_log
                 WHERE created_at < NOW() - INTERVAL '90 days' LIMIT 10000);
    -- Plain VACUUM (PostgreSQL) only marks the freed space as reusable; VACUUM FULL or
    -- OPTIMIZE TABLE (MySQL) rewrites the table and shrinks it on disk, but locks it.
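
    When many batches are needed, the DELETE can be driven from a shell loop until no matching rows remain. The sketch below assumes a PostgreSQL instance reached through the Cloud SQL Auth Proxy on 127.0.0.1; the connection string, table, and column names are placeholders for your own environment.

    bash
    # Repeat the batched DELETE until it removes 0 rows (placeholders throughout).
    while true; do
      deleted=$(psql "host=127.0.0.1 dbname=mydb user=postgres" -tAc \
        "WITH gone AS (
           DELETE FROM audit_log
           WHERE id IN (SELECT id FROM audit_log
                        WHERE created_at < NOW() - INTERVAL '90 days' LIMIT 10000)
           RETURNING 1)
         SELECT count(*) FROM gone;")
      echo "deleted ${deleted} rows"
      [ "${deleted}" -eq 0 ] && break
    done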

    Step 5: Enable and Configure Proactive Monitoring

    Set up Cloud Monitoring alerts to trigger well before the disk reaches capacity, allowing for planned intervention.

    bash
    gcloud alpha monitoring policies create --policy-from-file="policy.json"
    # policy.json example: alert when disk utilization stays above 85% for 60s.
    {"displayName": "Cloud SQL Disk High Usage", "combiner": "OR",
     "conditions": [{"displayName": "Disk utilization > 85%", "conditionThreshold": {
       "filter": "metric.type=\"cloudsql.googleapis.com/database/disk/utilization\" resource.type=\"cloudsql_database\"",
       "comparison": "COMPARISON_GT", "thresholdValue": 0.85, "duration": "60s"}}]}

    Step 6: Review and Implement Automated Storage Increase

    For critical production instances, enable Storage Auto-increase to prevent future outages. This is a preventative setting.

    bash
    gcloud sql instances patch [INSTANCE_NAME] --storage-auto-increase
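
    If auto-increase is enabled, consider also capping how far it can grow so that runaway data growth surfaces as an alert rather than an ever-larger disk. The limit value below is illustrative, not a recommendation.

    bash
    # Cap automatic growth at 500 GB (example value; 0 means no limit).
    gcloud sql instances patch [INSTANCE_NAME] --storage-auto-increase-limit=500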

    Architect's Pro Tip

    "For MySQL, the 'information_schema.TABLES' size data is an approximation. Run 'ANALYZE TABLE' on large tables first for accurate metrics before deciding what to purge."

    Frequently Asked Questions

    Does increasing disk size cause downtime?

    No. Increasing the storage size of a Cloud SQL instance is an online operation and does not require a restart or cause downtime for most instances.

    Can I decrease the disk size after fixing the issue?

    No. Google Cloud SQL does not support decreasing the allocated storage size. You can only increase it. This makes proper capacity planning and data lifecycle management essential.

    What's the difference between 'data size' and 'disk size' in Cloud SQL metrics?

    'Data size' is the space used by your actual database. 'Disk size' is the total provisioned storage, which also includes binary logs, temporary files, and filesystem overhead. The disk utilization alert triggers on the latter.
