Severity: CRITICAL

GCP Cloud SQL Instance Disk Full: Troubleshooting Guide

Quick Fix Summary

TL;DR

Immediately increase disk size via GCP Console or gcloud, then identify and clean up space-consuming data.

The Cloud SQL instance has exhausted its allocated storage capacity, preventing new writes and potentially causing application failures. This is a critical operational state requiring immediate remediation to restore database functionality.

Diagnosis & Causes

  • Unbounded table growth without data lifecycle management.
  • Excessive binary logging or transaction log retention.
  • Lack of monitoring alerts for disk utilization trends.
  • Bulk data imports or ETL jobs without cleanup.
  • Temporary files or orphaned data from failed operations.
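
Before changing anything, confirm how much storage is provisioned and whether automatic storage increase is already enabled. A minimal check via gcloud ([INSTANCE_NAME] is a placeholder):

bash
gcloud sql instances describe [INSTANCE_NAME] \
  --format="value(settings.dataDiskSizeGb,settings.storageAutoResize)"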

Recovery Steps

    Step 1: Immediate Disk Expansion (Stopgap)

    Quickly provision additional storage to restore write operations while you diagnose the root cause. This is a non-destructive, online operation.

    bash
    gcloud sql instances patch [INSTANCE_NAME] --storage-size=[NEW_SIZE_GB]
    # Verify the resize operation is in progress
    gcloud sql operations list --instance=[INSTANCE_NAME] --filter="status!=DONE"
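
    Optionally, block until the resize operation finishes and confirm the new allocation. [OPERATION_ID] comes from the operations listing above; both names are placeholders.

    bash
    gcloud sql operations wait [OPERATION_ID] --timeout=600
    gcloud sql instances describe [INSTANCE_NAME] --format="value(settings.dataDiskSizeGb)"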

    Step 2: Diagnose Largest Tables & Objects

    Connect to the database and identify which tables and indexes are consuming the most space (binary logs and WAL are covered in Step 3). Use the PostgreSQL- or MySQL-specific queries below.

    sql
    -- For PostgreSQL (largest tables incl. indexes and TOAST, user schemas only)
    SELECT schemaname, tablename,
           pg_size_pretty(pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename))) AS total_size
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename)) DESC
    LIMIT 10;

    -- For MySQL (approximate sizes; see the pro tip on ANALYZE TABLE below)
    SELECT table_schema AS 'Database', table_name AS 'Table',
           ROUND(((data_length + index_length) / 1024 / 1024), 2) AS 'Size (MB)'
    FROM information_schema.TABLES
    WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    ORDER BY (data_length + index_length) DESC
    LIMIT 10;
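
    It can also help to compare the combined size of all databases with the provisioned disk: a large gap points at space held by WAL, binary logs, or temporary files rather than table data. A PostgreSQL-oriented sketch, assuming a connection through the Cloud SQL Auth Proxy on 127.0.0.1 (connection details and names are placeholders):

    bash
    # Provisioned disk size (GB) for the instance.
    gcloud sql instances describe [INSTANCE_NAME] --format="value(settings.dataDiskSizeGb)"
    # Combined size of all databases on the instance.
    psql "host=127.0.0.1 dbname=postgres user=postgres" -tAc \
      "SELECT pg_size_pretty(sum(pg_database_size(datname))) FROM pg_database;"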

    Step 3: Analyze and Clean Binary Logs/Transaction Logs

    For MySQL instances, binary logs can accumulate quickly; for PostgreSQL, write-ahead log (WAL) files and temporary files can consume space in the same way. Adjust retention settings if they keep more than you need.

    bash
    # Check binary log size and count (MySQL via gcloud)
    gcloud sql instances describe [INSTANCE_NAME] --format="value(settings.backupConfiguration.binaryLogEnabled,settings.backupConfiguration.transactionLogRetentionDays)"
    # To reduce retention (example: set to 3 days) - CAUTION: shortens the point-in-time recovery window
    gcloud sql instances patch [INSTANCE_NAME] --backup-start-time=HH:MM --retained-transaction-log-days=3
    # For PostgreSQL, check pg_wal directory size via Cloud SQL Insights or logs.
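
    For PostgreSQL instances, the closest equivalent check is whether point-in-time recovery (which drives WAL retention) is enabled and how many days of logs are kept. A quick look via gcloud ([INSTANCE_NAME] is a placeholder):

    bash
    gcloud sql instances describe [INSTANCE_NAME] \
      --format="value(settings.backupConfiguration.pointInTimeRecoveryEnabled,settings.backupConfiguration.transactionLogRetentionDays)"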

    Step 4: Implement Data Archiving or Purging

    Safely delete or archive old data based on your diagnosis. Always back up before major deletions, and use batched DELETE statements to avoid long-running transactions.

    sql
    -- PostgreSQL example: archive rows older than 90 days, then delete them in batches.
    CREATE TABLE audit_log_archive AS
      SELECT * FROM audit_log WHERE created_at < NOW() - INTERVAL '90 days';
    -- Repeat this DELETE until it affects 0 rows (a loop sketch follows below).
    DELETE FROM audit_log
    WHERE id IN (SELECT id FROM audit_log
                 WHERE created_at < NOW() - INTERVAL '90 days' LIMIT 10000);
    -- Plain VACUUM (PostgreSQL) only marks the freed space as reusable; VACUUM FULL or
    -- OPTIMIZE TABLE (MySQL) rewrites the table and shrinks it on disk, but locks it.
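
    When many batches are needed, the DELETE can be driven from a shell loop until no matching rows remain. The sketch below assumes a PostgreSQL instance reached through the Cloud SQL Auth Proxy on 127.0.0.1; the connection string, table, and column names are placeholders for your own environment.

    bash
    # Repeat the batched DELETE until it removes 0 rows (placeholders throughout).
    while true; do
      deleted=$(psql "host=127.0.0.1 dbname=mydb user=postgres" -tAc \
        "WITH gone AS (
           DELETE FROM audit_log
           WHERE id IN (SELECT id FROM audit_log
                        WHERE created_at < NOW() - INTERVAL '90 days' LIMIT 10000)
           RETURNING 1)
         SELECT count(*) FROM gone;")
      echo "deleted ${deleted} rows"
      [ "${deleted}" -eq 0 ] && break
    done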

    Step 5: Enable and Configure Proactive Monitoring

    Set up Cloud Monitoring alerts to trigger well before the disk reaches capacity, allowing for planned intervention.

    bash
    gcloud alpha monitoring policies create --policy-from-file="policy.json"
    # policy.json example: alert when disk utilization stays above 85% for 60s.
    {"displayName": "Cloud SQL Disk High Usage", "combiner": "OR",
     "conditions": [{"displayName": "Disk utilization > 85%", "conditionThreshold": {
       "filter": "metric.type=\"cloudsql.googleapis.com/database/disk/utilization\" resource.type=\"cloudsql_database\"",
       "comparison": "COMPARISON_GT", "thresholdValue": 0.85, "duration": "60s"}}]}

    Step 6: Review and Implement Automated Storage Increase

    For critical production instances, enable Storage Auto-increase to prevent future outages. This is a preventative setting.

    bash
    gcloud sql instances patch [INSTANCE_NAME] --storage-auto-increase
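
    If auto-increase is enabled, consider also capping how far it can grow so that runaway data growth surfaces as an alert rather than an ever-larger disk. The limit value below is illustrative, not a recommendation.

    bash
    # Cap automatic growth at 500 GB (example value; 0 means no limit).
    gcloud sql instances patch [INSTANCE_NAME] --storage-auto-increase-limit=500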

    Architect's Pro Tip

    "For MySQL, the 'information_schema.TABLES' size data is an approximation. Run 'ANALYZE TABLE' on large tables first for accurate metrics before deciding what to purge."

    Frequently Asked Questions

    Does increasing disk size cause downtime?

    No. Increasing the storage size of a Cloud SQL instance is an online operation and does not require a restart or cause downtime for most instances.

    Can I decrease the disk size after fixing the issue?

    No. Google Cloud SQL does not support decreasing the allocated storage size. You can only increase it. This makes proper capacity planning and data lifecycle management essential.

    What's the difference between 'data size' and 'disk size' in Cloud SQL metrics?

    'Data size' is the space used by your actual database. 'Disk size' is the total provisioned storage, which also includes binary logs, temporary files, and filesystem overhead. The disk utilization alert triggers on the latter.
