Alibaba Cloud RDS: Fix Connection Pool Exhaustion Triggering High-Latency Alerts
Quick Fix Summary
TL;DR: Restart the application to clear stuck connections and temporarily relieve pool pressure.
Diagnosis & Causes
The application's database connection pool has reached its maximum size, so new requests either queue waiting for a free connection or fail outright, which surfaces as high latency and timeouts.
Recovery Steps
Step 1: Verify Active Connections and Pool Exhaustion
Check the current number of active connections on the RDS instance and correlate with application metrics to confirm pool exhaustion.
# Check active connections on the RDS instance via DMS or CLI
aliyun rds DescribeDBInstancePerformance --DBInstanceId <your-instance-id> --Key 'MySQL_ActiveSessions' --StartTime $(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%MZ) --EndTime $(date -u +%Y-%m-%dT%H:%MZ)
# Check CloudMonitor for the 'Database Connections' metric and any 'Failed to Get Connection' alerts.
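If you can reach the instance with a SQL client, you can also compare the live session count against the configured ceiling directly. A minimal check (the endpoint and account below are placeholders for your own values):
# Compare current sessions against the configured limit
mysql -h <rds-endpoint> -u <admin-user> -p -e "SHOW STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"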
Step 2: Restart Application to Clear Stale Connections
This is the immediate recovery step: restarting the application closes all existing pool connections and re-establishes them fresh. Use your normal orchestration tooling.
# For ECS instances running the app
sudo systemctl restart <your-application-service>
# For Kubernetes deployments
kubectl rollout restart deployment/<your-app-deployment> -n <namespace>
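After the restart, confirm the rollout finished and that the session count on the instance actually dropped; if it did not, the connections are being held by something other than the restarted pods. A quick check (same placeholders as above):
# Wait for the new pods to become ready
kubectl rollout status deployment/<your-app-deployment> -n <namespace>
# Re-check the live session count on the instance
mysql -h <rds-endpoint> -u <admin-user> -p -e "SHOW STATUS LIKE 'Threads_connected';"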
Step 3: Increase RDS Maximum Connections Limit
Temporarily increase the instance's `max_connections` parameter to provide immediate headroom while you fix the root cause.
# Raise max_connections on the instance (replace with your instance ID); Forcerestart false applies the change online, since max_connections is a dynamic parameter
aliyun rds ModifyParameter --DBInstanceId <your-instance-id> --Parameters '{"max_connections":"1000"}' --Forcerestart false
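Confirm the new value is actually in effect before relying on it. Assuming the DescribeParameters action is available in your CLI version, either of the following shows the running value:
# Running parameter values as reported by the RDS API
aliyun rds DescribeParameters --DBInstanceId <your-instance-id>
# Or ask the database directly
mysql -h <rds-endpoint> -u <admin-user> -p -e "SHOW VARIABLES LIKE 'max_connections';"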
Step 4: Kill Long-Running or Idle Connections
Manually terminate problematic connections that are consuming pool slots, using the RDS console (DMS) or SQL commands.
# Connect to the RDS instance via DMS or client, then run:
SHOW PROCESSLIST;
-- Identify idle/long-running connections and kill them
KILL <process_id>;
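To find candidates in bulk rather than scanning SHOW PROCESSLIST by eye, you can query information_schema from the same session; the 300-second cutoff below is an illustrative threshold, not an Alibaba recommendation:
-- List connections that have been idle for more than 300 seconds (illustrative threshold)
SELECT id, user, host, db, command, time
FROM information_schema.processlist
WHERE command = 'Sleep' AND time > 300
ORDER BY time DESC;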
Step 5: Optimize Application Connection Pool Settings
Review and adjust the application's connection pool configuration (e.g., HikariCP, Druid) so that the combined pools of all application instances stay within the RDS `max_connections` limit.
# Example: Check application config for pool settings (Spring Boot HikariCP)
# application.yml snippet:
spring:
  datasource:
    hikari:
      maximum-pool-size: 20       # Size against RDS max_connections and the number of app instances
      connection-timeout: 30000   # ms to wait for a connection from the pool before failing
      idle-timeout: 600000        # ms before an idle connection is retired from the pool
      max-lifetime: 1800000       # ms before a connection is proactively recycled
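The key sizing rule: maximum-pool-size multiplied by the number of application instances (plus any other clients) must stay comfortably below the RDS max_connections, leaving headroom for admin, monitoring, and replication sessions. A rough worked example with illustrative numbers:
# Illustrative sizing: max_connections=1000, ~100 reserved for admin/monitoring, 30 app replicas
# (1000 - 100) / 30 = 30  -> each replica's pool should stay at or below ~30 connections
echo $(( (1000 - 100) / 30 ))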
Step 6: Scale the RDS Instance
If the workload has permanently increased, scale the instance to a higher specification (CPU/memory), which supports a larger connection limit.
# Upgrade the DB instance specification (causes a momentary connection drop)
aliyun rds ModifyDBInstanceSpec --DBInstanceId <your-instance-id> --DBInstanceClass '<new-spec>' --PayType Postpaid
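Once the change completes, confirm the instance is running on the new class; assuming the DescribeDBInstanceAttribute action is available in your CLI version:
# Shows the current DBInstanceClass and the connection ceiling for that spec
aliyun rds DescribeDBInstanceAttribute --DBInstanceId <your-instance-id>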
Step 7: Implement Connection Health Checks & Timeouts
Configure the connection pool to validate connections before use and evict idle connections to prevent leaks from affecting the pool.
# Example HikariCP health check properties
spring.datasource.hikari.connection-test-query=SELECT 1
spring.datasource.hikari.validation-timeout=5000
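To surface leaks rather than merely tolerate them, HikariCP also provides a leak detection threshold that logs a warning whenever a connection is held longer than the configured time; the value below is illustrative:
# Warn in the application log if a connection is borrowed for more than 60 s without being returned (illustrative value)
spring.datasource.hikari.leak-detection-threshold=60000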
"This often happens after a deployment where a code change introduced a connection leak, or when a downstream service slowdown causes transactions to hold connections open for much longer than usual. Always compare the deployment timeline with the connection growth chart in CloudMonitor."
Frequently Asked Questions
Will increasing `max_connections` on RDS impact performance?
Yes, increasing it too high without sufficient instance resources (CPU/RAM) can lead to contention and degrade performance. It's a temporary buffer, not a solution. Always scale the instance spec if you need a permanently higher limit.
How do I find the root cause of connection leaks?
Enable detailed logging for your connection pool (e.g., HikariCP leak detection). Use APM tools to trace slow database queries. Review recent code changes, especially around transaction management and connection handling in try-catch-finally blocks.