Docker Daemon: Fix Intermittent I/O Timeouts from Overloaded Storage Driver
Quick Fix Summary
TL;DR: Restart the Docker daemon and throttle container I/O with `--device-write-bps`.
Intermittent I/O timeouts occur when the Docker storage driver (often overlay2) is overwhelmed by concurrent read/write operations from multiple containers, exceeding the underlying filesystem or block device capabilities.
Diagnosis & Causes
Recovery Steps
Step 1: Verify System and Docker I/O Saturation
Check system-wide and Docker-specific I/O metrics to confirm the storage driver is the bottleneck.
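The `iostat -x` output used in this step can be hard to scan when a host has many devices. As a rough sketch, the filter below prints only the devices whose utilization exceeds a threshold. It assumes `%util` is the last column of the device table, which holds for current sysstat releases but is worth verifying on your host; `flag_busy_devices` and the 80 percent threshold are illustrative choices, not part of any tool.

```shell
# Sketch: list devices whose %util exceeds a threshold in `iostat -x` output.
# Assumption: %util is the last column of the device table.
flag_busy_devices() {
  threshold="${1:-80}"
  awk -v limit="$threshold" '
    # The header row starts with "Device"; device rows follow it.
    $1 ~ /^Device/ { in_table = 1; next }
    # A blank line ends the current device table.
    NF == 0 { in_table = 0; next }
    # Print the device name and its %util when above the limit.
    in_table && ($NF + 0) > (limit + 0) { printf "%s %s\n", $1, $NF }
  '
}

# Usage (hypothetical threshold of 80 percent):
#   iostat -x 2 5 | flag_busy_devices 80
```

The first report `iostat` prints covers the time since boot, so sustained values in the later reports are the more meaningful signal.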
# Check overall system I/O wait and disk utilization
iostat -x 2 5
iotop -o
# Check Docker disk usage and per-container block I/O totals
docker system df -v
docker stats --no-stream
Step 2: Identify and Isolate Noisy Containers
Find containers with excessive I/O and temporarily stop or limit them to restore stability.
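On hosts running cgroup v2 the `blkio` hierarchy is absent, and per-container I/O counters live in an `io.stat` file instead. The sketch below sums read and written bytes across all devices for one such file. The `sum_io_bytes` name is hypothetical, and the path in the usage comment assumes the systemd cgroup driver, so it may differ on your host.

```shell
# Sketch: total bytes read and written by one cgroup, summed over all devices.
# Assumption: cgroup v2 io.stat lines look like
#   "8:0 rbytes=4096 wbytes=8192 rios=1 wios=2 dbytes=0 dios=0"
sum_io_bytes() {
  awk '
    {
      # Field 1 is the device major:minor; the rest are key=value pairs.
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "rbytes") r += kv[2]
        if (kv[1] == "wbytes") w += kv[2]
      }
    }
    END { printf "read=%.0f written=%.0f\n", r, w }
  ' "$1"
}

# Usage (path assumes the systemd cgroup driver and a full, untruncated container ID):
#   sum_io_bytes "/sys/fs/cgroup/system.slice/docker-<CONTAINER_ID>.scope/io.stat"
```

Running it twice a few seconds apart and comparing the totals gives a crude per-container throughput figure.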
# List all running containers with their full IDs (cgroup paths use the untruncated ID)
docker ps -q --no-trunc
# Inspect detailed I/O for a specific container (requires cgroup v1)
cat /sys/fs/cgroup/blkio/docker/<CONTAINER_ID>/blkio.throttle.io_service_bytes
# Stop the most problematic container
docker stop <NOISY_CONTAINER_ID>
Step 3: Apply I/O Throttling to Containers
Limit write/read rates for containers to prevent them from overwhelming the storage driver.
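Docker exposes a read-side counterpart to the write limit, `--device-read-bps`, and both take a `<device>:<rate>` argument where the rate may carry a `kb`, `mb`, or `gb` suffix. A minimal sketch of a wrapper that checks the rate strings before they reach `docker run` might look like this; `build_throttle_flags` is a hypothetical helper, and the device and rates are placeholders.

```shell
# Sketch: build `docker run` throttle flags after sanity-checking the rate strings.
# build_throttle_flags is a hypothetical helper, not part of the Docker CLI.
build_throttle_flags() {
  device="$1"; write_rate="$2"; read_rate="$3"
  for rate in "$write_rate" "$read_rate"; do
    # Accept a positive integer with an optional kb, mb, or gb suffix.
    if ! printf '%s' "$rate" | grep -Eq '^[1-9][0-9]*(kb|mb|gb)?$'; then
      echo "invalid rate: $rate" >&2
      return 1
    fi
  done
  printf -- '--device-write-bps %s:%s --device-read-bps %s:%s\n' \
    "$device" "$write_rate" "$device" "$read_rate"
}

# Usage (device and rates are placeholders):
#   docker run -it $(build_throttle_flags /dev/sda 10mb 50mb) <image>
```

The limits are enforced per block device, so the device passed here should be the one that actually backs the container's writes.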
# Run a new container with write rate throttling (e.g., 10 MB/s)
docker run -it --device-write-bps /dev/sda:10mb <image>
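# Existing containers: `docker update` has no per-device rate flags. It can only change the
# relative block I/O weight (10-1000), which matters only with a weight-aware I/O scheduler
# such as BFQ. To apply a bps limit, recreate the container with the flags shown above.
docker update --blkio-weight 100 <container_name>
Step 4: Restart Docker Daemon to Clear Queues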
Gracefully restart the Docker service to reset the storage driver's internal state and I/O queues.
# Restart Docker daemon (systemd)
sudo systemctl restart docker
# Verify daemon is back up and check logs for errors
sudo systemctl status docker
sudo journalctl -u docker -n 50 --no-pager
Step 5: Optimize Docker Daemon Storage Driver Configuration
Tune daemon.json parameters for the overlay2 driver to better handle high I/O loads.
# Edit Docker daemon configuration
sudo vi /etc/docker/daemon.json
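# Add or modify storage driver options. Example configuration.
# overlay2.size caps each container's writable layer and requires an xfs backing
# filesystem mounted with pquota. overlay2.override_kernel_check is deprecated and omitted.
{
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.size=20G"
  ]
}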
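A syntax error in daemon.json prevents the daemon from starting, so it may be worth confirming that the edited file still parses before restarting. The sketch below is one way to do that. It assumes `python3` is installed and checks JSON syntax only, not whether Docker recognizes the option names; `check_daemon_json` is a hypothetical helper.

```shell
# Sketch: refuse to proceed when a daemon.json candidate is not valid JSON.
# Assumption: python3 is available; only syntax is checked, not Docker's option names.
check_daemon_json() {
  file="${1:-/etc/docker/daemon.json}"
  if python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "ok: $file parses as JSON"
  else
    echo "error: $file is not valid JSON" >&2
    return 1
  fi
}

# Usage:
#   check_daemon_json /etc/docker/daemon.json && sudo systemctl restart docker
```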
# Apply changes and restart
sudo systemctl restart docker
Step 6: Evaluate and Migrate Underlying Storage
Assess the host's disk performance and consider moving Docker's data root to a high-performance volume.
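Before any data is moved, a pre-flight check along these lines can confirm that the target mount has room for a full copy of the data root. This is a sketch under simple assumptions: `has_room` is a hypothetical helper, sizes are compared in KiB, and `du -x` is used so that mounted overlay filesystems under the source are not counted twice.

```shell
# Sketch: check that the destination filesystem can hold a copy of the source tree.
# has_room is a hypothetical helper; sizes are compared in KiB.
has_room() {
  src="$1"; dst="$2"
  # -x keeps du on one filesystem, so mounted container layers are skipped.
  needed=$(du -skx "$src" | awk '{ print $1 }')
  # df -Pk prints one POSIX-format line per filesystem; column 4 is available KiB.
  available=$(df -Pk "$dst" | awk 'NR == 2 { print $4 }')
  echo "needed=${needed}K available=${available}K"
  [ "$available" -gt "$needed" ]
}

# Usage (run as root so du can read every layer):
#   has_room /var/lib/docker /mnt/fast-docker || echo "not enough space"
```

The comparison leaves no safety margin, so treat a narrow pass as a warning rather than a green light.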
# Benchmark the current Docker data directory disk
fio --name=randwrite --ioengine=libaio --rw=randwrite --bs=4k --numjobs=1 --size=1G --runtime=60 --time_based --direct=1 --group_reporting --directory=/var/lib/docker
# Stop Docker, move data, and reconfigure to a new mount (e.g., /mnt/fast-docker)
sudo systemctl stop docker
# -aHAX preserves hard links, ACLs, and extended attributes that overlay2 layers rely on
sudo rsync -aHAX /var/lib/docker/ /mnt/fast-docker/
# Edit /etc/docker/daemon.json: add "data-root": "/mnt/fast-docker"
sudo systemctl start docker
Architect's Pro Tip
"This often happens during peak deployment times or when multiple data-intensive containers (like databases and log shippers) start simultaneously. Consider scheduling heavy I/O operations (e.g., batch jobs, backups) during off-peak hours and using I/O priorities (ionice) for critical containers."
Frequently Asked Questions
How do I know if I should switch from the overlay2 storage driver?
Overlay2 is the recommended default. Only consider switching if you have a proven, specific performance issue with it on your kernel/filesystem combo, and you have tested alternatives like `zfs` or `btrfs` in a non-production environment. Avoid `devicemapper` in loop-lvm mode at all costs in production.
Will restarting the Docker daemon cause container downtime?
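By default, yes. A daemon restart stops all running containers unless `live-restore` is enabled in daemon.json, which keeps containers running while the daemon is down (it is not compatible with Swarm mode). Otherwise, restart during a maintenance window or make sure your containers are managed by an orchestrator (like Kubernetes or Docker Swarm) that can reschedule them. For critical systems, apply I/O throttling (Step 3) first as a non-disruptive mitigation.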