Docker Daemon: Fix Intermittent I/O Timeouts from Overloaded Storage Driver

Quick Fix Summary

TL;DR

Restart Docker daemon and throttle container I/O with `--device-write-bps`.

Intermittent I/O timeouts occur when the Docker storage driver (often overlay2) is overwhelmed by concurrent read/write operations from multiple containers, exceeding the underlying filesystem or block device capabilities.

Diagnosis & Causes

  • High concurrent I/O from many containers saturating disk I/O queues.
  • Using a suboptimal storage driver (e.g., devicemapper on loopback) on a busy host.
  • Underlying filesystem (e.g., ext4, xfs) or block storage (e.g., EBS, network-attached) performance limits.

Recovery Steps

    Step 1: Verify System and Docker I/O Saturation

    Check system-wide and Docker-specific I/O metrics to confirm the storage driver is the bottleneck.

    bash
    # Check overall system I/O wait and disk utilization
    iostat -x 2 5
    iotop -o
    # Check Docker daemon and container-specific I/O metrics
    docker system df -v
    docker stats --no-stream
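The `iostat -x` output above can be long on hosts with many devices. As a rough sketch, the helper below prints only the devices whose `%util` exceeds a threshold. The function name `busy_devices` and the default threshold of 80 are illustrative assumptions, not part of any tool.

```shell
# Print "<device> <%util>" for devices above a utilization threshold (default 80).
# Reads `iostat -x` output on stdin and finds the %util column by its header name,
# because the column order differs between sysstat versions.
busy_devices() {
  awk -v limit="${1:-80}" '
    NF == 0 { col = 0; next }             # a blank line ends the current device table
    $1 ~ /^Device/ {                      # header row: remember where %util sits
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 > limit { print $1, $col }
  '
}

# Usage on a real host (not executed here):
#   LC_ALL=C iostat -x 2 2 | busy_devices 80
```

The first report `iostat` prints is an average since boot, so sustained values in the later reports are the ones worth acting on.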

    Step 2: Identify and Isolate Noisy Containers

    Find containers with excessive I/O and temporarily stop or limit them to restore stability.

    bash
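    # List all running containers with their full IDs (cgroup paths use the untruncated ID)
    docker ps -q --no-trunc
    # Inspect cumulative I/O bytes for a specific container (cgroup v1 with the cgroupfs driver)
    cat /sys/fs/cgroup/blkio/docker/<CONTAINER_ID>/blkio.throttle.io_service_bytes
    # On cgroup v2 hosts using the systemd cgroup driver, the counters typically live here instead
    cat /sys/fs/cgroup/system.slice/docker-<CONTAINER_ID>.scope/io.stat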
    # Stop the most problematic container
    docker stop <NOISY_CONTAINER_ID>
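Reading the cgroup file one container at a time makes comparison tedious. The sketch below totals the `Read` and `Write` rows of a cgroup v1 `blkio.throttle.io_service_bytes` file so containers can be compared side by side. The helper name `sum_io_bytes` is hypothetical, and the path in the usage comment assumes the cgroup v1 layout shown above.

```shell
# Sum the Read and Write rows of a cgroup v1 blkio.throttle.io_service_bytes file.
# Input rows look like "8:0 Read 4096"; Sync, Async and Total rows are ignored.
sum_io_bytes() {
  awk '
    $2 == "Read"  { r += $3 }
    $2 == "Write" { w += $3 }
    END { printf "read=%.0f write=%.0f\n", r, w }
  '
}

# Usage on a cgroup v1 host (not executed here):
#   for id in $(docker ps -q --no-trunc); do
#     printf '%s ' "$id"
#     sum_io_bytes < "/sys/fs/cgroup/blkio/docker/$id/blkio.throttle.io_service_bytes"
#   done
```

The counters are cumulative since the container started, so sampling twice and comparing the difference gives a better picture of current load than a single reading.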

    Step 3: Apply I/O Throttling to Containers

    Limit write/read rates for containers to prevent them from overwhelming the storage driver.

    bash
    # Run a new container with write rate throttling (e.g., 10 MB/s)
    docker run -it --device-write-bps /dev/sda:10mb <image>
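    # `docker update` does not accept --device-write-bps; to change a byte-rate limit,
    # recreate the container with the flag above. For a running container, lower its
    # relative block I/O weight instead (10-1000; only honored by weight-aware I/O schedulers such as BFQ)
    docker update --blkio-weight 100 <container_name>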
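For a container that is already running on a cgroup v2 host, the kernel's `io.max` file offers another way to set a write ceiling. It expects a `MAJ:MIN` device number and a rate in plain bytes. The sketch below converts a Docker-style rate such as `10mb` into that form. The helper names are hypothetical, the binary (1024-based) multiples reflect how Docker appears to parse these suffixes, and the cgroup path in the usage comment assumes the systemd cgroup driver. Treat writing to a Docker-managed cgroup by hand as a temporary measure only.

```shell
# Convert a rate like "10mb" to bytes (integer values; kb/mb/gb as binary multiples).
rate_to_bytes() {
  value=${1%%[a-zA-Z]*}                                   # numeric part, e.g. "10"
  unit=$(printf '%s' "${1#"$value"}" | tr 'A-Z' 'a-z')    # suffix, lowercased
  case "$unit" in
    ''|b) echo "$value" ;;
    k|kb) echo $(( value * 1024 )) ;;
    m|mb) echo $(( value * 1024 * 1024 )) ;;
    g|gb) echo $(( value * 1024 * 1024 * 1024 )) ;;
    *)    echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}

# Build the line that cgroup v2 io.max expects for a write-bandwidth limit.
io_max_line() {
  printf '%s wbps=%s\n' "$1" "$(rate_to_bytes "$2")"
}

# Usage on a cgroup v2 host (not executed here; the path is an assumption):
#   lsblk -no MAJ:MIN /dev/sda        # look up the device number, e.g. 8:0
#   io_max_line 8:0 10mb | sudo tee /sys/fs/cgroup/system.slice/docker-<CONTAINER_ID>.scope/io.max
```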

    Step 4: Restart Docker Daemon to Clear Queues

    Gracefully restart the Docker service to reset the storage driver's internal state. By default this also stops every running container (see the FAQ), so schedule it accordingly.

    bash
    # Restart Docker daemon (systemd)
    sudo systemctl restart docker
    # Verify daemon is back up and check logs for errors
    sudo systemctl status docker
    sudo journalctl -u docker -n 50 --no-pager
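Before restarting, it may help to check whether the restart will actually interrupt workloads. The sketch below is a small guard built on two facts the daemon reports: whether `live-restore` is enabled and how many containers are running. The function name `restart_impact` and its output wording are illustrative.

```shell
# Report whether a daemon restart is likely to interrupt containers.
# $1: LiveRestoreEnabled as reported by `docker info` ("true" or "false")
# $2: number of running containers
restart_impact() {
  if [ "$1" = "true" ] || [ "$2" -eq 0 ]; then
    echo "safe"
  else
    echo "disruptive: $2 running container(s) will stop"
  fi
}

# Usage on a host with Docker (not executed here):
#   restart_impact "$(docker info --format '{{.LiveRestoreEnabled}}')" "$(docker ps -q | wc -l)"
```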

    Step 5: Optimize Docker Daemon Storage Driver Configuration

    Tune daemon.json parameters for the overlay2 driver to better handle high I/O loads.

    bash
    # Edit Docker daemon configuration
    sudo vi /etc/docker/daemon.json
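    # Add or modify storage driver options. Example configuration
    # (overlay2.size caps each container's writable layer and requires an xfs
    # backing filesystem mounted with pquota; omit "storage-opts" otherwise):
    {
      "storage-driver": "overlay2",
      "storage-opts": [
        "overlay2.size=20G"
      ]
    }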
    # Apply changes and restart
    sudo systemctl restart docker
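A syntax error in `daemon.json` prevents the daemon from starting, so it is worth validating the file before the restart. The following is a minimal sketch that checks JSON syntax only, using `jq` or `python3`, whichever is installed. The function name `valid_json` is hypothetical, and the check does not verify that the option names are ones the daemon accepts.

```shell
# Return 0 if the file parses as JSON, non-zero otherwise.
valid_json() {
  if command -v jq >/dev/null 2>&1; then
    jq empty "$1" >/dev/null 2>&1
  elif command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$1" >/dev/null 2>&1
  else
    echo "need jq or python3 to validate JSON" >&2
    return 2
  fi
}

# Usage (not executed here):
#   valid_json /etc/docker/daemon.json && sudo systemctl restart docker
```

Recent Docker Engine releases also provide `dockerd --validate --config-file <path>`, which checks option names as well as syntax where it is available.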

    Step 6: Evaluate and Migrate Underlying Storage

    Assess the host's disk performance and consider moving Docker's data root to a high-performance volume.

    bash
    # Benchmark the current Docker data directory disk
    fio --name=randwrite --ioengine=libaio --rw=randwrite --bs=4k --numjobs=1 --size=1G --runtime=60 --time_based --direct=1 --group_reporting --directory=/var/lib/docker
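    # Stop Docker (including the socket unit, so it is not re-activated), copy the data
    # to the new mount (e.g., /mnt/fast-docker), and reconfigure the daemon
    sudo systemctl stop docker.socket docker
    # -a preserves ownership and permissions, -H hard links, -A ACLs, -X extended attributes
    sudo rsync -aHAX /var/lib/docker/ /mnt/fast-docker/
    # Edit /etc/docker/daemon.json: add "data-root": "/mnt/fast-docker"
    sudo systemctl start docker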
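A copy that runs out of space halfway leaves the new data root incomplete, so a quick capacity check before the `rsync` can save time. This is a sketch only: the function name `fits_on_target` is hypothetical, and it compares the source size against the free space on the target without allowing any headroom.

```shell
# Succeed if the directory tree at $1 is smaller than the free space on the
# filesystem that holds $2. Sizes are compared in KiB.
fits_on_target() {
  needed_kb=$(du -skx "$1" | awk '{print $1}')          # -x: stay on one filesystem
  avail_kb=$(df -Pk "$2" | awk 'NR == 2 {print $4}')    # column 4 is "Available"
  [ "$needed_kb" -lt "$avail_kb" ]
}

# Usage with Docker stopped (not executed here):
#   sudo sh -c '. ./fits.sh; fits_on_target /var/lib/docker /mnt/fast-docker' && echo "enough space"
```

After the daemon is started again, `docker info --format '{{.DockerRootDir}}'` should print the new path.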

    Architect's Pro Tip

    "This often happens during peak deployment times or when multiple data-intensive containers (like databases and log shippers) start simultaneously. Consider scheduling heavy I/O operations (e.g., batch jobs, backups) during off-peak hours and using I/O priorities (ionice) for critical containers."

    Frequently Asked Questions

    How do I know if I should switch from the overlay2 storage driver?

    `overlay2` is the recommended default. Only consider switching if you have a proven, specific performance issue with it on your kernel and filesystem combination, and you have tested alternatives such as `zfs` or `btrfs` in a non-production environment. The `devicemapper` driver is deprecated; avoid it in production, especially in loop-lvm mode.

    Will restarting the Docker daemon cause container downtime?

    By default, yes. A daemon restart stops all running containers unless `live-restore` is enabled in daemon.json (an option that is not compatible with Swarm mode). Use this step during a maintenance window, or ensure your containers are managed by an orchestrator (like Kubernetes or Docker Swarm) that can reschedule them. For critical systems, apply I/O throttling (Step 3) first, since it affects only the containers you limit.
