How to Fix Docker Out of Disk Space: A Complete Troubleshooting Guide

You’re mid-deploy, your build is humming along, and then Docker throws something like this at you:

=> ERROR [internal] load metadata for docker.io/library/node:22-alpine     1.2s
------
> [internal] load metadata for docker.io/library/node:22-alpine:
------
Error response from daemon: write /var/lib/docker/tmp/...: no space left on device

Or maybe it’s the classic failed to register layer: Error processing tar file(exit status 1): write ...: no space left on device. Either way, your Docker daemon is choking on a full disk, and df -h probably confirms it: your root partition or /var/lib/docker is at 100%.
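To put a number on "full" before anything breaks, a small helper can report how full the filesystem holding a given path is. This is a minimal sketch: the function name fs_usage_pct is made up for this example, and the commented usage line assumes Docker's default /var/lib/docker location.

```shell
# fs_usage_pct PATH: print how full the filesystem holding PATH is (0-100).
# Hypothetical helper; relies only on POSIX `df -P` output.
fs_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: warn when Docker's data directory (default path assumed) is over 90% full.
# [ "$(fs_usage_pct /var/lib/docker)" -gt 90 ] && echo "Docker disk is almost full"
```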

I’ve hit this exact error more times than I’d like to admit — on CI runners, production servers, and my own laptop after a week of intense development. The good news is that Docker gives you excellent tooling to reclaim space. The bad news is that the “obvious” fix (docker system prune -a) sometimes isn’t enough.

This guide walks you through fixing Docker's out-of-disk-space errors, starting from a 30-second quick fix and moving into the edge cases that catch even senior developers off guard.


Quick Fix: The 30-Second Solution

If you just need Docker working again right now and you don’t care about losing caches, run:

docker system df
docker system prune -a --volumes

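The first command shows you what’s eating space. The second removes everything not currently in use: stopped containers, unused networks, every image without a container, the build cache, and unused volumes. (On Docker Engine 23.0 and later, --volumes only removes anonymous volumes; named volumes stay until you delete them explicitly.)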

Stop here and read the rest if any of these are true:

  • You have local volumes containing databases you haven’t backed up.
  • You’re on a team and aren’t sure what’s safe to remove.
  • docker system prune -a --volumes didn’t actually free enough space (this happens more often than you’d think).

Now let’s actually understand what’s happening.


Root Cause Analysis: Where Does All the Disk Go?

Docker doesn’t lose space randomly. It accumulates in a small number of well-defined buckets. Running docker system df tells you exactly which:

$ docker system df
TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
Images          142     12      48.2GB    46.8GB (97%)
Containers      28      6       1.2GB     800MB (66%)
Local Volumes   34      18      22.4GB    11.1GB (49%)
Build Cache     832     0       18.7GB    18.7GB
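When the numbers come in mixed units, it helps to rank the buckets by reclaimable space. The sketch below parses the table format shown above; reclaimable_by_type is a hypothetical helper name, and it assumes Docker's decimal units (kB, MB, GB, TB).

```shell
# reclaimable_by_type: read `docker system df` text on stdin and print
# "TYPE<TAB>reclaimable MB", largest first. Hypothetical helper name.
reclaimable_by_type() {
  awk '
    # Convert a Docker size string such as "46.8GB" or "800MB" to megabytes.
    function to_mb(s,    n, unit) {
      n = s + 0                      # awk keeps the leading numeric part
      unit = s
      sub(/^[0-9.]+/, "", unit)
      if (unit == "TB") return n * 1000000
      if (unit == "GB") return n * 1000
      if (unit == "MB") return n
      if (unit == "kB" || unit == "KB") return n / 1000
      return n / 1000000             # plain bytes, e.g. "0B"
    }
    NR == 1 { next }                 # skip the header row
    NF >= 5 {
      # RECLAIMABLE is the last field, or second to last when "(97%)" follows it.
      r = ($NF ~ /^\(/) ? NF - 1 : NF
      type = $1
      for (i = 2; i <= r - 4; i++) type = type " " $i
      printf "%s\t%.0f\n", type, to_mb($r)
    }
  ' | sort -t "$(printf '\t')" -k2,2nr
}

# Usage: docker system df | reclaimable_by_type
```

Fed the sample output above, it lists Images first (46800 MB), then Build Cache, Local Volumes, and Containers.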

The major culprits, in order of how often I see them in the wild:

1. Unused Images (Most Common)

Every docker pull, every docker build, every FROM line in a Dockerfile creates layers. Over weeks of development you’ll accumulate dozens of base images, intermediate layers, and tags you’ve forgotten about. Multi-arch pulls (linux/arm64 + linux/amd64) double the storage.

2. Dangling Volumes

When a container is removed without -v, its anonymous volumes live on until you delete them, and named volumes survive docker-compose down unless you pass -v. Run a few dozen docker-compose up / docker-compose down cycles and you’ll have orphaned database volumes sitting there silently consuming gigabytes.

3. BuildKit Cache

BuildKit is fantastic for fast builds, but it caches every layer of every build. Long-lived CI machines are especially vulnerable — I’ve seen a single GitLab runner accumulate 60GB of BuildKit cache over three months.

4. Container Logs (The Sneaky One)

This is the one that catches everyone. By default, Docker’s json-file log driver does not rotate logs. A chatty container can fill your disk with a single multi-gigabyte *-json.log file. This won’t show up in docker system df — it’s hidden inside /var/lib/docker/containers/*/.

5. Stopped Containers

Each stopped container holds its writable layer. Usually small, but add up hundreds of them and it matters.


Step-by-Step: From Most Common to Edge Cases

Step 1: Diagnose Precisely

Before deleting anything, understand what’s consuming space. Run these three commands:

# High-level breakdown
docker system df

# Verbose — shows per-image and per-volume sizes
docker system df -v

# Check actual disk usage at the OS level
sudo du -sh /var/lib/docker/*

The verbose output is gold: it tells you exactly which image IDs and volume names are the biggest offenders. I keep a shell alias for this:

alias docker-fat='docker system df -v | head -50'

Step 2: Remove Dangling and Unused Images

Dangling images are layers with no tag — typically leftovers from failed builds or overwritten tags:

# Only dangling (untagged) images — very safe
docker image prune -f

# All images not used by any container, running or stopped
docker image prune -a -f

If you want surgical control, list and remove specific images:

docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.ID}}' | sort -k3 -h

# Remove by ID
docker rmi <image-id>

A handy trick for nuking everything matching a pattern:

docker rmi $(docker images --filter "reference=myorg/*" -q)

Step 3: Clear Build Cache (BuildKit)

This is the second-most-overlooked fix. Since Docker Engine 23.0, BuildKit is the default builder, and its cache can balloon.

# Show what's cached
docker buildx du

# Remove all build cache
docker builder prune -a -f

# Prune, but keep up to 5GB of cache
docker builder prune -a -f --keep-storage=5gb

# Or filter by age
docker builder prune -a -f --filter "until=48h"

In CI environments, I add docker builder prune -a -f to a weekly cron. It’s the single biggest disk-saver for build-heavy machines.
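A fixed schedule can throw away warm cache that the disk had room for. One alternative, sketched here with a hypothetical helper named prune_cache_if_full, is to prune only once the disk passes a threshold. The path and the 80% limit in the example line are assumptions to adjust per runner.

```shell
# prune_cache_if_full PATH LIMIT: prune the build cache only when the
# filesystem holding PATH is at least LIMIT percent full.
# Hypothetical helper; PATH and LIMIT are whatever suits your runner.
prune_cache_if_full() {
  used=$(df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ "$used" -ge "$2" ]; then
    echo "disk at ${used}%, pruning build cache"
    docker builder prune -a -f
  else
    echo "disk at ${used}%, below ${2}%, nothing to do"
  fi
}

# Example (default data directory assumed):
# prune_cache_if_full /var/lib/docker 80
```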


Step 4: Hunt Down Orphaned Volumes (Carefully!)

Volumes are where production data lives. Never run docker volume prune without checking what’s there first.

# List all volumes (driver and name; use inspect for the mount path)
docker volume ls

# Inspect a specific volume
docker volume inspect <volume-name>

# Find volumes not used by any container
docker volume ls -f dangling=true

Before removing, I recommend backing up anything you might need:

# Backup a volume to a tarball
docker run --rm -v <volume-name>:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/volume-$(date +%F).tar.gz -C /data .

# Now safe to remove
docker volume rm <volume-name>

# Or remove all unused anonymous volumes at once
# (Docker 23.0+ only touches named volumes if you add -a)
docker volume prune -f

The number of times I’ve seen someone nuke a local Postgres volume because they ran docker system prune --volumes without thinking… it’s a lot. Back up first.
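With more than a handful of dangling volumes, the backup command above can be wrapped in a loop. This is a sketch under a few assumptions: backup_dangling_volumes is a hypothetical name, the destination must be an absolute path (bind mounts require it), and the alpine image is used only because the example above uses it.

```shell
# backup_dangling_volumes DEST: write one tarball per dangling volume into DEST.
# DEST must be an absolute path because it is bind-mounted into the helper container.
backup_dangling_volumes() {
  dest=$1
  mkdir -p "$dest" || return 1
  docker volume ls -q -f dangling=true | while IFS= read -r vol; do
    [ -n "$vol" ] || continue
    # Mount the volume read-only so the backup cannot modify it.
    docker run --rm -v "$vol":/data:ro -v "$dest":/backup alpine \
      tar czf "/backup/${vol}-$(date +%F).tar.gz" -C /data . </dev/null ||
      { echo "backup failed: $vol" >&2; continue; }
    echo "backed up $vol"
  done
}

# Example: backup_dangling_volumes /srv/volume-backups
```

Check that the tarballs exist and have a sensible size before running any prune.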


Step 5: The Hidden Killer: Container Logs

If docker system df shows a healthy Docker but your disk is still full, this is almost certainly the cause. Check container log sizes:

# Find the biggest log files
sudo find /var/lib/docker/containers -name "*-json.log" -exec du -sh {} + | sort -h

If you find a 12GB JSON log file, you’ve found your culprit.
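To list only the files worth worrying about, the same search can be limited by size. A minimal sketch, with big_logs as a hypothetical helper name; the 500 MB threshold in the example line is arbitrary.

```shell
# big_logs DIR MIN_MB: list *-json.log files under DIR larger than MIN_MB megabytes.
# Sizes are compared in bytes (-size +Nc), which POSIX find supports everywhere.
big_logs() {
  find "$1" -type f -name '*-json.log' -size +"$(( $2 * 1024 * 1024 ))"c
}

# Example, from a root shell (default data directory assumed):
# big_logs /var/lib/docker/containers 500
```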

The temporary fix:

# Truncate the offending log (container can stay running)
sudo truncate -s 0 /var/lib/docker/containers/<container-id>/<container-id>-json.log

The permanent fix: configure log rotation in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Then restart Docker:

sudo systemctl restart docker

Important: this only applies to newly created containers. Existing containers keep their old config until recreated. For long-running services, schedule a rolling redeploy.
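To see which containers still need recreating, you can inspect each container's stored log configuration. The sketch below assumes the daemon defaults are copied into a container's HostConfig.LogConfig when it is created, so an empty max-size means no rotation. The helper name containers_without_rotation is hypothetical.

```shell
# containers_without_rotation: print the names of containers that use the
# json-file driver without a max-size cap.
containers_without_rotation() {
  ids=$(docker ps -aq)
  [ -n "$ids" ] || return 0
  # $ids is deliberately unquoted so each container ID becomes its own argument.
  docker inspect \
    --format '{{.Name}} {{.HostConfig.LogConfig.Type}} {{index .HostConfig.LogConfig.Config "max-size"}}' \
    $ids | awk '$2 == "json-file" && $3 == "" { print $1 }'
}

# Example: containers_without_rotation
```

Anything it prints is a candidate for the rolling redeploy mentioned above.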

A tighter setup for high-churn environments, which also records each container’s service and env labels alongside its log entries:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "5m",
    "max-file": "5",
    "labels": "service,env"
  }
}

Step 6: Move /var/lib/docker to a Larger Disk

Sometimes your root partition is simply too small. This is common on cloud VMs with a 20GB root disk. The cleanest solution is to relocate Docker’s data directory.

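# Stop Docker and its socket, so nothing restarts the daemon mid-copy
sudo systemctl stop docker.socket docker

# Create the new location (e.g., a mounted data disk)
sudo mkdir -p /mnt/data/docker

# Copy the data; rsync can resume if interrupted (-H, -A, -X keep hard links, ACLs, xattrs)
sudo rsync -aHAXP /var/lib/docker/ /mnt/data/docker/

# Configure Docker to use the new location.
# Warning: this overwrites daemon.json. If you already have settings there
# (such as the log rotation from Step 5), add the "data-root" key by hand instead.
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<EOF
{
  "data-root": "/mnt/data/docker"
}
EOF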

# Start Docker and verify
sudo systemctl start docker
docker info | grep "Docker Root Dir"

# Once confirmed working, remove the old directory
sudo rm -rf /var/lib/docker

I do this on every new server I provision. It saves a world of pain later.
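Before starting the copy in this step, it is worth confirming that the target disk can hold the data. A minimal sketch, with has_free_kb as a hypothetical helper name; the paths in the example lines are the ones used above.

```shell
# has_free_kb PATH KB: succeed when the filesystem holding PATH has more than KB KiB free.
has_free_kb() {
  free_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -gt "$2" ]
}

# Example, from a root shell:
# need=$(du -sk /var/lib/docker | awk '{ print $1 }')
# has_free_kb /mnt/data "$need" && echo "enough room" || echo "target disk too small"
```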


Step 7: Platform-Specific Cleanup

macOS: Docker Desktop Disk Image

Docker Desktop on Mac stores everything in a sparse disk image that grows but doesn’t always shrink. In recent Docker Desktop versions:

  1. Open Docker Desktop → Settings → Resources → Advanced
  2. Use the Disk image size slider
  3. Click Clean / Purge data, or use Troubleshoot → Clean / Purge data

From the CLI:

# Helper image that trims free space out of the Docker Desktop VM disk
docker run --rm -it --privileged --pid=host docker/desktop-reclaim-space

Windows: WSL2 Disk Shrink

WSL2’s virtual disk (ext4.vhdx) doesn’t release space back to Windows automatically. To compact it:

# Shut down WSL
wsl --shutdown

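# Enable sparse VHD (WSL 2.0+, Windows 11). On older Docker Desktop versions the
# distro may be named docker-desktop-data; check with: wsl -l -v
wsl --manage docker-desktop --set-sparse true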

# Or use diskpart for older setups
diskpart
# Inside diskpart:
# select vdisk file="C:\Users\<you>\AppData\Local\Docker\wsl\data\ext4.vhdx"
# attach vdisk readonly
# compact vdisk
# detach vdisk
# exit

After this, Docker Desktop will use the compacted VHD on next start.


Step 8: Edge Case — Overlay2 Leaks

Very rarely, the overlay2 storage driver leaks layers after crashes or OOM kills. If docker system df shows low usage but /var/lib/docker/overlay2 is huge, you might have leaked layers.

Diagnose with:

# Total overlay2 size
sudo du -sh /var/lib/docker/overlay2

# List the overlay2 directories your images reference; directories on disk that
# no image or container references are leak candidates
sudo docker image ls --format '{{.ID}}' | xargs -I{} sudo docker inspect {} --format '{{.GraphDriver.Data.UpperDir}}'
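Comparing that list against the disk by eye gets tedious, so here is a sketch that does the diff. It assumes the overlay2 storage driver and the usual layout in which each layer is a directory directly under overlay2/. The helper name unreferenced_overlay_dirs is hypothetical. Treat the output as candidates only: BuildKit cache snapshots can live in the same directory without being referenced by any image or container, so prune the build cache first and never delete from this list blindly.

```shell
# unreferenced_overlay_dirs OVERLAY_DIR: read GraphDriver paths on stdin
# (UpperDir/LowerDir values, colon-separated) and print every directory
# under OVERLAY_DIR that none of them mention.
unreferenced_overlay_dirs() {
  refs=$(mktemp) || return 1
  # Split colon-separated lists, then keep only the layer ID after /overlay2/.
  tr ':' '\n' | sed -n 's#.*/overlay2/\([^/]*\)/.*#\1#p' | sort -u > "$refs"
  for d in "$1"/*/; do
    [ -d "$d" ] || continue
    id=$(basename "$d")
    [ "$id" = "l" ] && continue          # "l" only holds shortened symlinks
    grep -qxF "$id" "$refs" || echo "$id"
  done
  rm -f "$refs"
}

# Example, from a root shell (default data directory assumed):
# { docker image ls -q; docker ps -aq; } | sort -u |
#   xargs docker inspect --format '{{.GraphDriver.Data.UpperDir}}:{{.GraphDriver.Data.LowerDir}}' |
#   unreferenced_overlay_dirs /var/lib/docker/overlay2
```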

The clean fix is brutal but effective:

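sudo systemctl stop docker.socket docker containerd
sudo mv /var/lib/docker /var/lib/docker.bak
sudo systemctl start docker

You’ll start with a clean slate, so re-pull your images. The rename alone frees nothing: the space comes back only once you have confirmed everything works and deleted /var/lib/docker.bak. Volumes live in that directory too, so copy out any data you need first. Only do this when nothing else works.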


Prevention: Stop It Happening Again

A few habits will keep Docker from eating your disk:

1. Enable Log Rotation From Day One

This is the single most important config. Put it in your Ansible/Terraform/Puppet setup so every new Docker host has it.

2. Schedule Weekly Prunes in CI

For build servers, add a cron job:

#!/bin/bash
# Save as /etc/cron.weekly/docker-prune
set -e
# The "until" filter cannot be combined with --volumes, so volumes are left alone
docker system prune -af --filter "until=168h"
docker builder prune -af --filter "until=168h"

Make it executable:

sudo chmod +x /etc/cron.weekly/docker-prune

The until=168h filter leaves alone anything newer than a week. For images and containers that means created in the last week, not merely used in it.

3. Use Multi-Stage Builds

Multi-stage builds dramatically reduce final image size and, by extension, what gets cached:

# Build stage
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o /app/server ./cmd/server

# Final stage — tiny image
FROM alpine:3.20
COPY --from=builder /app/server /usr/local/bin/server
ENTRYPOINT ["/usr/local/bin/server"]

The final image is roughly 15MB (Alpine plus one binary) instead of the hundreds of megabytes of a full Go toolchain image. Multiply that across a dozen services and the savings add up fast.

4. Use .dockerignore

Stop sending your .git, node_modules, target/, and other bloat to the daemon:

.git
node_modules
target
*.log
.env
dist
build
.DS_Store

Every excluded file stays out of the build context, so it is never sent to the daemon and never ends up in a COPY layer or the build cache.

5. Tag and Prune Strategically

Don’t rely on latest for everything. Tag with build numbers or SHAs, and set up
