Docker Compose Failed to Start Service: Complete Troubleshooting Guide
If you’re staring at a terminal that says “docker compose failed to start service”, you’re in good company. This is one of the most common issues developers face when working with containerized applications, and it can stem from dozens of different root causes.
The good news? Most of these failures follow predictable patterns. In this guide, I’ll walk you through a systematic debugging approach that goes from the most common culprits to edge cases that trip up even experienced engineers.
Understanding the Error Message
Before diving into solutions, it’s worth understanding what Docker Compose is actually telling you. When you run docker compose up, the orchestration engine attempts to:
- Pull or build required images
- Create networks and volumes
- Create containers with specified configurations
- Start containers in dependency order
- Run health checks (if defined)
A failure at any stage produces the generic “failed to start service” message. The key to efficient debugging is extracting the actual error from Docker’s logs.
The First Command You Should Always Run
docker compose logs <service-name>
This single command resolves a large share of debugging sessions because it reveals the specific error that caused the container to exit. But if the container was never created, or the logs are empty or unhelpful, read on.
Step 1: Check for Port Conflicts (Most Common Cause)
Port conflicts account for a significant portion of service startup failures. When a container tries to bind to a port that’s already in use on your host machine, the service fails immediately.
How to Diagnose
# Check what's using a specific port (Linux/macOS)
sudo lsof -i :8080
# Alternative on Linux using netstat (or ss on newer distributions)
sudo netstat -tulpn | grep :8080
# On Windows (PowerShell)
netstat -ano | findstr :8080
If you see output listing a process, that port is occupied.
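If you would rather check from a script, here is a minimal Python sketch of the same idea. It simply tries to bind the port and reports whether that worked. The function name `is_port_free` is my own, and the check only reflects the moment it runs, so treat it as a quick probe rather than a guarantee.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Usually "address already in use", or "permission denied"
            # for privileged ports when not running as root.
            return False
        return True
```

Calling `is_port_free(8080)` before `docker compose up` tells you whether the host side of an "8080:80" mapping is likely to bind.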
The Fix
Option A: Change the host-side port in your docker-compose.yml:
services:
  webapp:
    image: nginx:latest
    ports:
      - "8081:80"  # Changed from 8080:80
Option B: Stop the conflicting process:
# Find the PID from the lsof output, then ask the process to exit:
kill <PID>
# Only if it ignores the request, force it:
kill -9 <PID>
# Or on macOS, if the port is held by the built-in Apache (httpd):
sudo apachectl stop
Option C: Omit the host port and let Docker assign a free ephemeral one:
services:
  webapp:
    image: nginx:latest
    ports:
      - "80"  # No host port specified, so Docker picks one
Check the assigned port with:
docker compose ps
# Or print just the host address mapped to container port 80
docker compose port webapp 80
Real-World Example
I once spent forty minutes debugging a failing PostgreSQL container. The logs showed nothing useful. It turned out a previous Docker Compose run hadn’t fully torn down, and an orphaned container was still bound to port 5432. The fix:
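# Tear down this project, including containers no longer defined in the Compose file
docker compose down --remove-orphans
# Remove stopped containers, unused networks, and dangling images system-wide
docker system prune -f
docker compose up -d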
Step 2: Investigate Image Pull and Build Failures
If Docker can’t obtain the image your service depends on, the container never starts.
Diagnosing Image Pull Issues
# Try pulling the image manually
docker pull postgres:16
# Show the registry the daemon is configured to use (saved logins live in ~/.docker/config.json)
docker system info | grep -i registry
# Inspect Docker's daemon logs
sudo journalctl -u docker --since "10 minutes ago"
Common error messages you might encounter:
- manifest not found: the tag doesn't exist
- unauthorized: authentication issue with a private registry
- no space left on device: disk full
- TLS handshake timeout: network or DNS issue
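When you are scanning long daemon logs, a small lookup can do the first pass of triage for you. The Python sketch below maps the four messages listed above to their likely causes. The names `PULL_ERROR_HINTS` and `classify_pull_error` are hypothetical, and real messages vary between Docker versions, so treat the substrings as a starting point.

```python
# Substrings and causes come from the list above; they are not exhaustive.
PULL_ERROR_HINTS = {
    "manifest not found": "the tag doesn't exist",
    "unauthorized": "authentication issue with a private registry",
    "no space left on device": "disk full",
    "tls handshake timeout": "network or DNS issue",
}


def classify_pull_error(message: str) -> str:
    """Return a likely cause for a pull error message, matched by substring."""
    lowered = message.lower()
    for needle, cause in PULL_ERROR_HINTS.items():
        if needle in lowered:
            return cause
    return "unrecognized error; check the daemon logs"
```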
Fixes for Image Pull Problems
Wrong or non-existent image tag:
# Wrong: tag 16.2.3 might not exist
services:
  db:
    image: postgres:16.2.3
# Right: pin to a tag you have verified exists
services:
  db:
    image: postgres:16.2
Always verify tags exist on Docker Hub or your private registry before referencing them.
Private registry authentication:
# Log in to your private registry
docker login registry.yourcompany.com -u youruser
# Compose then pulls with the credentials stored in ~/.docker/config.json
services:
  app:
    image: registry.yourcompany.com/myapp:latest
Build failures with custom Dockerfiles:
If your service uses build: instead of image:, build failures will also cause startup failures:
# Build with full output (don't use --quiet)
docker compose build --no-cache --progress plain webapp
# Check the build context size (large contexts cause timeouts)
du -sh .
Step 3: Resolve Volume Mount Issues
Volume mount problems are sneaky because they often don’t produce obvious error messages. The container might start but immediately crash because required files are missing or inaccessible.
Permission Denied Errors
This is the most common volume issue, especially on Linux:
# Check ownership of the mounted directory
ls -la ./data
# Common error in logs:
# "permission denied" or "cannot open file"
Fix for bind mounts with permission issues:
services:
  postgres:
    image: postgres:16
    volumes:
      - ./data:/var/lib/postgresql/data
    user: "1000:1000"  # Match your host UID:GID
Or adjust the host directory permissions:
# Make the directory accessible (insecure; acceptable only for throwaway dev setups)
chmod -R 777 ./data
# Better approach: match the container user's UID
# 999 is the postgres UID in the Debian-based official image; Alpine variants use 70
sudo chown -R 999:999 ./data
Absolute Path Requirements
Compose resolves a relative bind-mount path against the directory that contains the Compose file, not the directory you run the command from, and some configurations reject relative paths outright. If you see an error like invalid mount path, use the full path:
services:
  app:
    volumes:
      # Wrong on some systems
      # - ./config:/app/config
      # More reliable
      - /home/user/projects/myapp/config:/app/config
Or use the variable expansion approach for portability:
services:
  app:
    volumes:
      - ${PWD}/config:/app/config
Step 4: Check Resource Constraints
Containers can fail to start if the host system doesn’t have enough memory, CPU, or disk space to satisfy their resource requirements.
Diagnosing Resource Issues
# Check system resources
free -h # Memory
df -h # Disk space
docker system df # Docker's disk usage
# Check container resource limits
docker stats --no-stream
Disk Space Exhaustion
Docker is notorious for consuming disk space. If your service fails with “no space left on device”:
# Remove unused images, containers, networks, and unused volumes
# (the --volumes flag deletes data, so check what is in those volumes first)
docker system prune -a --volumes
# Check for dangling volumes specifically
docker volume ls -f dangling=true
docker volume prune
# Check Docker's overlay2 storage
sudo du -sh /var/lib/docker/overlay2
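To catch a full disk before a pull or build fails, you can run a quick pre-flight check. Here is a minimal Python sketch using only the standard library. The function names and the 5 GiB default threshold are arbitrary assumptions of mine; point `path` at whichever filesystem holds your Docker data directory.

```python
import shutil


def free_space_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path: str = "/", min_free_gib: float = 5.0) -> bool:
    """Return True if at least min_free_gib GiB are free (threshold is arbitrary)."""
    return free_space_gib(path) >= min_free_gib
```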
Memory Limits Causing OOM Kills
If your service starts but immediately dies, check the logs for Out-Of-Memory (OOM) termination:
# Check if a container was OOM-killed
docker inspect <container-id> | grep -i oomkilled
# View the exit code
docker inspect <container-id> --format='{{.State.ExitCode}}'
# Exit code 137 = killed by signal 9 (often OOM)
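If you are checking several containers, grepping gets tedious. `docker inspect` prints a JSON array, so a short script can pull out the two fields used above. This is a sketch: `summarize_state` is a name I made up, and it reads only `Name`, `State.ExitCode`, and `State.OOMKilled`.

```python
import json


def summarize_state(inspect_output: str) -> list[dict]:
    """Extract name, exit code, and OOM flag from `docker inspect` JSON output."""
    summary = []
    for container in json.loads(inspect_output):
        state = container.get("State", {})
        summary.append({
            "name": container.get("Name", "").lstrip("/"),
            "exit_code": state.get("ExitCode"),
            "oom_killed": state.get("OOMKilled", False),
        })
    return summary
```

An exit code of 137 with `oom_killed` set to true points at the memory limit. The same exit code with `oom_killed` false means something else sent SIGKILL.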
Fix: Increase memory limits in Compose:
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
# On Docker Desktop, ensure you've allocated enough RAM in settings
Step 5: Debug Dependency and Startup Order Issues
Docker Compose’s depends_on directive controls startup order, but it doesn’t wait for dependencies to be ready—only for them to be started. This causes failures when a service tries connecting to a database that hasn’t finished initializing.
The Classic Race Condition
# This configuration has a race condition
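services:
  api:
    build: .
    depends_on:
      - postgres
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/db
  postgres:
    image: postgres:16
    environment:
      # The official image refuses to start without a superuser password
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=db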
The API container starts immediately after the Postgres container is created, but Postgres takes several seconds to initialize and accept connections. The API tries to connect, fails, and exits.
Solution A: Use Health Checks
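services:
  api:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/db
  postgres:
    image: postgres:16
    environment:
      # The official image refuses to start without a superuser password
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=db
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d db"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s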
This tells Compose to wait until Postgres reports a healthy status before starting the API service.
Solution B: Implement Retry Logic in Your Application
# Python example with retry logic
import time

import psycopg2
from psycopg2 import OperationalError


def connect_with_retry(max_retries=30, delay=2):
    """Keep trying until Postgres accepts connections or retries run out."""
    for attempt in range(max_retries):
        try:
            conn = psycopg2.connect(
                host="postgres",
                dbname="mydb",
                user="user",
                password="pass",
            )
            print("Connected to database!")
            return conn
        except OperationalError:
            print(f"Attempt {attempt + 1}/{max_retries}: Database not ready yet...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to database after retries")


connect_with_retry()
// Node.js example with exponential backoff
// Assumes `sequelize` is an already-configured Sequelize instance
async function connectWithRetry(maxRetries = 30) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await sequelize.authenticate();
      console.log('Database connection established');
      return;
    } catch (err) {
      // Double the delay on each attempt, capped at 10 seconds
      const delay = Math.min(1000 * Math.pow(2, i), 10000);
      console.log(`Attempt ${i + 1}/${maxRetries}: Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error('Failed to connect to database');
}
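It helps to know how long that backoff can actually wait before you pick a retry count. This small Python sketch reproduces the same formula, a delay that doubles from 1000 ms and is capped at 10000 ms. The function name `backoff_delays` is mine.

```python
def backoff_delays(max_retries: int, base_ms: int = 1000, cap_ms: int = 10000) -> list[int]:
    """Return the delay before each retry: base_ms * 2**attempt, capped at cap_ms."""
    return [min(base_ms * 2**attempt, cap_ms) for attempt in range(max_retries)]


print(backoff_delays(6))        # [1000, 2000, 4000, 8000, 10000, 10000]
print(sum(backoff_delays(30)))  # 275000 ms, roughly 4.6 minutes in total
```

With the default of 30 retries, the worst case is a wait of about four and a half minutes before the function gives up. That may be longer than your orchestrator's own startup timeout.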
Solution C: Use a Wait-For Script
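#!/bin/sh
# wait-for-postgres.sh
# Usage: ./wait-for-postgres.sh postgres 5432 <command> [args...]
# Uses /bin/sh because Alpine-based images do not ship bash by default
set -e
host="$1"
port="$2"
shift 2
until nc -z "$host" "$port"; do
  echo "Waiting for $host:$port..."
  sleep 1
done
echo "Connection available, starting application..."
# "$@" passes the remaining arguments through unchanged, even if they contain spaces
exec "$@"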
Integrate it into your Dockerfile:
FROM node:20-alpine
RUN apk add --no-cache netcat-openbsd
COPY wait-for-postgres.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/wait-for-postgres.sh
# Keep application files out of the image root
WORKDIR /app
COPY . .
CMD ["/usr/local/bin/wait-for-postgres.sh", "postgres", "5432", "node", "server.js"]
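If your image already has Python and you would rather not install netcat, the same wait-for-port idea can be sketched with the standard library. The function name `wait_for_port` and its default timings are my own assumptions. Like `nc -z`, it only proves the TCP port is open, not that Postgres is ready to serve queries, so a health check is still the more reliable signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A startup script could call `wait_for_port("postgres", 5432)` and exit with a non-zero status when it returns False.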
Step 6: Examine Network Configuration Problems
Networking issues can prevent services from communicating, causing dependent services to fail.
Common Network Errors
DNS resolution failure:
# Error in container logs:
# "could not translate host name to address"
# "Name or service not known"
This happens when services are on different networks or when you reference a service by a name Compose doesn’t recognize.
Fix: Ensure services are on the same network:
services:
  web:
    build: .
    networks:
      - app-network
    depends_on:
      - api
  api:
    build: ./api
    networks:
      - app-network
networks:
  app-network:
    driver: bridge
Port binding within containers:
Remember that services communicate with each other using their container ports, not the mapped host ports:
services:
  web:
    environment:
      # Wrong: 8081 is the host port mapping
      # - API_URL=http://api:8081
      # Right: 3000 is the port the app listens on inside the container
      - API_URL=http://api:3000
    depends_on:
      - api
  api:
    image: myapi:latest
    ports:
      - "8081:3000"  # Host 8081 -> Container 3000
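The host-versus-container distinction is easy to mix up, so here it is spelled out as a Python sketch that splits a short-syntax mapping and builds the URL another service should use. `PortMapping`, `parse_port_mapping`, and `internal_url` are hypothetical helpers, and the parser deliberately ignores port ranges.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_port: Optional[int]  # None means Docker picks an ephemeral host port
    container_port: int


def parse_port_mapping(spec: str) -> PortMapping:
    """Parse '[IP:]HOST:CONTAINER[/proto]' or 'CONTAINER' (no port ranges)."""
    ports = spec.split("/")[0]  # drop a trailing "/tcp" or "/udp"
    parts = ports.split(":")
    container_port = int(parts[-1])
    host_port = int(parts[-2]) if len(parts) >= 2 and parts[-2] else None
    return PortMapping(host_port, container_port)


def internal_url(service: str, spec: str) -> str:
    """URL other services on the same Compose network should use."""
    return f"http://{service}:{parse_port_mapping(spec).container_port}"
```

For the mapping above, `internal_url("api", "8081:3000")` gives `http://api:3000`. Port 8081 only matters to clients on the host.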
Inspecting Network Issues
# List all networks
docker network ls
# Inspect a specific network
docker network inspect myapp_app-network
# Test connectivity from inside a container
docker exec -it <container-name> sh
# Then, inside the container (minimal images may lack some of these tools):
ping api
nslookup api
curl http://api:3000/health
Step 7: Validate Your Docker Compose File
Sometimes the issue is a syntax error or misconfiguration in your Compose file itself.
Validate the Configuration
# Validate the file and print the fully resolved configuration
# (including interpolated values, which is useful for debugging env vars)
docker compose config
# Validate only, printing nothing on success
docker compose config --quiet && echo "Valid" || echo "Invalid"
Common Configuration Mistakes
Version mismatch (if still using version field):
# Compose v2 ignores this field and warns that it is obsolete
version: '2'  # Outdated
# Modern Docker Compose doesn't need a version field
# Just start with:
services:
  app:
    # ...
Environment variable interpolation errors:
# Error:
# "variable is not set. Defaulting to a blank string"
Create a .env file in the same directory as your Compose file:
# .env
POSTGRES_USER=myuser
POSTGRES_PASSWORD=secretpass
POSTGRES_DB=myapp
Then reference variables in your Compose file:
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
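You can spot a missing variable before Compose silently swaps in a blank string by comparing the `${...}` references against your .env file. Here is a simplified Python sketch. The function names are my own, it only understands braced references, and it does not handle quoting in .env values, so `docker compose config` remains the authoritative check.

```python
import re

# Matches ${NAME} and ${NAME<modifier>} such as ${NAME:-default}; skips $${NAME} escapes
_BRACED = re.compile(r"(?<!\$)\$\{([A-Za-z_][A-Za-z0-9_]*)([^}]*)\}")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def find_unset_variables(compose_text: str, env: dict[str, str]) -> list[str]:
    """List referenced variables that have neither a value in env nor a default."""
    missing = []
    for name, modifier in _BRACED.findall(compose_text):
        has_default = modifier.startswith((":-", "-"))
        if name not in env and not has_default and name not in missing:
            missing.append(name)
    return missing
```

In practice you would merge `os.environ` into the parsed .env values before calling `find_unset_variables`, since Compose also reads variables from the shell.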
YAML formatting issues:
# Wrong: inconsistent indentation
services:
  web:
    image: nginx
    ports:
   - "80:80"  # Misaligned
  api:
      build: .  # Over-indented
# Correct: consistent 2-space indentation
services:
  web:
    image: nginx
    ports:
      - "80:80"
  api:
    build: .
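Indentation slips like these are hard to see by eye, so a rough lint pass can help before you reach for a real YAML parser. The Python sketch below flags tabs and indents that are not a multiple of two spaces. It is a heuristic I am assuming for illustration (`find_indentation_problems` is not part of any tool), and multi-line block scalars can trigger false positives.

```python
def find_indentation_problems(text: str, step: int = 2) -> list[tuple[int, str]]:
    """Return (line_number, reason) for tabs or indents that are not a multiple of step."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments cannot break the structure
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % step != 0:
            problems.append((number, f"indent of {len(indent)} is not a multiple of {step}"))
    return problems
```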
Step 8: Edge Cases and Advanced Debugging
If you’ve worked through the previous steps and your service still won’t start, it’s time to dig deeper.
Docker Daemon Issues
Sometimes the problem isn’t your configuration—it’s Docker itself:
# Check Docker daemon status
sudo systemctl status docker
# Restart the daemon
sudo systemctl restart docker
# Check daemon logs for errors
sudo journalctl -u docker.service --since "1 hour ago" | tail -50
Corrupted Docker Installation
# Check Docker version and info for anomalies
docker version
docker info
# If things are really broken, reset Docker Desktop (macOS/Windows)
# Docker Desktop > Troubleshoot icon > "Reset to factory defaults"
# On Debian/Ubuntu, purge and reinstall (data in /var/lib/docker is not removed)
sudo apt-get purge docker-ce docker-ce-cli containerd.io
sudo apt-get install docker-ce docker-ce-cli containerd.io
Overlay2 Storage Driver Corruption
This manifests as cryptic errors during container creation:
# Error like:
# "failed to create shim task: OCI runtime create failed"
# "error creating overlay mount to /var/lib/docker/overlay2"
# Check filesystem health (run only on an unmounted filesystem, e.g. from a rescue environment)
sudo fsck.ext4 /dev/sda1  # Adjust the device and filesystem type for your system
# Clean up Docker's storage (nuclear option — removes everything)
sudo systemctl stop docker
sudo rm -rf /var/lib/docker
sudo systemctl start docker
Warning: This removes all images, containers, and volumes. Back up anything important first.
SELinux or AppArmor Blocking Containers
On systems with SELinux enabled (RHEL, CentOS, Fedora), container operations can be blocked:
# Check SELinux status
sestatus
# Check audit log for denials
sudo ausearch -m AVC -ts recent | grep docker
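# Temporarily switch SELinux to permissive mode for testing (revert with: sudo setenforce 1)
sudo setenforce 0
For production, keep SELinux enforcing and add the :z (shared label) or :Z (private label) suffix to volume mounts: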
services:
  web:
    volumes:
      - ./html:/usr/share/nginx/html:z  # Shared SELinux label
Entrypoint or CMD Failures
Your container might start but immediately exit because the entrypoint script fails:
# Check the exit code
docker inspect <container> --format='{{.State.ExitCode}}'
# Exit code 127: Command not found
# Exit code 126: Permission denied
# Exit code 1: Generic error (check application logs)
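Exit codes above 128 follow the shell convention of 128 plus the signal number, which is why 137 means signal 9. The Python sketch below turns that rule and the codes listed above into a readable description. `describe_exit_code` is a name of my own, and signal names depend on the platform Python runs on.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable hint."""
    if code == 0:
        return "success"
    if code == 126:
        return "command found but not executable (permission denied)"
    if code == 127:
        return "command not found"
    if 128 < code < 128 + signal.NSIG:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"  # number has no name on this platform
        return f"killed by {name}"
    return "application error (check the service logs)"
```

On Linux, `describe_exit_code(137)` reports SIGKILL and `describe_exit_code(143)` reports SIGTERM.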
Debug by overriding the entrypoint:
# Start a shell instead of the normal entrypoint
docker compose run --entrypoint /bin/sh webapp
This lets you explore the container filesystem and manually run the failing command.