Docker Compose Failed to Start Service: Complete Troubleshooting Guide

If you’re staring at a terminal that says “docker compose failed to start service”, you’re in good company. This is one of the most common issues developers face when working with containerized applications, and it can stem from dozens of different root causes.

The good news? Most of these failures follow predictable patterns. In this guide, I’ll walk you through a systematic debugging approach that goes from the most common culprits to edge cases that trip up even experienced engineers.


Understanding the Error Message

Before diving into solutions, it’s worth understanding what Docker Compose is actually telling you. When you run docker compose up, the orchestration engine attempts to:

  1. Pull or build required images
  2. Create containers with specified configurations
  3. Establish networks and volumes
  4. Start containers in dependency order
  5. Run health checks (if defined)

A failure at any stage produces the generic “failed to start service” message. The key to efficient debugging is extracting the actual error from Docker’s logs.

The First Command You Should Always Run

docker compose logs <service-name>

In my experience, this one command settles most debugging sessions, because it shows what the container wrote to stdout and stderr before it exited. If the logs are empty or unhelpful, read on.


Step 1: Check for Port Conflicts (Most Common Cause)

Port conflicts account for a significant portion of service startup failures. When a container tries to bind to a port that’s already in use on your host machine, the service fails immediately.

How to Diagnose

# Check what's using a specific port (Linux/macOS)
sudo lsof -i :8080

# Alternative using netstat (Linux only; macOS netstat does not accept these flags)
sudo netstat -tulpn | grep :8080

# On Windows (PowerShell)
netstat -ano | findstr :8080

If you see output listing a process, that port is occupied.
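
The same check can be scripted, for example as a pre-flight step before docker compose up. This is a minimal sketch rather than anything Docker provides: port_is_free is a hypothetical helper name, and it only tests IPv4 TCP ports.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical helper for illustration; it only tests IPv4 TCP.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use", or a privileged port
            return False
    return True


if __name__ == "__main__":
    for candidate in (8080, 8081):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"Port {candidate} is {state}")
```

A port that is free at check time can still be taken a moment later, so treat the result as a hint, not a guarantee.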

The Fix

Option A: Change the host-side port in your docker-compose.yml:

services:
  webapp:
    image: nginx:latest
    ports:
      - "8081:80"  # Changed from 8080:80

Option B: Stop the conflicting process:

# Find the PID from the lsof output, then ask the process to exit:
kill <PID>

# If it ignores the request, force it:
kill -9 <PID>

# Or on macOS, if the culprit is the built-in Apache server (httpd):
sudo apachectl stop

Option C: Omit the host port and let Docker assign a free ephemeral port:

services:
  webapp:
    image: nginx:latest
    ports:
      - "80"  # No host port specified — Docker picks one

Check the assigned port with:

docker compose ps

Real-World Example

I once spent forty minutes debugging a failing PostgreSQL container. The logs showed nothing useful. It turned out a previous Docker Compose run hadn’t fully torn down, and an orphaned container was still bound to port 5432. The fix:

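# Tear down this project's containers and networks, including orphans
docker compose down --remove-orphans

# Optional and host-wide: removes ALL stopped containers, unused networks,
# dangling images, and build cache, not just this project's
docker system prune -f

docker compose up -d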

Step 2: Investigate Image Pull and Build Failures

If Docker can’t obtain the image your service depends on, the container never starts.

Diagnosing Image Pull Issues

# Try pulling the image manually
docker pull postgres:16

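# See which registry the daemon reports (this alone does not prove you are logged in;
# stored credentials live in ~/.docker/config.json)
docker system info | grep -i registry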

# Inspect Docker's daemon logs
sudo journalctl -u docker --since "10 minutes ago"

Common error messages you might encounter:

  • manifest not found — The tag doesn’t exist
  • unauthorized — Authentication issue with private registry
  • no space left on device — Disk full
  • TLS handshake timeout — Network or DNS issue

Fixes for Image Pull Problems

Wrong or non-existent image tag:

# Wrong — tag 16.2.3 might not exist
services:
  db:
    image: postgres:16.2.3

# Right — pin to a verified existing tag
services:
  db:
    image: postgres:16.2

Always verify tags exist on Docker Hub or your private registry before referencing them.

Private registry authentication:

# Log in to your private registry
docker login registry.yourcompany.com -u youruser

# Or use a Docker config file in Compose
services:
  app:
    image: registry.yourcompany.com/myapp:latest
    # Docker uses credentials from ~/.docker/config.json

Build failures with custom Dockerfiles:

If your service uses build: instead of image:, build failures will also cause startup failures:

# Build with full output (don't use --quiet)
docker compose build --no-cache --progress plain webapp

# Check the build context size (large contexts cause timeouts)
du -sh .
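
du -sh . gives one total; to see which directories are inflating the context, a short script can rank the top-level entries. This is a rough sketch under stated assumptions: largest_entries and dir_size are hypothetical helpers, and the script does not read .dockerignore, so it may count files Docker would skip.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def largest_entries(context: str = ".", top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` largest top-level entries of the build context."""
    sizes = []
    for entry in Path(context).iterdir():
        if entry.is_symlink():
            continue
        size = dir_size(entry) if entry.is_dir() else entry.stat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, size in largest_entries("."):
        print(f"{size / 1024**2:10.1f} MiB  {name}")
```

Entries such as node_modules or .git that show up at the top are good candidates for a .dockerignore file.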

Step 3: Resolve Volume Mount Issues

Volume mount problems are sneaky because they often don’t produce obvious error messages. The container might start but immediately crash because required files are missing or inaccessible.

Permission Denied Errors

This is the most common volume issue, especially on Linux:

# Check ownership of the mounted directory
ls -la ./data

# Common error in logs:
# "permission denied" or "cannot open file"

Fix for bind mounts with permission issues:

services:
  postgres:
    image: postgres:16
    volumes:
      - ./data:/var/lib/postgresql/data
    user: "1000:1000"  # Match your host UID:GID

Or adjust the host directory permissions:

# Make the directory accessible (less secure, but works for dev)
chmod -R 777 ./data

# Better approach: match the container user's UID
# (999 is the postgres UID in the Debian-based image; the Alpine variant uses 70)
sudo chown -R 999:999 ./data
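
Before changing ownership, it can help to confirm that a UID mismatch is really the problem. The sketch below is illustrative only: describe_mount_source is a hypothetical helper, and the container UID you pass in is an assumption you should verify for your image.

```python
import os
import stat


def describe_mount_source(path: str, container_uid: int) -> dict:
    """Report who owns a bind-mount source and whether the UID matches."""
    info = os.stat(path)
    return {
        "uid": info.st_uid,
        "gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),  # e.g. "drwxr-xr-x"
        "uid_matches": info.st_uid == container_uid,
    }


if __name__ == "__main__":
    # "." stands in for your mount source (e.g. ./data); 999 is an assumed UID
    print(describe_mount_source(".", container_uid=999))
```

If uid_matches is False and the mode does not grant access to others, the container user cannot write to the directory.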

Absolute Path Requirements

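Compose resolves relative bind-mount paths against the directory that contains the Compose file, not the directory you run the command from. That is usually what you want, but it can surprise you when you pass -f or call Compose from a script elsewhere. If you see an error like invalid mount path, or the mount shows up empty inside the container, switch to the full path: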

services:
  app:
    volumes:
      # Wrong on some systems
      - ./config:/app/config

      # More reliable
      - /home/user/projects/myapp/config:/app/config

Or use variable expansion for portability (PWD is the directory you launch docker compose from, so run it from the project root):

services:
  app:
    volumes:
      - ${PWD}/config:/app/config

Step 4: Check Resource Constraints

Containers can fail to start if the host system doesn’t have enough memory, CPU, or disk space to satisfy their resource requirements.

Diagnosing Resource Issues

# Check system resources
free -h          # Memory
df -h            # Disk space
docker system df # Docker's disk usage

# Check container resource limits
docker stats --no-stream

Disk Space Exhaustion

Docker is notorious for consuming disk space. If your service fails with “no space left on device”:

# Remove unused images, containers, and networks
docker system prune -a --volumes

# Check for dangling volumes specifically
docker volume ls -f dangling=true
docker volume prune

# Check Docker's overlay2 storage
sudo du -sh /var/lib/docker/overlay2
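
If disk exhaustion keeps recurring, a small guard in your tooling can warn before a build or pull fails halfway. This is a sketch, not a Docker feature: free_gib and low_on_space are hypothetical helpers, the 5 GiB threshold is an arbitrary assumption, and you should point the path at the filesystem that holds Docker's data (by default /var/lib/docker on Linux).

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def low_on_space(path: str = "/", min_free_gib: float = 5.0) -> bool:
    """Return True if less than min_free_gib is free (threshold is an assumption)."""
    return free_gib(path) < min_free_gib


if __name__ == "__main__":
    if low_on_space("/"):
        print(f"Only {free_gib('/'):.1f} GiB free; consider pruning before building")
    else:
        print(f"{free_gib('/'):.1f} GiB free")
```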

Memory Limits Causing OOM Kills

If your service starts but immediately dies, check whether the kernel's out-of-memory (OOM) killer terminated it:

# Check if a container was OOM-killed
docker inspect <container-id> | grep -i oomkilled

# View the exit code
docker inspect <container-id> --format='{{.State.ExitCode}}'
# Exit code 137 = killed by signal 9 (often OOM)

Fix: Increase memory limits in Compose:

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
    # On Docker Desktop, ensure you've allocated enough RAM in settings

Step 5: Debug Dependency and Startup Order Issues

Docker Compose’s depends_on directive controls startup order, but it doesn’t wait for dependencies to be ready—only for them to be started. This causes failures when a service tries connecting to a database that hasn’t finished initializing.

The Classic Race Condition

# This configuration has a race condition
services:
  api:
    build: .
    depends_on:
      - postgres
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/db

  postgres:
    image: postgres:16

The API container starts as soon as the Postgres container has started, but Postgres takes several seconds to initialize and accept connections. The API tries to connect, fails, and exits.

Solution A: Use Health Checks

services:
  api:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/db

  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

This tells Compose to wait until Postgres reports a healthy status before starting the API service.

Solution B: Implement Retry Logic in Your Application

# Python example with retry logic
import time
import psycopg2
from psycopg2 import OperationalError

def connect_with_retry(max_retries=30, delay=2):
    for attempt in range(max_retries):
        try:
            conn = psycopg2.connect(
                host="postgres",
                database="mydb",
                user="user",
                password="pass"
            )
            print("Connected to database!")
            return conn
        except OperationalError:
            print(f"Attempt {attempt + 1}/{max_retries}: Database not ready yet...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to database after retries")

connect_with_retry()

// Node.js example with exponential backoff
// (assumes `sequelize` is an already-configured Sequelize instance)
async function connectWithRetry(maxRetries = 30) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await sequelize.authenticate();
      console.log('Database connection established');
      return;
    } catch (err) {
      const delay = Math.min(1000 * Math.pow(2, i), 10000);
      console.log(`Attempt ${i + 1}/${maxRetries}: Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error('Failed to connect to database');
}
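
To see what the capped exponential backoff in the Node.js example amounts to, the delay formula min(1000 * 2^i, 10000) can be tabulated directly. The helper name backoff_delays is hypothetical; the base and cap mirror the values in the example.

```python
def backoff_delays(max_retries: int, base_ms: int = 1000, cap_ms: int = 10000) -> list[int]:
    """Delay before each retry: base doubles per attempt, capped at cap_ms."""
    return [min(base_ms * 2**attempt, cap_ms) for attempt in range(max_retries)]


print(backoff_delays(6))         # [1000, 2000, 4000, 8000, 10000, 10000]
print(sum(backoff_delays(30)))   # 275000 ms, roughly 4.6 minutes in total
```

The cap matters: without it, the fifth retry would already wait 16 seconds, and later retries would wait far longer than a database startup ever needs.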

Solution C: Use a Wait-For Script

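#!/bin/sh
# wait-for-postgres.sh
# Usage: ./wait-for-postgres.sh <host> <port> <command> [args...]
# Uses /bin/sh because Alpine-based images do not ship bash by default.

set -e

host="$1"
port="$2"
shift 2

until nc -z "$host" "$port"; do
  echo "Waiting for $host:$port..."
  sleep 1
done

echo "Connection available, starting application..."
# "$@" passes the remaining arguments through exactly as given
exec "$@"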

Integrate it into your Dockerfile:

FROM node:20-alpine
RUN apk add --no-cache netcat-openbsd
COPY wait-for-postgres.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/wait-for-postgres.sh
COPY . .
CMD ["/usr/local/bin/wait-for-postgres.sh", "postgres", "5432", "node", "server.js"]

Step 6: Examine Network Configuration Problems

Networking issues can prevent services from communicating, causing dependent services to fail.

Common Network Errors

DNS resolution failure:

# Error in container logs:
# "could not translate host name to address"
# "Name or service not known"

This happens when services are on different networks or when you reference a service by a name Compose doesn’t recognize.

Fix: Ensure services are on the same network:

services:
  web:
    build: .
    networks:
      - app-network
    depends_on:
      - api

  api:
    build: ./api
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

Port binding within containers:

Remember that services communicate with each other using their container ports, not the mapped host ports:

services:
  web:
    environment:
      # Wrong — 8081 is the host port mapping
      - API_URL=http://api:8081

      # Right — 3000 is the port the app listens on inside the container
      - API_URL=http://api:3000
    depends_on:
      - api

  api:
    image: myapi:latest
    ports:
      - "8081:3000"  # Host 8081 -> Container 3000
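
One way to avoid this mistake in scripts that generate service URLs is to derive the container-side port from the mapping itself. The sketch below is an illustration under stated assumptions: container_port and internal_url are hypothetical helpers, and they handle only the short ports syntax with single ports, not ranges or the long syntax.

```python
def container_port(mapping: str) -> int:
    """Extract the container-side port from a short-syntax ports entry.

    Handles "3000", "8081:3000", "127.0.0.1:8081:3000" and an optional
    "/tcp" or "/udp" suffix. Port ranges are not handled.
    """
    spec = mapping.split("/", 1)[0]        # drop the protocol suffix, if any
    return int(spec.rsplit(":", 1)[-1])    # the container port is the last field


def internal_url(service: str, mapping: str) -> str:
    """URL other services on the same Compose network should use."""
    return f"http://{service}:{container_port(mapping)}"


print(internal_url("api", "8081:3000"))  # http://api:3000
```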

Inspecting Network Issues

# List all networks
docker network ls

# Inspect a specific network
docker network inspect myapp_app-network

# Open a shell inside a running container
docker exec -it <container-name> sh

# Then, inside that shell (minimal images may not include all of these tools):
ping api
nslookup api
curl http://api:3000/health

Step 7: Validate Your Docker Compose File

Sometimes the issue is a syntax error or misconfiguration in your Compose file itself.

Validate the Configuration

# Validate the file and print the fully resolved configuration
# (includes interpolated values, useful for debugging env vars)
docker compose config

# Validate only, without printing the configuration
docker compose config --quiet && echo "Valid" || echo "Invalid"

Common Configuration Mistakes

Version mismatch (if still using version field):

# The top-level version field is obsolete; Compose v2 ignores it and prints a warning
version: '2'  # Outdated

# Modern Docker Compose doesn't need a version field
# Just start with:
services:
  app:
    # ...

Environment variable interpolation errors:

# Warning printed by Compose:
# The "POSTGRES_USER" variable is not set. Defaulting to a blank string.

Create a .env file in the same directory as your Compose file:

# .env
POSTGRES_USER=myuser
POSTGRES_PASSWORD=secretpass
POSTGRES_DB=myapp

Then reference variables in your Compose file:

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
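
To catch unset variables before Compose silently substitutes blank strings, you can compare the ${...} references in the Compose file against your .env file. This is a simplified sketch: parse_env_file and unset_variables are hypothetical helpers, only the braced ${VAR} form is recognized, and .env quoting rules are ignored.

```python
import re

# Matches ${NAME} plus an optional modifier such as :-default; skips escaped $${NAME}
_VAR_PATTERN = re.compile(r"(?<!\$)\$\{([A-Za-z_][A-Za-z0-9_]*)([^}]*)\}")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_variables(compose_text: str, env: dict[str, str]) -> list[str]:
    """Names referenced in compose_text with no value in env and no default."""
    missing = set()
    for name, modifier in _VAR_PATTERN.findall(compose_text):
        has_default = modifier.startswith((":-", "-"))
        if name not in env and not has_default:
            missing.add(name)
    return sorted(missing)


compose = "POSTGRES_USER: ${POSTGRES_USER}\nPOSTGRES_PASSWORD: ${POSTGRES_PASSWORD}\n"
env = parse_env_file("# .env\nPOSTGRES_USER=myuser\n")
print(unset_variables(compose, env))  # ['POSTGRES_PASSWORD']
```

Compose also reads variables from your shell, so in real use you would merge os.environ into the dictionary before checking.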

YAML formatting issues:

# Wrong: sibling keys are not aligned, so the file fails to parse
services:
  web:
    image: nginx
     ports:        # One space too many; must line up with "image"
      - "80:80"

  api:
    build: .

# Correct: consistent 2-space indentation
services:
  web:
    image: nginx
    ports:
      - "80:80"

  api:
    build: .
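
Indentation slips are easy to miss by eye, so a quick heuristic scan can point at suspicious lines before you run Compose. This sketch is not a YAML parser: indentation_warnings is a hypothetical helper that assumes a 2-space convention, flags tabs (which YAML forbids for indentation), and flags indents that are not a multiple of the step. It can produce false positives on multi-line strings.

```python
def indentation_warnings(text: str, step: int = 2) -> list[str]:
    """Flag tab indentation and indents that are not a multiple of `step`."""
    warnings = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            warnings.append(f"line {number}: tab in indentation")
        elif len(indent) % step:
            warnings.append(
                f"line {number}: indent of {len(indent)} is not a multiple of {step}"
            )
    return warnings


sample = "services:\n  web:\n    image: nginx\n     ports:\n"
print(indentation_warnings(sample))
# ['line 4: indent of 5 is not a multiple of 2']
```

For an authoritative answer, docker compose config remains the real check; this only narrows down where to look.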

Step 8: Edge Cases and Advanced Debugging

If you’ve worked through the previous steps and your service still won’t start, it’s time to dig deeper.

Docker Daemon Issues

Sometimes the problem isn’t your configuration—it’s Docker itself:

# Check Docker daemon status
sudo systemctl status docker

# Restart the daemon
sudo systemctl restart docker

# Check daemon logs for errors
sudo journalctl -u docker.service --since "1 hour ago" | tail -50

Corrupted Docker Installation

# Check Docker version and info for anomalies
docker version
docker info

# If things are really broken, reset Docker Desktop (macOS/Windows)
# Docker Desktop > Troubleshoot icon > "Reset to factory defaults"

# On Linux, purge and reinstall
sudo apt-get purge docker-ce docker-ce-cli containerd.io
sudo apt-get install docker-ce docker-ce-cli containerd.io

Overlay2 Storage Driver Corruption

This manifests as cryptic errors during container creation:

# Errors like:
# "failed to create shim task: OCI runtime create failed"
# "error creating overlay mount to /var/lib/docker/overlay2"

# Check filesystem health (only on an unmounted filesystem, e.g. from a rescue system)
sudo fsck.ext4 /dev/sda1  # Adjust for your device and filesystem type

# Clean up Docker's storage (nuclear option — removes everything)
sudo systemctl stop docker
sudo rm -rf /var/lib/docker
sudo systemctl start docker

Warning: This removes all images, containers, and volumes. Back up anything important first.

SELinux or AppArmor Blocking Containers

On systems with SELinux enabled (RHEL, CentOS, Fedora), container operations can be blocked:

# Check SELinux status
sestatus

# Check audit log for denials
sudo ausearch -m AVC -ts recent | grep docker

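# Temporarily switch SELinux to permissive mode for testing
# (re-enable enforcement afterwards with: sudo setenforce 1)
sudo setenforce 0

If that makes the error go away, keep SELinux enforcing and relabel bind mounts instead by adding the :z (shared between containers) or :Z (private to one container) suffix: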

services:
  web:
    volumes:
      - ./html:/usr/share/nginx/html:z  # Shared SELinux label

Entrypoint or CMD Failures

Your container might start but immediately exit because the entrypoint script fails:

# Check the exit code
docker inspect <container> --format='{{.State.ExitCode}}'

# Exit code 127: Command not found
# Exit code 126: Permission denied
# Exit code 1: Generic error (check application logs)
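
These codes follow the usual shell convention, where a value above 128 means the process was terminated by signal number (code - 128). A small decoder makes that arithmetic explicit. It is a sketch: explain_exit_code is a hypothetical helper, and the descriptions only cover the common cases listed in this guide.

```python
import signal

KNOWN_CODES = {
    0: "exited cleanly",
    1: "generic application error (check the service logs)",
    126: "command found but not executable (often a permission problem)",
    127: "command not found (check the entrypoint or CMD path)",
}


def explain_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable hint."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if 128 < code < 256:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "application-specific exit code"


print(explain_exit_code(137))  # terminated by SIGKILL
```

Exit code 137 therefore means SIGKILL, which is what the OOM killer sends, although a manual kill -9 or a stop timeout produces the same code.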

Debug by overriding the entrypoint:

# Start a shell instead of the normal entrypoint
docker compose run --entrypoint /bin/sh webapp

This lets you explore the container filesystem and run the failing command by hand to see its real error message.
