GitHub Actions Workflow Failed: How to Fix It (Complete Troubleshooting Guide)

You pushed your code. The CI pipeline ran. Then you saw that dreaded red X next to your commit. We’ve all been there — staring at a cryptic log, wondering what went wrong between your local machine and GitHub’s servers.

This guide walks you through every common (and some uncommon) reason your GitHub Actions workflow fails, with real error messages, root cause analysis, and copy-paste-ready fixes. Whether you’re dealing with a flaky test suite, a misconfigured secret, or a deprecated action throwing warnings at 2 AM, you’ll find the solution here.


Why Your GitHub Actions Workflow Fails: The Big Picture

Before diving into specific fixes, it helps to understand that most workflow failures fall into one of these categories:

  1. Configuration errors — YAML syntax problems, invalid triggers, or misconfigured jobs
  2. Environment mismatches — different Node, Python, Java, or OS versions between local and CI
  3. Authentication and permissions — missing secrets, insufficient token scopes, or expired credentials
  4. Dependency issues — package resolution failures, lock file conflicts, or registry authentication
  5. Deprecated actions or commands — upstream changes breaking your workflow
  6. Resource and infrastructure limits — timeouts, runner capacity, or API rate limits

Let’s work through each, starting with the most frequent culprits.


Step 1: Read the Actual Error (Not Just the Red X)

This sounds obvious, but it’s the most skipped step. GitHub collapses logs by default, which hides the real error hundreds of lines deep.

How to Find the Real Error

  1. Click the failed workflow run
  2. Expand the failed step (not the whole job)
  3. Scroll to the first line containing Error, error:, FATAL, or failed
  4. Read upward from that line — context usually appears before the error

A common mistake is reading only the last few lines. The actual root cause is often 20-50 lines above the final exit code. For example, a "Process completed with exit code 1" message tells you nothing on its own; the real error is whatever the failing command printed before it exited.
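If you prefer to search a log outside the browser, you can save the failed steps' output with the GitHub CLI (gh run view <run-id> --log-failed > failed.log) and let a small bash helper apply steps 3 and 4 for you. This is a sketch, not a GitHub tool: first_error and its 20-line default are hypothetical choices, and the search patterns are the same markers listed above.

```shell
#!/usr/bin/env bash
# Print the first line matching a common failure marker, plus the lines above it.
# Usage: first_error <logfile> [context_lines]   (context defaults to 20)
first_error() {
  local log="$1" ctx="${2:-20}" n start
  # -n prefixes each match with its line number; -m1 stops after the first match.
  n=$(grep -n -m1 -E 'Error|error:|FATAL|failed' "$log" | cut -d: -f1)
  if [ -z "$n" ]; then
    echo "no error marker found in $log" >&2
    return 1
  fi
  # Start ctx lines above the match, but never before line 1.
  start=$(( n > ctx ? n - ctx : 1 ))
  sed -n "${start},${n}p" "$log"
}
```

For example, first_error failed.log 40 prints the first error line together with the 40 lines that led up to it.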


Step 2: Check for YAML Syntax Errors

YAML is notoriously sensitive to indentation and quoting. A single misplaced space can silently break your workflow or cause unexpected behavior.

Common YAML Mistake: Inconsistent Indentation

# BROKEN: inconsistent indentation inside steps
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm run build
      - name: Test
      run: npm test  # ← This is NOT indented under the step
# FIXED — proper indentation
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm run build
      - name: Test
        run: npm test

Validate Locally with actionlint

Install actionlint to catch these before pushing:

# Install actionlint
brew install actionlint

# Or via Go
go install github.com/rhysd/actionlint/cmd/actionlint@latest

# Validate your workflow file
actionlint .github/workflows/ci.yml

actionlint catches syntax errors, deprecated action versions, invalid expressions, and shell script issues inside run blocks. I run it as a git pre-commit hook on every workflow file change — it’s saved me from countless “fix CI” commits.
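A pre-commit hook for this can stay small. The sketch below assumes bash and an actionlint binary on your PATH; save it as .git/hooks/pre-commit and make it executable with chmod +x. The function names are hypothetical. It lints only the workflow files staged for the commit, so unrelated commits stay fast.

```shell
#!/usr/bin/env bash
# Hypothetical .git/hooks/pre-commit: run actionlint on staged workflow files only.

# Succeed if a path is a workflow file GitHub would load.
is_workflow_file() {
  case "$1" in
    .github/workflows/*.yml|.github/workflows/*.yaml) return 0 ;;
    *) return 1 ;;
  esac
}

lint_staged_workflows() {
  local files=() f
  # Added, copied, or modified files in the index (what is about to be committed).
  while IFS= read -r f; do
    if is_workflow_file "$f"; then
      files+=("$f")
    fi
  done < <(git diff --cached --name-only --diff-filter=ACM)
  if [ "${#files[@]}" -eq 0 ]; then
    return 0  # no workflow changes, nothing to lint
  fi
  actionlint "${files[@]}"  # a non-zero exit here aborts the commit
}

# Run only when executed as the hook, not when sourced by another script.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  lint_staged_workflows
fi
```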

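The Colons-in-Values Trap

# BROKEN: a colon followed by a space inside an unquoted value
env:
  RELEASE_TITLE: Release: v1.2.0
# FIXED: quoted properly
env:
  RELEASE_TITLE: "Release: v1.2.0"
  DATABASE_URL: "postgres://user:pass@host:5432/db"

YAML treats a colon followed by a space as a key-value separator, so the unquoted RELEASE_TITLE value fails to parse. A colon with no space after it, as in postgres://user:pass@host:5432/db, is read as part of the string, but that distinction is easy to get wrong. Always quote values containing colons, especially URLs and connection strings.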


Step 3: Verify Your Secrets and Environment Variables

Missing or misnamed secrets are the single most common cause of workflow failures I see in production repositories.

The Classic Mistake: Secret Scope

Secrets can be defined at the organization, repository, or environment level. An environment secret, such as one created in your production environment, is only available to jobs that reference that environment.

# BROKEN — secret is in 'production' environment, but job doesn't use it
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}  # Empty: the job has no environment
        run: ./scripts/deploy.sh
# FIXED — job references the environment
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # ← This unlocks the secret
    steps:
      - name: Deploy
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
        run: ./scripts/deploy.sh

Debugging Secret Availability Safely

Never echo secrets directly. Instead, check if they exist:

steps:
  - name: Check secrets
    env:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
    run: |
      if [ -z "$DEPLOY_KEY" ]; then
        echo "::error::DEPLOY_KEY secret is missing or empty"
        exit 1
      else
        echo "DEPLOY_KEY is set (length: ${#DEPLOY_KEY})"
      fi
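With several secrets to check, the same idea fits in a small bash function. This is a sketch with a hypothetical name, require_env: map each secret into the step's env: block first (as with DEPLOY_KEY above), then call it with the variable names. It relies on bash's ${!name} indirect expansion and, like the check above, prints only each value's length.

```shell
#!/usr/bin/env bash
# Fail when any named environment variable is unset or empty.
# Prints only names and value lengths, never the values themselves.
# Usage: require_env DEPLOY_KEY API_TOKEN
require_env() {
  local missing=0 name value
  for name in "$@"; do
    value="${!name:-}"  # indirect expansion: the value of the variable called $name
    if [ -z "$value" ]; then
      echo "::error::$name is missing or empty"
      missing=1
    else
      echo "$name is set (length: ${#value})"
    fi
  done
  return "$missing"  # non-zero fails the step after every name has been reported
}
```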

Enable Step Debug Logging (Temporarily)

If you’re stuck, GitHub can emit step debug logs. Enable them only while you investigate, ideally in a private repository since anyone who can read the logs sees the extra output, and disable them immediately after:

  1. Go to your repository Settings > Secrets and variables > Actions
  2. Add a new secret (or repository variable) named ACTIONS_STEP_DEBUG with value true
  3. Re-run your workflow — you’ll get extended logging

Delete this secret when you’re done. It significantly increases log volume and can expose sensitive data in shared environments.


Step 4: Check the GITHUB_TOKEN Permissions

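In February 2023, GitHub switched the default GITHUB_TOKEN permissions for newly created repositories and organizations to read-only, and administrators can apply the same least-privilege default to existing ones. If your workflow worked before and suddenly fails with a permissions error on git push or package publishing, this is likely why.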

The Push-Back-to-Repository Failure

# This fails with "fatal: unable to access ... 403"
steps:
  - name: Push changes
    run: |
      git config user.name "github-actions[bot]"
      git config user.email "github-actions[bot]@users.noreply.github.com"
      git add -A
      git commit -m "Auto-format code"
      git push

The default GITHUB_TOKEN doesn’t have write permissions unless you explicitly grant them:

# FIXED — grant contents:write permission
permissions:
  contents: write

jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Push changes
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add -A
          git commit -m "Auto-format code"
          git push

You can set permissions at the workflow level (applies to all jobs) or at the job level. Job-level settings override workflow-level ones.
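One more failure mode in the auto-format job above: when the formatter changes nothing, git commit exits with "nothing to commit" and fails the step even though permissions are fine. A minimal bash sketch that guards against this (commit_if_changed is a hypothetical helper name) stages everything and commits only when the index actually differs from HEAD:

```shell
#!/usr/bin/env bash
# Stage all changes and commit only if something is staged.
# Returns 0 without committing on a clean tree, so the step stays green.
# Usage: commit_if_changed "Auto-format code" && git push
commit_if_changed() {
  local message="$1"
  git add -A
  # --quiet: exit 0 when nothing is staged, 1 when there are staged changes.
  if git diff --cached --quiet; then
    echo "No changes to commit"
    return 0
  fi
  git commit -m "$message"
}
```

Running git push afterwards is harmless on a clean tree; git simply reports that everything is up to date.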


Step 5: Resolve Dependency and Build Failures

“Module Not Found” in CI but Not Locally

This usually means your lock file is out of sync with package.json, or you’re ignoring files in .gitignore that CI needs.

# Common error message
Error: Cannot find module 'some-package'
Require stack:
- /home/runner/work/repo/repo/index.js

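Root cause: You installed a package locally but never committed the updated package.json and package-lock.json, or you installed it with npm install --no-save, so neither file lists it. In CI, use npm ci rather than npm install so the install matches the lock file exactly: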

# CORRECT — use ci for reproducible installs
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: '20'
      cache: 'npm'
  - run: npm ci        # ← Strict install from lockfile
  - run: npm run build

npm ci deletes node_modules and installs exactly what’s in the lock file. If the lock file doesn’t match package.json, it fails loudly — which is what you want.
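The same lock-file discipline applies to the other Node package managers, each with its own strict flag. As a sketch (strict_install_cmd is a hypothetical helper; the flags shown are for npm, pnpm, and Yarn 1.x, while Yarn 2 and later use yarn install --immutable instead), a bash function can pick the right command from whichever lock file is committed and fail when there is none:

```shell
#!/usr/bin/env bash
# Print the lockfile-only install command for the lock file found in a directory.
# Usage: cmd=$(strict_install_cmd) && $cmd
strict_install_cmd() {
  local dir="${1:-.}"
  if [ -f "$dir/package-lock.json" ]; then
    echo "npm ci"
  elif [ -f "$dir/pnpm-lock.yaml" ]; then
    echo "pnpm install --frozen-lockfile"
  elif [ -f "$dir/yarn.lock" ]; then
    echo "yarn install --frozen-lockfile"  # Yarn 1.x; Yarn 2+ uses --immutable
  else
    echo "no lock file found in $dir; commit one so CI installs are reproducible" >&2
    return 1
  fi
}
```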

Python: Poetry Lock Mismatches

steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.12'
  - name: Install Poetry
    run: pip install poetry==1.8.4
  - name: Install dependencies
    run: poetry install --no-interaction --no-root

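If this fails with a lock file error, your poetry.lock is stale. Run poetry lock --no-update locally (Poetry 1.x; in Poetry 2.x plain poetry lock already keeps existing versions and the flag was removed), commit the refreshed lock file, and push.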


Step 6: Handle Deprecated Actions and Commands

GitHub deprecates actions and commands on a rolling basis. If a previously working workflow suddenly starts failing after months of stability, check for deprecation notices.

The set-output Deprecation

Older workflows used this pattern:

# DEPRECATED: logs a deprecation warning; don't rely on it
- name: Set output
  id: vars
  run: echo "::set-output name=version::1.0.0"
# CURRENT — use GITHUB_OUTPUT
- name: Set output
  id: vars
  run: echo "version=1.0.0" >> "$GITHUB_OUTPUT"
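GITHUB_OUTPUT is simply a file the runner reads after the step finishes, one name=value pair per line. Values that span several lines need GitHub's delimiter form: name<<DELIMITER, then the value, then DELIMITER on its own line. Here is a bash sketch of a helper that picks the right form; the name set_output is hypothetical, and the random delimiter just makes it very unlikely that the value closes the block early.

```shell
#!/usr/bin/env bash
# Append a step output to the file named by $GITHUB_OUTPUT.
# Usage: set_output <name> <value>
set_output() {
  local name="$1" value="$2" delim
  if [ -z "${GITHUB_OUTPUT:-}" ]; then
    echo "GITHUB_OUTPUT is not set (not running inside GitHub Actions?)" >&2
    return 1
  fi
  if [[ "$value" == *$'\n'* ]]; then
    # Multiline: name<<DELIM, the value, then DELIM on its own line.
    delim="EOF_${RANDOM}${RANDOM}"
    {
      printf '%s<<%s\n' "$name" "$delim"
      printf '%s\n' "$value"
      printf '%s\n' "$delim"
    } >> "$GITHUB_OUTPUT"
  else
    printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
  fi
}
```

A later step can then read either kind of value as ${{ steps.vars.outputs.<name> }}.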

Node 12, Node 16, and Node 20 Action Runtimes

GitHub has progressively retired Node.js runtimes for actions:

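  • Node 12: deprecated January 2023
  • Node 16: deprecated September 2023
  • Node 20: current at the time of writing; check the GitHub Changelog, since each runtime is eventually retired in favor of a newer one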

If you maintain a custom action, update its action.yml:

# Update this line
runs:
  using: 'node20'
  main: 'dist/index.js'

And in your workflow, pin to current major versions:

# GOOD — explicit versions
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
  - uses: actions/cache@v4

Avoid @main or @latest tags. They break without warning when maintainers push changes.
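To find leftovers before they bite, a quick grep over your workflow and action metadata files helps. The bash sketch below (check_deprecations is a hypothetical name) looks only for the ::set-output and ::save-state commands and node12/node16 runtimes. It is not an exhaustive deprecation checker; actionlint remains the broader tool.

```shell
#!/usr/bin/env bash
# Flag deprecated workflow commands and retired action runtimes under a directory.
# Usage: check_deprecations [dir]   (defaults to .github)
check_deprecations() {
  local dir="${1:-.github}"
  # grep prints file:line:match for every hit and succeeds if there was at least one.
  if grep -rnE '::(set-output|save-state)|using:[[:space:]]*.?node(12|16)' "$dir"; then
    echo "::warning::deprecated workflow commands or action runtimes found (listed above)"
    return 1
  fi
  echo "No deprecated patterns found in $dir"
}
```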


Step 7: Fix Caching Problems

Caching speeds up builds but introduces a class of failures that are hard to debug because they’re intermittent.

Corrupted Cache Causes Build Failures

# The cache restore succeeds, but the build fails afterward
- uses: actions/cache@v4
  with:
    path: node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}

Problem: If node_modules gets partially written or a dependency ships a broken release, the cache stores that broken state and serves it to every subsequent run.

Solution: Include the lock file hash and a version prefix you can bump to invalidate:

- uses: actions/cache@v4
  with:
    path: node_modules
    key: v2-${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      v2-${{ runner.os }}-node-

When you suspect cache corruption, bump v2 to v3 to force a fresh cache build.

Cache Restore Keys Causing Wrong Dependencies

The restore-keys fallback can load a cache from a different branch or commit. If your tests pass locally but fail in CI with weird dependency behavior, disable the cache temporarily:

# Temporarily skip caching to isolate the issue
# - uses: actions/cache@v4
#   with:
#     path: node_modules
#     key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
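Why a restore key can pull in a stale cache, and why bumping the prefix fixes it, is easier to see with a simplified model of the lookup: an exact match on the primary key wins; otherwise the most recently saved cache whose key starts with the restore-key prefix is used. The bash sketch below models only that rule. It is an illustration under that assumption, not actions/cache itself, which also scopes caches by branch.

```shell
#!/usr/bin/env bash
# Simplified model (not the real actions/cache code) of how a cache is chosen.
# Reads existing cache keys from stdin, oldest first.
# Usage: resolve_cache_key <primary_key> <restore_prefix> < keys.txt
resolve_cache_key() {
  local primary="$1" prefix="$2" key fallback=""
  while IFS= read -r key; do
    if [ "$key" = "$primary" ]; then
      echo "exact: $key"  # an exact hit always wins
      return 0
    fi
    case "$key" in
      "$prefix"*) fallback="$key" ;;  # later lines are newer, so keep the last match
    esac
  done
  if [ -n "$fallback" ]; then
    echo "partial: $fallback"  # restored from a different lock file hash
    return 0
  fi
  echo "miss"  # clean install; a new cache is saved afterwards
  return 1
}
```

With existing keys v2-Linux-node-aaa and v2-Linux-node-bbb, a new lock file hash ccc gets a partial hit on v2-Linux-node-bbb, whose node_modules may not match the new lock file, while switching both the key and the restore prefix to v3 produces a miss and a fresh cache.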

Step 8: Debug Working Directory and Path Issues

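On Linux runners, actions/checkout places your repository in $GITHUB_WORKSPACE (/home/runner/work/{repo-name}/{repo-name}), and run steps start in that directory. Commands that work locally can still fail there: absolute paths from your machine don’t exist on the runner, files ignored by .gitignore were never committed, and without a checkout step the directory is empty.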

The “File Not Found” After Checkout

# BROKEN — script doesn't exist at this path
steps:
  - uses: actions/checkout@v4
  - run: ./scripts/deploy.sh

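If scripts/deploy.sh isn’t committed (check .gitignore) or isn’t executable, this fails. The durable fix is to commit the executable bit once with git update-index --chmod=+x scripts/deploy.sh; as a quick workaround, set it in the workflow: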

steps:
  - uses: actions/checkout@v4
  - run: |
      chmod +x scripts/deploy.sh
      ./scripts/deploy.sh
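When you are not sure which of those problems you have, a short diagnostic run before the script can tell them apart. Here is a bash sketch with a hypothetical helper name, diagnose_script, that prints where the step is running and whether the file is missing or merely not executable:

```shell
#!/usr/bin/env bash
# Report where the step is running and why a script can't be executed.
# Usage: diagnose_script scripts/deploy.sh
# Returns 1 if the file is missing, 2 if it exists but is not executable.
diagnose_script() {
  local path="$1"
  echo "working directory: $(pwd)"
  echo "GITHUB_WORKSPACE: ${GITHUB_WORKSPACE:-<not set>}"
  if [ ! -e "$path" ]; then
    echo "::error::$path not found (not committed, ignored by .gitignore, or wrong directory?)"
    return 1
  fi
  if [ ! -x "$path" ]; then
    echo "::error::$path exists but is not executable (commit the executable bit or run it with bash)"
    return 2
  fi
  echo "$path is present and executable"
}
```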

Docker Build Context Issues

# This fails if Dockerfile references paths relative to a subdirectory
- name: Build Docker image
  run: docker build -t app .

If your Dockerfile is in a subdirectory:

- name: Build Docker image
  run: docker build -t app -f docker/Dockerfile .

The build context (.) is always relative to your current working directory, which is the repo root unless you change it.


Step 9: Address Concurrency and Race Conditions

If your workflow sometimes passes and sometimes fails with no code changes, you likely have a race condition.

Concurrent Deployments Overwriting Each Other

# PROBLEM — multiple pushes trigger overlapping deployments
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh
# FIXED: one deployment at a time; newer runs queue instead of cancelling the active one
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false  # Queue instead of cancel for deployments

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh

Use cancel-in-progress: true for CI builds (safe to cancel stale runs). Use false for deployments (you don’t want to cancel a half-finished migration).
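With cancel-in-progress: false, a run is still not guaranteed to execute: GitHub documents that a concurrency group holds at most one running and one pending run, and a newly queued run cancels the one already pending. The bash sketch below models that rule for a single group. The function name and run IDs are hypothetical, and it assumes every new run arrives while the first one is still deploying.

```shell
#!/usr/bin/env bash
# Simplified model of one concurrency group with cancel-in-progress: false.
# Arguments are run IDs in arrival order; all arrive while the first run is still going.
simulate_queue() {
  local running="" pending="" id
  for id in "$@"; do
    if [ -z "$running" ]; then
      running="$id"
      echo "run $id: started"
    else
      # Only one pending slot: a newer run cancels the one already waiting.
      if [ -n "$pending" ]; then
        echo "run $pending: canceled (replaced by $id)"
      fi
      pending="$id"
      echo "run $id: pending"
    fi
  done
  if [ -n "$pending" ]; then
    echo "run $pending: starts after $running finishes"
  fi
}
```

Calling simulate_queue 101 102 103 reports run 102 as canceled, so only runs 101 and 103 ever deploy. That is usually what you want for deployments, because the latest push still ships.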


Step 10: Handle Runner and Resource Limits

Job Timeouts

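On GitHub-hosted runners, a job is cancelled after 360 minutes (6 hours) by default, which is also the maximum. A hung script can burn through all of that before anyone notices, so set a realistic limit: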

jobs:
  heavy-build:
    runs-on: ubuntu-latest
    timeout-minutes: 15  # ← Set a realistic limit
    steps:
      - run: ./long-running-script.sh

Setting a timeout prevents runaway jobs from burning through your action minutes quota.
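timeout-minutes can also be set on an individual step. To bound just one command inside a longer run script, GNU coreutils timeout (available on the Ubuntu runners) is an option; it exits with status 124 when it has to stop the command. Here is a bash sketch of a wrapper, with a hypothetical name, that turns that case into an error annotation:

```shell
#!/usr/bin/env bash
# Run a command with a time limit in seconds (uses GNU coreutils timeout).
# Usage: run_with_timeout 600 ./long-running-script.sh
run_with_timeout() {
  local limit="$1" status
  shift
  timeout "$limit" "$@"
  status=$?
  # timeout exits with 124 when it had to stop the command.
  if [ "$status" -eq 124 ]; then
    echo "::error::'$*' exceeded ${limit}s and was killed"
  fi
  return "$status"
}
```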

API Rate Limiting

If your workflow makes many GitHub API calls (via gh CLI or octokit), you can hit rate limits:

# Error message
gh: API rate limit exceeded for installation ID 12345678.

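Make sure every call is authenticated: unauthenticated requests are limited to 60 per hour, while the built-in GITHUB_TOKEN allows 1,000 requests per hour per repository. An "installation ID" in the error usually means you are already using an installation token such as GITHUB_TOKEN, so the remaining fix is to make fewer calls, for example by requesting 100 items per page: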

steps:
  - uses: actions/github-script@v7
    with:
      script: |
        const repos = await github.rest.repos.listForOrg({
          org: context.repo.owner,
          per_page: 100
        });
        console.log(repos.data.length);
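For scripts that make many calls, checking the remaining budget first avoids failing halfway through. The REST endpoint GET /rate_limit reports resources.core.remaining and resources.core.reset (a Unix timestamp), and gh api rate_limit can fetch it. The bash helper below is a sketch under those assumptions; its name and the default threshold of 50 are hypothetical. It only computes how long to wait, so it can be tested without network access.

```shell
#!/usr/bin/env bash
# Print how many seconds to sleep before making more REST API calls.
# Usage: rate_limit_wait <remaining> <reset_epoch> [threshold] [now_epoch]
# Prints 0 when more than <threshold> requests remain or the window has already reset.
rate_limit_wait() {
  local remaining="$1" reset="$2" threshold="${3:-50}" now="${4:-$(date +%s)}"
  if [ "$remaining" -gt "$threshold" ] || [ "$reset" -le "$now" ]; then
    echo 0
  else
    echo $(( reset - now + 1 ))  # +1 so we wake just after the window resets
  fi
}

# Example use inside a run step (GH_TOKEN must be set, e.g. GH_TOKEN: ${{ github.token }}):
#   read -r remaining reset < <(gh api rate_limit --jq '.resources.core | "\(.remaining) \(.reset)"')
#   sleep "$(rate_limit_wait "$remaining" "$reset")"
```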

Disk Space on GitHub-Hosted Runners

The standard ubuntu-latest runner has about 14 GB of free disk space. Large Docker images or monorepo builds can exhaust this:

# Error
No space left on device

Free up space before building:

steps:
  - name: Free disk space
    run: |
      sudo rm -rf /usr/share/dotnet
      sudo rm -rf /opt/ghc
      sudo rm -rf "/usr/local/share/boost"
      sudo rm -rf "$AGENT_TOOLSDIRECTORY"
      df -h

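Or use a larger runner if your workflow genuinely needs more resources. Larger runners are available on GitHub Team and GitHub Enterprise Cloud plans, and their labels (for example ubuntu-latest-4-cores or ubuntu-latest-16-cores) are the names your organization assigns when it creates them.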
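Whichever route you take, it helps to fail fast with a clear message instead of hitting No space left on device deep in a build. Here is a bash sketch (require_free_gb is a hypothetical name; it assumes df supports the POSIX -P and -k flags, as GNU coreutils does) that checks the available space on a path against a threshold in GB:

```shell
#!/usr/bin/env bash
# Fail early when a filesystem has less free space than required.
# Usage: require_free_gb <required_gb> [path]
require_free_gb() {
  local required_gb="$1" path="${2:-.}" avail_kb avail_gb
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks.
  # Column 4 of the data line is the available space.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  if [ -z "$avail_kb" ]; then
    echo "::error::could not read free space for $path"
    return 1
  fi
  avail_gb=$(( avail_kb / 1024 / 1024 ))
  if [ "$avail_kb" -lt $(( required_gb * 1024 * 1024 )) ]; then
    echo "::error::only ${avail_gb} GB free on $path, need ${required_gb} GB"
    return 1
  fi
  echo "${avail_gb} GB free on $path"
}
```

For example, require_free_gb 20 "$GITHUB_WORKSPACE" right after the cleanup step stops the job early if the cleanup did not free enough.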


Step 11: Debug with Re-run and SSH Access

Re-run with Debug Logging

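From the workflow run page, click Re-run jobs and tick Enable debug logging in the confirmation dialog. This turns on runner and step debug logging for that re-run only, with no repository settings required. To get debug logs on every run instead, set the ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG secrets (or variables) to true in your repo settings.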

SSH Into a Failed Runner

For complex failures, you can pause the runner and SSH in:

steps:
  - uses: actions/checkout@v4
  - name: Setup SSH debugging
    uses: mxschmitt/action-tmate@v3
