GitHub Actions Workflow Failed: How to Fix It (Complete Troubleshooting Guide)
You pushed your code. The CI pipeline ran. Then you saw that dreaded red X next to your commit. We’ve all been there — staring at a cryptic log, wondering what went wrong between your local machine and GitHub’s servers.
This guide walks you through every common (and some uncommon) reason your GitHub Actions workflow fails, with real error messages, root cause analysis, and copy-paste-ready fixes. Whether you’re dealing with a flaky test suite, a misconfigured secret, or a deprecated action throwing warnings at 2 AM, you’ll find the solution here.
Why Your GitHub Actions Workflow Fails: The Big Picture
Before diving into specific fixes, it helps to understand that most workflow failures fall into one of these categories:
- Configuration errors — YAML syntax problems, invalid triggers, or misconfigured jobs
- Environment mismatches — different Node, Python, Java, or OS versions between local and CI
- Authentication and permissions — missing secrets, insufficient token scopes, or expired credentials
- Dependency issues — package resolution failures, lock file conflicts, or registry authentication
- Deprecated actions or commands — upstream changes breaking your workflow
- Resource and infrastructure limits — timeouts, runner capacity, or API rate limits
Let’s work through each, starting with the most frequent culprits.
Step 1: Read the Actual Error (Not Just the Red X)
This sounds obvious, but it’s the most skipped step. GitHub collapses logs by default, which hides the real error hundreds of lines deep.
How to Find the Real Error
- Click the failed workflow run
- Expand the failed step (not the whole job)
- Scroll to the first line containing Error, error:, FATAL, or failed
- Read upward from that line: context usually appears before the error
A common mistake is reading only the last few lines. The actual root cause is often 20-50 lines above the final exit code. For example, a Process completed with exit code 1 message tells you nothing — the real error is whatever command triggered that exit.
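The search described above can be scripted once the log is saved to a file, for example with gh run view <run-id> --log-failed > failed.log. This is a minimal sketch: the function name first_error_context and the default 20-line window are illustrative choices, not part of any GitHub tooling.

```bash
# first_error_context LOGFILE [LINES_BEFORE]
# Prints the first line that matches a common error marker, together with
# the lines leading up to it. Returns 1 when no marker is found.
first_error_context() {
  log_file=$1
  before=${2:-20}
  # -n prefixes the line number; -m1 stops at the first match.
  match=$(grep -n -m1 -E 'Error|error:|FATAL|failed' "$log_file") || return 1
  line_no=${match%%:*}
  start=$((line_no - before))
  [ "$start" -lt 1 ] && start=1
  # Print from the start of the context window through the matching line.
  sed -n "${start},${line_no}p" "$log_file"
}
```

Calling first_error_context failed.log 50 widens the window when the root cause sits further above the first marker.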
Step 2: Check for YAML Syntax Errors
YAML is notoriously sensitive to indentation and quoting. A single misplaced space can silently break your workflow or cause unexpected behavior.
Common YAML Mistake: Inconsistent Indentation
# BROKEN: mixing spaces and assumptions
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm run build
      - name: Test
      run: npm test  # ← This is NOT indented under the step
# FIXED: proper indentation
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm run build
      - name: Test
        run: npm test
Validate Locally with actionlint
Install actionlint to catch these before pushing:
# Install actionlint
brew install actionlint
# Or via Go
go install github.com/rhysd/actionlint/cmd/actionlint@latest
# Validate your workflow file
actionlint .github/workflows/ci.yml
actionlint catches syntax errors, deprecated action versions, invalid expressions, and shell script issues inside run blocks. I run it as a git pre-commit hook on every workflow file change — it’s saved me from countless “fix CI” commits.
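A pre-commit hook along these lines might look like the sketch below. It assumes actionlint is already on your PATH; the function names filter_workflow_files and lint_staged_workflows are hypothetical helpers for this example.

```bash
# Reads file paths on stdin and prints only GitHub Actions workflow files.
filter_workflow_files() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E '^\.github/workflows/[^/]+\.ya?ml$' || true
}

# Lints the workflow files staged for commit (added, copied, or modified).
lint_staged_workflows() {
  files=$(git diff --cached --name-only --diff-filter=ACM | filter_workflow_files)
  # Nothing staged under .github/workflows: skip quietly.
  [ -z "$files" ] && return 0
  # Word splitting is intentional here: one argument per path.
  # shellcheck disable=SC2086
  actionlint $files
}
```

To use it, save the functions in .git/hooks/pre-commit, add a final line that calls lint_staged_workflows, and make the file executable. A non-zero exit from actionlint then blocks the commit.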
The Colons-in-Values Trap
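# RISKY: unquoted value containing colons
env:
  DATABASE_URL: postgres://user:pass@host:5432/db
# SAFER: quoted
env:
  DATABASE_URL: "postgres://user:pass@host:5432/db"
YAML treats a colon as a key-value separator only when it is followed by a space or ends the line, so this particular URL happens to parse either way. A value such as title: Note: deploy started does not, and fails with a "mapping values are not allowed" error. Quoting every value that contains a colon, especially URLs and connection strings, removes the guesswork.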
Step 3: Verify Your Secrets and Environment Variables
Missing or misnamed secrets are the single most common cause of workflow failures I see in production repositories.
The Classic Mistake: Secret Scope
GitHub secrets are scoped to an organization, a repository, or an environment; there is no global scope. A secret created in your production environment isn’t available to a job that doesn’t reference that environment.
# BROKEN: secret is in 'production' environment, but job doesn't use it
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        run: |
          echo "${{ secrets.DEPLOY_KEY }}"  # Empty!
# FIXED: job references the environment
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # ← This unlocks the secret
    steps:
      - name: Deploy
        run: |
          echo "${{ secrets.DEPLOY_KEY }}"
Debugging Secret Availability Safely
Never echo secrets directly. Instead, check if they exist:
steps:
  - name: Check secrets
    env:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
    run: |
      if [ -z "$DEPLOY_KEY" ]; then
        echo "::error::DEPLOY_KEY secret is missing or empty"
        exit 1
      else
        echo "DEPLOY_KEY is set (length: ${#DEPLOY_KEY})"
      fi
Enable Secret Debug Logging (Temporarily)
If you’re stuck, GitHub supports a special debug mode — but only enable this in private repos and disable it immediately after:
- Go to your repository Settings > Secrets and variables > Actions
- Add a new secret named ACTIONS_STEP_DEBUG with value true
- Re-run your workflow: you’ll get extended logging
Delete this secret when you’re done. It significantly increases log volume and can expose sensitive data in shared environments.
Step 4: Check the GITHUB_TOKEN Permissions
Since February 2023, new repositories and organizations get a read-only GITHUB_TOKEN by default, and administrators can apply the same restriction to existing ones. If your workflow fails with a permissions error on git push or package publishing, check the token’s permissions first.
The Push-Back-to-Repository Failure
# This fails with "fatal: unable to access ... 403"
steps:
  - name: Push changes
    run: |
      git config user.name "github-actions[bot]"
      git config user.email "github-actions[bot]@users.noreply.github.com"
      git commit -am "Auto-format code"  # -a stages changes to tracked files
      git push
The default GITHUB_TOKEN doesn’t have write permissions unless you explicitly grant them:
# FIXED: grant contents:write permission
permissions:
  contents: write
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Push changes
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit -am "Auto-format code"  # -a stages changes to tracked files
          git push
You can set permissions at the workflow level (applies to all jobs) or at the job level. Job-level settings override workflow-level ones.
Step 5: Resolve Dependency and Build Failures
“Module Not Found” in CI but Not Locally
This usually means your lock file is out of sync with package.json, or you’re ignoring files in .gitignore that CI needs.
# Common error message
Error: Cannot find module 'some-package'
Require stack:
- /home/runner/work/repo/repo/index.js
Root cause: You installed a package locally but forgot to commit the updated package-lock.json, or you used npm install instead of npm ci.
# CORRECT: use ci for reproducible installs
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: '20'
      cache: 'npm'
  - run: npm ci  # ← Strict install from lockfile
  - run: npm run build
npm ci deletes node_modules and installs exactly what’s in the lock file. If the lock file doesn’t match package.json, it fails loudly — which is what you want.
Python: Poetry Lock Mismatches
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.12'
  - name: Install Poetry
    run: pip install poetry==1.8.4
  - name: Install dependencies
    run: poetry install --no-interaction --no-root
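If this fails with a lock file error, your poetry.lock is stale. Run poetry lock --no-update locally with the same Poetry 1.8 release that CI pins (on Poetry 2.x, plain poetry lock already leaves pinned versions alone), commit the refreshed lock file, and push.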
Step 6: Handle Deprecated Actions and Commands
GitHub deprecates actions and commands on a rolling basis. If a previously working workflow suddenly starts failing after months of stability, check for deprecation notices.
The set-output Deprecation
Older workflows used this pattern:
# DEPRECATED: logs a warning today and is slated for removal
- name: Set output
  id: vars
  run: echo "::set-output name=version::1.0.0"
# CURRENT: use GITHUB_OUTPUT
- name: Set output
  id: vars
  run: echo "version=1.0.0" >> "$GITHUB_OUTPUT"
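The name=value form only handles single-line values; multi-line values need GitHub's delimiter syntax (NAME<<DELIMITER, the value, then DELIMITER on its own line). The helper below is a sketch that picks the right form automatically. The function name set_output and the delimiter string are assumptions made for this example.

```bash
# set_output NAME VALUE
# Appends a step output to the file GitHub provides in $GITHUB_OUTPUT.
set_output() {
  name=$1
  value=$2
  case $value in
    *"
"*)
      # The value contains a newline, so use the delimiter form.
      # The process ID makes an accidental clash with the value unlikely.
      delimiter="ghadelimiter_$$"
      {
        printf '%s<<%s\n' "$name" "$delimiter"
        printf '%s\n' "$value"
        printf '%s\n' "$delimiter"
      } >> "$GITHUB_OUTPUT"
      ;;
    *)
      # Single-line value: plain name=value is enough.
      printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
      ;;
  esac
}
```

Inside a run block, set_output version 1.0.0 makes the value available to later steps as steps.vars.outputs.version, given the step id vars used above.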
Node 12, Node 16, and Node 20 Action Runtimes
GitHub has progressively retired Node.js runtimes for actions:
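- Node 12: deprecated January 2023
- Node 16: deprecated September 2023
- Node 20: the runtime that the action versions pinned in this guide target; newer major versions of the official actions have started moving to Node 24, so check the GitHub changelog before pinning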
If you maintain a custom action, update its action.yml:
# Update this line
runs:
  using: 'node20'
  main: 'dist/index.js'
And in your workflow, pin to current major versions:
# GOOD: explicit versions
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
  - uses: actions/cache@v4
Avoid @main or @latest tags. They break without warning when maintainers push changes.
Step 7: Fix Caching Problems
Caching speeds up builds but introduces a class of failures that are hard to debug because they’re intermittent.
Corrupted Cache Causes Build Failures
# The cache restore succeeds, but the build fails afterward
- uses: actions/cache@v4
  with:
    path: node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
Problem: If node_modules gets partially written or a dependency ships a broken release, the cache stores that broken state and serves it to every subsequent run.
Solution: Include the lock file hash and a version prefix you can bump to invalidate:
- uses: actions/cache@v4
  with:
    path: node_modules
    key: v2-${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      v2-${{ runner.os }}-node-
When you suspect cache corruption, bump v2 to v3 to force a fresh cache build.
Cache Restore Keys Causing Wrong Dependencies
The restore-keys fallback can load a cache from a different branch or commit. If your tests pass locally but fail in CI with weird dependency behavior, disable the cache temporarily:
# Temporarily skip caching to isolate the issue
# - uses: actions/cache@v4
#   with:
#     path: node_modules
#     key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
Step 8: Debug Working Directory and Path Issues
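On Linux runners, actions/checkout places your repository in /home/runner/work/{repo-name}/{repo-name} (available as $GITHUB_WORKSPACE), and every run step starts there unless you set working-directory. Commands that work locally can still fail when they assume a different starting directory or a hard-coded absolute path from your machine.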
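One way to make a script independent of the directory it is called from is to resolve paths relative to the script's own location. A minimal sketch, where script_dir is a hypothetical helper name:

```bash
# script_dir SCRIPT_PATH
# Prints the absolute directory that contains SCRIPT_PATH, regardless of
# the caller's current working directory.
script_dir() {
  # The subshell keeps the cd from affecting the caller. CDPATH is cleared
  # so cd cannot jump to an unrelated directory or print extra output.
  ( CDPATH='' cd -- "$(dirname -- "$1")" && pwd -P )
}
```

Inside a script such as scripts/deploy.sh, here=$(script_dir "$0") gives a stable base for sibling files, so the script behaves the same on your machine and under $GITHUB_WORKSPACE.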
The “File Not Found” After Checkout
# BROKEN: the script is missing or not executable at this path
steps:
  - uses: actions/checkout@v4
  - run: ./scripts/deploy.sh
This fails if scripts/deploy.sh isn’t committed (check .gitignore) or was committed without its executable bit. For the second case, set the bit before running the script:
steps:
  - uses: actions/checkout@v4
  - run: |
      chmod +x scripts/deploy.sh
      ./scripts/deploy.sh
Docker Build Context Issues
# This fails if Dockerfile references paths relative to a subdirectory
- name: Build Docker image
  run: docker build -t app .
If your Dockerfile is in a subdirectory:
- name: Build Docker image
  run: docker build -t app -f docker/Dockerfile .
The build context (.) is always relative to your current working directory, which is the repo root unless you change it.
Step 9: Address Concurrency and Race Conditions
If your workflow sometimes passes and sometimes fails with no code changes, you likely have a race condition.
Concurrent Deployments Overwriting Each Other
# PROBLEM: multiple pushes trigger overlapping deployments
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh
# FIXED: let the running deployment finish; the next run waits its turn
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false  # Queue instead of cancel for deployments
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh
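Use cancel-in-progress: true for CI builds (safe to cancel stale runs). Use false for deployments (you don’t want to cancel a half-finished migration). Keep in mind that a concurrency group holds at most one pending run, so an older queued run is dropped when a newer one arrives.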
Step 10: Handle Runner and Resource Limits
Job Timeouts
The default timeout is 360 minutes (6 hours), so a hung job can run that long before GitHub cancels it. Set a tighter limit that matches how long the job should take:
jobs:
  heavy-build:
    runs-on: ubuntu-latest
    timeout-minutes: 15  # ← Set a realistic limit
    steps:
      - run: ./long-running-script.sh
Setting a timeout prevents runaway jobs from burning through your action minutes quota.
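timeout-minutes applies to the whole job (or step). To fail fast on one command inside a longer run block, a wrapper around the GNU coreutils timeout command is one option. This sketch assumes timeout is installed, as it normally is on Ubuntu runners; run_with_timeout is a hypothetical helper name.

```bash
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND under coreutils `timeout`, which exits with status 124 when
# the limit is hit, and emits a workflow error annotation in that case.
run_with_timeout() {
  limit=$1
  shift
  status=0
  # "|| status=$?" captures the failure without tripping `set -e`.
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "::error::Command timed out after ${limit}s: $*"
  fi
  return "$status"
}
```

For example, run_with_timeout 300 ./long-running-script.sh stops that one script after five minutes while the rest of the job keeps its own limit.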
API Rate Limiting
If your workflow makes many GitHub API calls (via gh CLI or octokit), you can hit rate limits:
# Error message
gh: API rate limit exceeded for installation ID 12345678.
Use the built-in GITHUB_TOKEN for API calls. It allows 1,000 requests per hour per repository, far more than the 60 per hour that unauthenticated requests get:
steps:
  - uses: actions/github-script@v7
    with:
      script: |
        const repos = await github.rest.repos.listForOrg({
          org: context.repo.owner,
          per_page: 100
        });
        console.log(repos.data.length);
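For calls made from a run block, retrying with a growing delay can ride out short-lived throttling. It will not outlast an exhausted hourly quota. This is a sketch only: retry is a hypothetical helper, and RETRY_BASE_DELAY is a variable invented for this example, not a GitHub setting.

```bash
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling
# the wait between attempts. RETRY_BASE_DELAY sets the first wait in
# seconds (default 2).
retry() {
  max_attempts=$1
  shift
  attempt=1
  delay=${RETRY_BASE_DELAY:-2}
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "::error::Giving up after ${attempt} attempts: $*"
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${delay}s"
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

A call such as retry 4 gh api "repos/{owner}/{repo}/issues" makes up to four attempts, and gh api rate_limit shows how much of the quota is left before you start.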
Disk Space on GitHub-Hosted Runners
The standard ubuntu-latest runner has about 14 GB of free disk space. Large Docker images or monorepo builds can exhaust this:
# Error
No space left on device
Free up space before building:
steps:
  - name: Free disk space
    run: |
      sudo rm -rf /usr/share/dotnet
      sudo rm -rf /opt/ghc
      sudo rm -rf "/usr/local/share/boost"
      sudo rm -rf "$AGENT_TOOLSDIRECTORY"
      df -h
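Or use a larger runner if your workflow genuinely needs more resources. Larger-runner labels are defined in your organization’s settings, so names like ubuntu-latest-4-cores or ubuntu-latest-16-cores are examples rather than built-in labels.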
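To catch a space shortage before a long build rather than halfway through it, a guard step can check free space up front. A minimal sketch; the helper names free_kb and require_free_gb and the threshold in the usage note are assumptions for this example.

```bash
# free_kb PATH
# Prints the kilobytes available on the filesystem that holds PATH.
free_kb() {
  # -P forces the portable one-line-per-filesystem format, -k reports in
  # 1024-byte blocks; column 4 of the data row is the available space.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_gb PATH MIN_GB
# Fails with a workflow error annotation when less than MIN_GB is free.
require_free_gb() {
  path=$1
  min_gb=$2
  available_kb=$(free_kb "$path")
  required_kb=$((min_gb * 1024 * 1024))
  if [ "$available_kb" -lt "$required_kb" ]; then
    echo "::error::Only $((available_kb / 1024 / 1024)) GB free on ${path}, need ${min_gb} GB"
    return 1
  fi
  echo "Disk check passed: $((available_kb / 1024 / 1024)) GB free on ${path}"
}
```

Running require_free_gb / 10 at the top of a build step turns a late No space left on device failure into an early, clearly labelled one.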
Step 11: Debug with Re-run and SSH Access
Re-run with Debug Logging
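From the workflow run page, click Re-run all jobs and tick Enable debug logging. This turns on runner and step debug output for that re-run only, with no secret required. To keep debug logging on for every run, set ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG to true as repository secrets or variables.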
SSH Into a Failed Runner
For complex failures, you can pause the runner and SSH in:
```yaml
steps:
  - uses: actions/checkout@v4
  - name: Setup SSH debugging
    uses: mxschmitt/action-tmate@v3
with