Troubleshooting

When something breaks, check the events page first: most failures surface there as a warning. The issues below are the ones operators hit most often.

`permission denied` connecting to the Docker daemon socket

Cause. The agent’s group ID doesn’t match /var/run/docker.sock on the host, so it can’t talk to the Docker daemon.

Fix. On the host, either add the agent’s user to the host’s docker group, or loosen the socket’s group permissions:

sudo chmod g+rw /var/run/docker.sock
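To confirm the mismatch before changing anything, you can compare the socket’s owning group with the groups of the user the agent runs as. The sketch below assumes GNU `stat` (as on most Linux hosts). `gid_in_list` and `check_socket_group` are illustrative helper names, not part of the agent.

```shell
# Return 0 if the group ID in $1 appears in the space-separated list $2.
gid_in_list() {
  target="$1"
  for gid in $2; do
    if [ "$gid" = "$target" ]; then
      return 0
    fi
  done
  return 1
}

# Compare the socket's owning group with the current user's groups.
# $1 is the socket path (defaults to /var/run/docker.sock).
check_socket_group() {
  sock="${1:-/var/run/docker.sock}"
  if [ ! -S "$sock" ]; then
    echo "no socket at $sock"
    return 2
  fi
  sock_gid="$(stat -c '%g' "$sock")"   # numeric group ID of the socket (GNU stat)
  if gid_in_list "$sock_gid" "$(id -G)"; then
    echo "ok: current user is in group $sock_gid"
  else
    echo "mismatch: socket group is $sock_gid, user groups are $(id -G)"
    return 1
  fi
}
```

Run `check_socket_group` as the agent’s user. A `mismatch` line shows which group ID the agent would need.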

Sign-in succeeds but bounces back to `/login`

Cause. The auth origin doesn’t match the URL in your browser. Session cookies are scoped to that origin, so a mismatch means the cookie is never sent back.

Fix. Set RUNAWAY_HUB_URL to the exact user-facing https:// URL and restart the hub. If the browser reaches the hub at a different origin than the canonical URL, set RUNAWAY_APP_URL to that origin instead. See TLS and reverse proxy.
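A quick way to spot the mismatch is to reduce both URLs to their origin (scheme, host, and port) and compare them. This is a minimal sketch using a plain string comparison. It does not normalize letter case or default ports, and the helper names and example URLs are illustrative.

```shell
# Print the origin (scheme://host[:port]) of the URL in $1.
url_origin() {
  printf '%s\n' "$1" | sed -E 's,^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]+).*,\1,'
}

# Return 0 if the two URLs share the same origin string.
same_origin() {
  [ "$(url_origin "$1")" = "$(url_origin "$2")" ]
}

# Example: compare the configured value with the URL in the address bar.
# same_origin "$RUNAWAY_HUB_URL" "https://runners.example.com/login" \
#   || echo "origin mismatch: the session cookie will not be sent back"
```

A difference in scheme (`http` vs `https`), host, or port is enough to count as a different origin.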

Runner containers don’t appear

Cause. Usually either a GitHub problem (the PAT is missing a scope or you’re rate-limited) or a workload that has backed off after repeated failures.

Fix. Check the events page for github warnings and open the workload to see its backoff state. A backed-off workload retries on its own; fix the underlying GitHub error (see PAT issues below) and it recovers.
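If you want to check the token outside the hub, GitHub’s REST API reports the remaining rate limit in the `x-ratelimit-remaining` response header and, for classic PATs, the granted scopes in `x-oauth-scopes`. Fine-grained tokens do not report scopes there. The sketch below only parses headers. `header_value` is an illustrative helper and `GITHUB_PAT` is an assumed variable holding the token.

```shell
# Print the value of HTTP header $1 (case-insensitive) from a raw
# header block read on stdin.
header_value() {
  name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  tr -d '\r' | awk -v name="$name" '
    { line = $0; key = tolower(line); sub(/:.*/, "", key) }
    key == name { sub(/^[^:]*:[ \t]*/, "", line); print line }
  '
}

# Example (assumes GITHUB_PAT is set in your shell):
# headers="$(curl -sS -I -H "Authorization: Bearer $GITHUB_PAT" https://api.github.com/user)"
# printf '%s\n' "$headers" | header_value x-oauth-scopes
# printf '%s\n' "$headers" | header_value x-ratelimit-remaining
```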

Jobs fail with `Cannot find module 'node:path'` (or similar Node-stdlib errors)

Cause. The cached runner image is stale — an old actions/runner that predates working Node externals.

Fix. Open the runner profile’s settings and set Image pull to Always or TtlHours. The next reconcile pass refreshes the layer, and every spawn after that runs on a current image. See Runner profiles and Registry credentials.
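To confirm that the cached image really is old, you can read its creation timestamp on the host. This sketch assumes GNU `date` and uses `docker image inspect`. `RUNNER_IMAGE` is a placeholder for whatever image your runner profile references, and the helper names are illustrative.

```shell
# Print the creation timestamp of a locally cached image.
image_created() {
  docker image inspect --format '{{.Created}}' "$1"
}

# Print the age in whole days of the ISO 8601 timestamp in $1.
# $2 optionally overrides "now" as epoch seconds (handy for testing).
age_days() {
  created_s="$(date -u -d "$1" +%s)" || return 1
  now_s="${2:-$(date -u +%s)}"
  echo $(( (now_s - created_s) / 86400 ))
}

# Example:
# age_days "$(image_created "$RUNNER_IMAGE")"
```

An image that is many months old is a likely candidate for this error.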

Private-registry pull fails with `unauthorized`

Cause. The runner image lives in a private registry and the hub has no credential for it.

Fix. Add an image credential and select it in the runner profile’s image-pull credential field. See Registry credentials.
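When a credential exists but the pull still fails, check that it was created for the registry the image actually lives in. The sketch below applies Docker’s usual rule for image references: the first path component is a registry only if it contains a dot or colon or is `localhost`; otherwise the image comes from Docker Hub. `image_registry` is an illustrative helper, and how the hub matches credentials to registries is an assumption here.

```shell
# Print the registry host implied by the image reference in $1.
image_registry() {
  case "$1" in
    */*) first="${1%%/*}" ;;               # text before the first slash
    *)   echo "docker.io"; return 0 ;;     # no slash: a Docker Hub image
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;;    # looks like a host name
    *)                 echo "docker.io" ;; # a Docker Hub namespace
  esac
}

# Example:
# image_registry ghcr.io/acme/runner:latest
```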

`docker build` / `docker run` fail inside a workload

Cause. The workload’s runtime is none (the default), so runners have no Docker daemon to talk to.

Fix. Edit the workload’s runtime: shared-daemon shares the host’s Docker daemon, or a nested tier (dind, isolated-sysbox, isolated-kata) gives each runner its own inner daemon. Runtime changes apply to new spawns; in-flight runners finish on their current runtime.
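To make this failure obvious in job logs, a job can probe for a daemon before its first `docker` command. This is a minimal sketch that you could paste into an early job step. The function names are illustrative.

```shell
# Return 0 if a docker CLI exists and can reach a daemon.
docker_available() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

# Print a short diagnosis; return non-zero if no daemon is reachable.
require_docker() {
  if docker_available; then
    echo "docker daemon reachable"
  else
    echo "no docker daemon reachable: check the workload runtime setting"
    return 1
  fi
}
```

Calling `require_docker` at the top of a job turns a confusing mid-build error into a single clear line.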

Container exits immediately with a lock message

Cause. Another container is holding the data-volume lock. SQLite is single-writer, so only one container can use a data volume at a time.

Fix. Run exactly one container per data volume. If you started a second one, stop it.
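To find the container holding the lock, you can list every container that mounts the volume using `docker ps` with its `volume` filter. The volume name is whatever you passed when starting the hub. The helper names below are illustrative.

```shell
# List name and status of all containers (running or not) that mount
# the volume named in $1, one per line, tab-separated.
volume_users() {
  docker ps -a --filter "volume=$1" --format '{{.Names}}\t{{.Status}}'
}

# Count how many lines on stdin describe a running container.
count_running() {
  awk -F '\t' '$2 ~ /^Up/ { n++ } END { print n + 0 }'
}

# Example (replace the placeholder with your volume name):
# volume_users "<data-volume-name>"
# volume_users "<data-volume-name>" | count_running
```

The container whose status starts with `Up` is the one holding the lock. Any other container listed for the same volume is the extra one to stop or remove.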

A host shows offline

Cause. The agent isn’t connected — it’s stopped, the machine is down, or its token was revoked.

Fix. Restart the agent on the host and confirm it can reach the hub. If you revoked its token, mint a fresh installer URL and re-enroll; re-enrollment leaves the data volume intact. See Adding hosts.
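Two quick checks on the host narrow this down: is the agent container running, and does the hub URL answer? This sketch assumes the agent container is named `runaway-agent` and that `curl` is installed. The function names are illustrative, and a request from the host shell may take a different network path than one from inside the agent container.

```shell
# Print the agent container's state (for example "running" or "exited"),
# or "missing" if no such container exists.
agent_status() {
  docker inspect --format '{{.State.Status}}' "${1:-runaway-agent}" 2>/dev/null \
    || echo "missing"
}

# Return 0 if the hub URL in $1 answers without an HTTP error within 10 seconds.
hub_reachable() {
  curl -fsS -o /dev/null --max-time 10 "$1"
}

# Example:
# agent_status
# hub_reachable "https://hub.example.com" || echo "hub not reachable from this host"
# docker logs --tail 50 runaway-agent
```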

A host won’t reconnect after the hub was reinstalled

Cause. The agent persists its identity — hub URL, agent token, and host id — to /data/agent.json inside the runaway-agent-data volume, and reuses it on every boot. If you reinstalled the hub or it lost its database, that saved token belongs to a hub that no longer exists, so the new hub can’t adopt the agent. Re-running the installer doesn’t help: the agent prefers the saved identity and ignores the fresh enrollment token while the volume is intact.

Fix. Wipe the stale identity on the host, then enroll fresh from the new hub. Remove the agent and its data volume:

docker rm -f runaway-agent
docker volume rm runaway-agent-data

If the old install left runner containers behind, clear them too — they carry the previous install’s labels and the new hub will never adopt them:

docker rm -f $(docker ps -aq --filter "label=managed-by=runaway") 2>/dev/null || true

Then add the host on the new hub and run the install command it gives you. See Adding hosts.

A PAT expired or lost scopes

Cause. GitHub tokens expire or get edited, which surfaces as github warnings on the events page and stalls the affected org’s runners.

Fix. Re-enter the PAT for that org. See GitHub setup for the required scopes.
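Before re-entering a token, you can check whether GitHub still accepts it by looking at the HTTP status of an authenticated request. The mapping below is a rough sketch. `classify_pat_status` is an illustrative helper and `GITHUB_PAT` is an assumed variable. A token that has lost scopes can still return 200 here and fail only on the specific endpoints that need the missing scope.

```shell
# Map an HTTP status code from the GitHub API to a short diagnosis.
classify_pat_status() {
  case "$1" in
    200)     echo "valid" ;;
    401)     echo "expired, revoked, or malformed" ;;
    403|429) echo "forbidden or rate-limited" ;;
    *)       echo "unexpected status $1" ;;
  esac
}

# Example (assumes GITHUB_PAT is set in your shell):
# status="$(curl -sS -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $GITHUB_PAT" https://api.github.com/user)"
# classify_pat_status "$status"
```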

Report a problem with a diagnostics bundle

When a problem outlives the fixes above, download a diagnostics bundle and attach it to your issue instead of copying the events table by hand. Go to Settings → Diagnostics → Download diagnostics, or use the Download diagnostics button on the events page.

The bundle is a single JSON file holding this install’s identity and versions, its hosts, workloads, and organizations, recent runners and jobs, and the last 24 hours of events — enough to reconstruct your setup and the failure timeline.

Secrets never enter it: credentials, registry passwords, and webhook secrets are left out entirely, customEnv values are masked, and the event log is scrubbed of token-shaped strings. It’s plain JSON, so you can read the whole file before sharing it.
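If you would rather verify than read the whole file, a quick scan for GitHub-token-shaped strings is a reasonable extra check. The pattern below covers the common GitHub token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`, `github_pat_`). The file name and helper names are illustrative, and a clean scan is not proof that a file holds no secrets.

```shell
# Print any GitHub-token-shaped strings found in file $1, with line numbers.
find_token_like() {
  grep -E -o -n '(gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})' "$1"
}

# Summarize the scan; return non-zero if anything suspicious was found.
scan_bundle() {
  if find_token_like "$1"; then
    echo "review the matches above before sharing"
    return 1
  fi
  echo "no GitHub-token-shaped strings found"
}

# Example:
# scan_bundle diagnostics.json
```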