Troubleshooting
When something doesn’t work, the events page is usually your first stop — most failures surface there as a warning. Below are the issues operators hit most, each with its cause and fix.
permission denied connecting to the Docker daemon socket
Cause. The agent’s group ID doesn’t match the group that owns /var/run/docker.sock on the host, so it can’t talk to
the Docker daemon.
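To confirm the mismatch before changing anything, you can compare the group that owns the socket with the groups the agent's user belongs to. The following is a minimal sketch, assuming a Linux host with GNU or BusyBox `stat`; `socket_gid_in_groups` is a hypothetical helper name, not something the agent ships.

```shell
# Hypothetical helper: succeeds if any of the given group IDs owns the file.
# Usage: socket_gid_in_groups PATH GID [GID...]
socket_gid_in_groups() {
  local sock_path="$1"
  shift
  local sock_gid
  # %g prints the numeric group ID of the file's owner group (Linux stat).
  sock_gid="$(stat -c '%g' "$sock_path")" || return 2
  local gid
  for gid in "$@"; do
    if [ "$gid" = "$sock_gid" ]; then
      return 0
    fi
  done
  return 1
}

# Example, run as the agent's user (id -G lists that user's group IDs):
#   socket_gid_in_groups /var/run/docker.sock $(id -G) && echo "group matches"
```

If the check fails, the socket's group is not among the user's groups, which is the situation described above.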
Fix. On the host, either add the agent’s user to the host’s docker group, or loosen the socket’s
group permissions:
sudo chmod g+rw /var/run/docker.sock
Sign-in succeeds but bounces back to /login
Cause. BETTER_AUTH_URL doesn’t match the URL in your browser. Session cookies are scoped to the
auth URL’s origin, so a mismatch means the cookie is never sent back.
Fix. Set BETTER_AUTH_URL to the exact user-facing https:// URL and restart the hub. See
TLS and reverse proxy.
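A quick way to spot a mismatch is to compare the origin (scheme, host, and port) of BETTER_AUTH_URL with the address shown in the browser. This is a rough sketch only: `url_origin` and `same_origin` are hypothetical helpers, and the comparison is a plain string match that does not normalize letter case or default ports.

```shell
# Hypothetical helper: prints scheme://host[:port] for a URL.
url_origin() {
  printf '%s\n' "$1" | sed -E 's|^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]+).*|\1|'
}

# Hypothetical helper: succeeds when both URLs share the same origin string.
same_origin() {
  [ "$(url_origin "$1")" = "$(url_origin "$2")" ]
}

# Example (the second argument is whatever your browser's address bar shows):
#   same_origin "$BETTER_AUTH_URL" "https://ci.example.com/login" || echo "origin mismatch"
```

Note that `http://` versus `https://` counts as a different origin, which is a common cause when the hub sits behind a TLS-terminating proxy.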
Runner containers don’t appear
Cause. Usually a GitHub problem (the PAT is missing a scope or you’re rate-limited), or the scale set has backed off after repeated failures.
Fix. Check the events page for github warnings and open the
scale set to see its backoff state. A backed-off scale set retries on its own; fix the underlying
GitHub error (see PAT issues below) and it recovers.
Jobs fail with Cannot find module 'node:path' (or similar Node-stdlib errors)
Cause. The cached runner image is stale: it contains an old actions/runner that predates working Node
externals.
Fix. Open the scale set’s settings and set Image pull to Always or TtlHours. The next
reconcile pass refreshes the layer, and every runner spawned after that uses a current image. See
Scale sets and
Registry credentials.
Private-registry pull fails with unauthorized
Cause. The runner image lives in a private registry and the hub has no credential for it.
Fix. Add an image credential and attach it to the scale set’s image-pull credential select. See Registry credentials.
docker build / docker run fail inside a new scale set
Cause. New scale sets don’t mount the Docker socket into runners by default, so jobs that build or run containers have no daemon to talk to.
Fix. Edit the scale set and tick Mount Docker socket inside runners. Existing scale sets keep their previous behavior and aren’t affected.
Container exits immediately with a lock message
Cause. Another container is holding the data-volume lock. SQLite is single-writer.
Fix. Run exactly one container per data volume. If you started a second one, stop it.
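To find the extra container, you can list the running containers that mount the volume. The sketch below relies on the `volume` filter of `docker ps`; the function names and the volume name `hub-data` are hypothetical, so substitute the name of your own data volume.

```shell
# Hypothetical helper: prints the names of running containers mounting a volume.
containers_on_volume() {
  docker ps --filter "volume=$1" --format '{{.Names}}'
}

# Hypothetical helper: succeeds when more than one running container uses the volume.
more_than_one_container() {
  local count
  # grep -c . counts non-empty lines; "|| true" keeps a zero count from being an error.
  count="$(containers_on_volume "$1" | grep -c . || true)"
  [ "$count" -gt 1 ]
}

# Example:
#   more_than_one_container hub-data && containers_on_volume hub-data
```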
A host shows offline
Cause. The agent isn’t connected: it’s stopped, the machine is down, or its token was revoked.
Fix. Restart the agent on the host and confirm it can reach the hub. If you revoked its token, mint a fresh installer URL and re-enroll (re-enrollment leaves the data volume intact). See Adding hosts.
A PAT expired or lost scopes
Cause. GitHub tokens expire or get edited, which surfaces as github warnings on the events page
and stalls the affected org’s runners.
Fix. Re-enter the PAT for that org. See GitHub setup for the required scopes.
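Before re-entering a token, it can help to see which scopes GitHub currently reports for it. For classic PATs, GitHub's REST API returns the granted scopes in the `X-OAuth-Scopes` response header; fine-grained tokens do not report scopes this way. The sketch below assumes a classic PAT stored in a hypothetical `GH_PAT` variable, and `pat_scopes_from_headers` is an illustrative helper name.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# value of the X-OAuth-Scopes header, if present.
pat_scopes_from_headers() {
  tr -d '\r' \
    | grep -i '^x-oauth-scopes:' \
    | cut -d: -f2- \
    | sed 's/^ *//'
}

# Example (makes a network call; GH_PAT is an assumed variable name):
#   curl -sI -H "Authorization: Bearer $GH_PAT" https://api.github.com/ | pat_scopes_from_headers
```

Compare the printed list with the scopes listed in GitHub setup. An empty result usually means the token is invalid, expired, or not a classic PAT.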