Troubleshooting GitLab Runner

Tier: Free, Premium, Ultimate
Offering: GitLab.com, Self-managed

This section can help you troubleshoot GitLab Runner.

note
A Critical Security release will reset runner registration tokens for your group and projects. If you use an automated process (scripts that encode the value of the registration token) to register runners, this update will break that process. However, it should have no effect on previously registered runners.

General troubleshooting tips

View the logs:

  • tail -100 /var/log/syslog (Debian)
  • tail -100 /var/log/messages (RHEL)
  • docker logs gitlab-runner-container (Docker)
  • kubectl logs gitlab-runner-pod (Kubernetes)

Restart the service:

  • service gitlab-runner restart

View the Docker machines:

  • sudo docker-machine ls
  • sudo su - && docker-machine ls

Delete all Docker machines:

  • docker-machine rm $(docker-machine ls -q)

After making changes to your config.toml:

  • service gitlab-runner restart
  • docker-machine rm $(docker-machine ls -q)
  • tail -f /var/log/syslog (Debian)
  • tail -f /var/log/messages (RHEL)

Confirm your GitLab and GitLab Runner versions

GitLab aims to guarantee backward compatibility. However, as a first troubleshooting step, you should ensure your version of GitLab Runner is the same as your GitLab version.
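
As a rough sketch of that first check, the following shell functions compare two version strings by their major.minor part. The function names are hypothetical, and comparing only major.minor (rather than the full version string) is an assumption made for this example.

```shell
# Print the "major.minor" part of a version string such as "16.5.0".
# A leading "v" and any suffix after the minor number are ignored.
major_minor() {
  printf '%s\n' "$1" | sed -E 's/^v?([0-9]+)\.([0-9]+).*$/\1.\2/'
}

# Succeed (exit status 0) when both versions share the same major.minor.
versions_match() {
  [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}
```

For example, versions_match "16.5.0" "16.5.3" succeeds, while versions_match "16.5.0" "16.6.0" fails. The runner version can be read from the output of gitlab-runner --version.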

What does coordinator mean?

The coordinator is the GitLab installation from which a job is requested.

In other words, the runner is an isolated agent that requests jobs from the coordinator (the GitLab installation, through the GitLab API).

Where are logs stored when run as a service on Windows?

  • If GitLab Runner is running as a service on Windows, it creates system event logs. To view them, open the Event Viewer (from the Run menu, type eventvwr.msc or search for “Event Viewer”). Then go to Windows Logs > Application. The Source for Runner logs is gitlab-runner. If you are using Windows Server Core, run this PowerShell command to get the last 20 log entries: get-eventlog Application -Source gitlab-runner -Newest 20 | format-table -wrap -auto.

Enable debug logging mode

caution
Debug logging can be a serious security risk. The output contains the content of all variables and other secrets available to the job. You should disable any log aggregation that might transmit secrets to third parties. The use of masked variables allows secrets to be protected in job log output, but not in container logs.

In the command line

From a terminal, logged in as root, run:

gitlab-runner stop
gitlab-runner --debug run

In the GitLab Runner config.toml

Debug logging can be enabled in the global section of the config.toml by setting log_level to debug. Add the following line at the top of your config.toml, before or after the concurrent line:

log_level = "debug"
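
To confirm that the setting landed in the global section and not inside a later table, a small check like the following might help. It is only a sketch: global_log_level is a hypothetical helper, and it assumes that the global section is everything before the first [section] or [[section]] header.

```shell
# Print the log_level value found in the global section of a config.toml,
# that is, in the lines before the first [section] or [[section]] header.
# Prints nothing when no global log_level is set.
global_log_level() {
  awk '
    # The first table header ends the global section.
    /^[[:space:]]*\[/ { exit }
    /^[[:space:]]*log_level[[:space:]]*=/ {
      sub(/^[^=]*=[[:space:]]*"?/, "")   # drop the key, "=", and opening quote
      sub(/"?[[:space:]]*$/, "")         # drop the closing quote and trailing space
      print
      exit
    }
  ' "$1"
}
```

For example, global_log_level /etc/gitlab-runner/config.toml would print debug after the change above, assuming that is where your configuration file is stored.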

In the Helm Chart

If GitLab Runner was installed in a Kubernetes cluster by using the GitLab Runner Helm Chart, you can enable debug logging by setting the logLevel option in the values.yaml customization:

## Configure the GitLab Runner logging level. Available values are: debug, info, warn, error, fatal, panic
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
logLevel: debug

Configure DNS for a Docker executor runner

When configuring a GitLab Runner with the Docker executor, it is possible to run into a problem where the Runner daemon on the host can access GitLab but the built container cannot. This can happen when DNS is configured in the host but those configurations are not passed to the container.

Example:

GitLab service and GitLab Runner exist in two different networks that are bridged in two ways (for example, over the Internet and through a VPN). If the routing mechanism that the Runner uses to find the GitLab service queries DNS, the container’s DNS configuration doesn’t know to use the DNS service over the VPN and may default to the one provided over the Internet. This configuration would result in the following message:

Created fresh repository.
++ echo 'Created fresh repository.'
++ git -c 'http.userAgent=gitlab-runner 16.5.0 linux/amd64' fetch origin +da39a3ee5e6b4b0d3255bfef95601890afd80709:refs/pipelines/435345 +refs/heads/master:refs/remotes/origin/master --depth 50 --prune --quiet
fatal: Authentication failed for 'https://gitlab.example.com/group/example-project.git/'

In this case, the authentication failure is caused by a service in between the Internet and the GitLab service. This service uses separate credentials, which the runner could circumvent if it used the DNS service over the VPN.

You can tell Docker which DNS server to use by using the dns configuration in the [runners.docker] section of the Runner’s config.toml file.

dns = ["192.168.xxx.xxx","192.168.xxx.xxx"]
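
Before or after changing the configuration, it can be useful to compare what a container resolves with and without an explicit DNS server. The following is a minimal sketch: resolve_in_container is a hypothetical helper, and it assumes that the alpine image can be pulled on the runner host and that its nslookup applet is sufficient for the check.

```shell
# Resolve HOST from inside a throwaway container, optionally through DNS_IP.
# Usage: resolve_in_container HOST [DNS_IP]
resolve_in_container() {
  host=$1
  dns_ip=${2:-}
  if [ -n "$dns_ip" ]; then
    # --dns overrides the DNS server used inside the container.
    docker run --rm --dns "$dns_ip" alpine nslookup "$host"
  else
    # No override: the container uses the DNS settings Docker gives it by default.
    docker run --rm alpine nslookup "$host"
  fi
}
```

If resolve_in_container gitlab.example.com returns a different address than the same call with the DNS server that is reachable over the VPN as second argument, the dns setting above is likely what the job containers need.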

I’m seeing x509: certificate signed by unknown authority

See the self-signed certificates documentation.

I get Permission Denied when accessing the /var/run/docker.sock

If you use the Docker executor and connect to a Docker Engine installed on the server, you might see the Permission Denied error. The most likely cause is that your system uses SELinux (enabled by default on CentOS, Fedora, and RHEL). Check the SELinux policy on your system for possible denials.

Docker-machine error: Unable to query docker version: Cannot connect to the docker engine endpoint.

This error relates to machine provisioning and might be due to the following reasons:

  • There is a TLS failure. When docker-machine is installed, some certificates might be invalid. To resolve this issue, remove the certificates and restart the runner:

    sudo su -
    rm -r /root/.docker/machine/certs/*
    service gitlab-runner restart
    

    After the runner restarts, it registers that the certificates are empty and recreates them.

  • The hostname is longer than the supported length in the provisioned machine. For example, Ubuntu machines have a 64 character limit for HOST_NAME_MAX. The hostname is reported by docker-machine ls. Check the MachineName in the runner configuration and reduce the hostname length if required.

note
This error might have occurred before Docker was installed in the machine.
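
A quick way to spot hostnames that are too long is to measure them against the limit. The sketch below uses a hypothetical helper, hostname_fits, and assumes the 64 character limit mentioned above unless another limit is passed.

```shell
# Succeed when NAME fits within MAX characters.
# MAX defaults to 64, the HOST_NAME_MAX limit mentioned for Ubuntu machines.
# Usage: hostname_fits NAME [MAX]
hostname_fits() {
  name=$1
  max=${2:-64}
  [ "${#name}" -le "$max" ]
}
```

You could feed it the names reported by docker-machine ls -q, one at a time, and shorten MachineName in the runner configuration when any of them fails the check.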

dialing environment connection: ssh: rejected: connect failed (open failed)

This error can be produced when the Docker autoscaler is unable to connect to the Docker daemon on the target system, when the connection is tunneled through SSH. Ensure that you can SSH to the target system and successfully run Docker commands, for example docker info.

Adding an AWS Instance Profile to your autoscaled runners

After you create an AWS IAM Role, the role has a Role ARN and an Instance Profile ARN in your IAM console. You must use the Instance Profile name, not the Role Name.

Add the following value to your [runners.machine] section: "amazonec2-iam-instance-profile=<instance-profile-name>",

The Docker executor gets timeout when building Java project

This most likely happens because of the broken AUFS storage driver: the Java process hangs inside the container. The best solution is to change the storage driver to either OverlayFS (faster) or DeviceMapper (slower).

Check this article about configuring and running Docker or this article about control and configure with systemd.
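
To find out whether a host is affected, you can ask the Docker daemon which storage driver it uses. This is a minimal sketch with a hypothetical helper name, uses_aufs; it assumes the docker CLI is available to the current user and can reach the daemon.

```shell
# Succeed when the Docker daemon reports the AUFS storage driver.
uses_aufs() {
  [ "$(docker info --format '{{.Driver}}')" = "aufs" ]
}
```

When uses_aufs succeeds, the storage driver is the likely culprit. On many installations the driver is selected with the storage-driver key in /etc/docker/daemon.json, but check the Docker documentation for your platform before changing it.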

I get 411 when uploading artifacts

This happens because GitLab Runner uses Transfer-Encoding: chunked, which is broken in early versions of NGINX (https://serverfault.com/questions/164220/is-there-a-way-to-avoid-nginx-411-content-length-required-errors).

Upgrade NGINX to a newer version. For more information, see this issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/1031

No URL provided, cache will not be download/uploaded

This occurs when caching is configured for the job, but the GitLab Runner helper either has no pre-signed URL for accessing the remote cache or has an invalid one. Review each cache-related config.toml entry and the provider-specific keys and values. An invalid URL might be constructed from any item that does not follow the URL syntax requirements.

Additionally, ensure that your helper image and helper_image_flavor match and are up-to-date.

If there is a problem with the credentials configuration, a diagnostic error message is added to the GitLab Runner process log.

Error: warning: You appear to have cloned an empty repository.

You might see the following output when running git clone over HTTP(S) (with GitLab Runner or manually for tests):

$ git clone https://git.example.com/user/repo.git

Cloning into 'repo'...
warning: You appear to have cloned an empty repository.

Make sure that the HTTP proxy in your GitLab server installation is configured properly. If you use an HTTP proxy with its own configuration, make sure that GitLab requests are proxied to the GitLab Workhorse socket, not to the GitLab Unicorn socket.

The Git protocol over HTTP(S) is handled by GitLab Workhorse, so Workhorse is the main entry point of GitLab.

If you are using a Linux package installation, but don’t want to use the bundled NGINX server, please read using a non-bundled web-server.

In the GitLab Recipes repository there are web-server configuration examples for Apache and NGINX.

If you are using GitLab installed from source, please also read the above documentation and examples, and make sure that all HTTP(S) traffic is going through the GitLab Workhorse.

See an example of a user issue.

Error: zoneinfo.zip: no such file or directory error when using Timezone or OffPeakTimezone

It’s possible to configure the time zone in which [[runners.machine.autoscaling]] periods are described. This feature should work on most Unix systems out of the box. However, on some Unix systems, and probably on most non-Unix systems (including Windows, for which we provide GitLab Runner binaries), the runner crashes at start with an error similar to:

Failed to load config Invalid OffPeakPeriods value: open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory

The error is caused by the time package in Go. Go uses the IANA Time Zone database to load the configuration of the specified time zone. On most Unix systems, this database is already present in one of the well-known paths (/usr/share/zoneinfo, /usr/share/lib/zoneinfo, /usr/lib/locale/TZ/). Go’s time package looks for the Time Zone database in all three paths. If it doesn’t find the database in any of them, but the machine has a configured Go development environment, it falls back to the $GOROOT/lib/time/zoneinfo.zip file.

If none of those paths are present (for example on a production Windows host) the above error is thrown.
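
To see whether a Unix host has the database in one of those locations, a check along these lines can help. It is a sketch, not part of GitLab Runner: find_zoneinfo_dir is a hypothetical helper that only tests whether the directories exist.

```shell
# Print the first existing directory among the candidates and succeed,
# or fail when none of them exists. With no arguments, check the three
# well-known paths listed above.
find_zoneinfo_dir() {
  if [ "$#" -eq 0 ]; then
    set -- /usr/share/zoneinfo /usr/share/lib/zoneinfo /usr/lib/locale/TZ
  fi
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

Running find_zoneinfo_dir without arguments prints the path that was found. A failing exit status suggests that the database has to be installed or supplied through ZONEINFO.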

If your system supports the IANA Time Zone database but it’s not available by default, you can try to install it. On Linux systems, for example:

# on Debian/Ubuntu based systems
sudo apt-get install tzdata

# on RPM based systems
sudo yum install tzdata

# on Linux Alpine
sudo apk add -U tzdata

If your system doesn’t provide this database natively, you can make OffPeakTimezone work by following the steps below:

  1. Download the zoneinfo.zip file. Starting with version v9.1.0, you can download the file from a tagged path. In that case, replace latest with the tag name (for example, v9.1.0) in the zoneinfo.zip download URL.

  2. Store this file in a well-known directory. We suggest using the same directory as the config.toml file. For example, if you host the Runner on a Windows machine and your configuration file is stored at C:\gitlab-runner\config.toml, save the zoneinfo.zip at C:\gitlab-runner\zoneinfo.zip.

  3. Set the ZONEINFO environment variable to the full path of the zoneinfo.zip file. If you start the Runner with the run command, you can do this with:

    ZONEINFO=/etc/gitlab-runner/zoneinfo.zip gitlab-runner run <other options ...>
    

    or if using Windows:

    C:\gitlab-runner> set ZONEINFO=C:\gitlab-runner\zoneinfo.zip
    C:\gitlab-runner> gitlab-runner run <other options ...>
    

    If you are starting GitLab Runner as a system service, you need to update or override the service configuration in the way your service manager provides (Unix systems), or add the ZONEINFO variable to the list of environment variables available to the GitLab Runner user through System Settings (Windows).
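
For a service managed by systemd, one possible way to do this is a drop-in file that sets the variable. The sketch below only generates the drop-in content. zoneinfo_dropin is a hypothetical helper, and the unit name gitlab-runner.service and the drop-in file name used in the note that follows are assumptions about your installation.

```shell
# Print a systemd drop-in that sets ZONEINFO for the service.
# Usage: zoneinfo_dropin /full/path/to/zoneinfo.zip
zoneinfo_dropin() {
  printf '[Service]\nEnvironment="ZONEINFO=%s"\n' "$1"
}
```

You could write the output of zoneinfo_dropin /etc/gitlab-runner/zoneinfo.zip to /etc/systemd/system/gitlab-runner.service.d/zoneinfo.conf, then run systemctl daemon-reload and restart the service so that the variable takes effect.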

Why can’t I run more than one instance of GitLab Runner?

You can, but the instances must not share the same config.toml file.

Running multiple instances of GitLab Runner using the same configuration file can cause unexpected and hard-to-debug behavior. Only a single instance of GitLab Runner can use a specific config.toml file at one time.

Job failed (system failure): preparing environment:

This error is often due to your shell loading your profile, and one of the scripts is causing the failure.

Examples of dotfiles that are known to cause failures:

  • .bash_logout
  • .condarc
  • .rvmrc
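
To check which of these files exist for the user that runs the jobs, something like the following can be used. It is a sketch with a hypothetical helper name, list_suspect_dotfiles. The home directory to inspect is passed as an argument and defaults to the current user’s home.

```shell
# Print the dotfiles from the list above that exist in HOME_DIR.
# Usage: list_suspect_dotfiles [HOME_DIR]
list_suspect_dotfiles() {
  home_dir=${1:-$HOME}
  for name in .bash_logout .condarc .rvmrc; do
    if [ -e "$home_dir/$name" ]; then
      printf '%s\n' "$home_dir/$name"
    fi
  done
}
```

Temporarily renaming a reported file and retrying the job is one way to confirm whether it causes the failure.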

SELinux can also be the culprit of this error. You can confirm this by looking at the SELinux audit log:

sealert -a /var/log/audit/audit.log

Runner abruptly terminates after Cleaning up stage

CrowdStrike Falcon Sensor has been reported to kill pods after the Cleaning up files stage of a job when the “container drift detection” setting was enabled. To ensure that jobs are able to complete, you must disable this setting.

Job fails with remote error: tls: bad certificate (exec.go:71:0s)

This error can occur when the system time changes significantly during a job that creates artifacts. Due to the change in system time, SSL certificates are considered expired, which causes an error when the runner attempts to upload artifacts.

To ensure SSL verification can succeed during artifact upload, change the system time back to a valid date and time at the end of the job.
Because the creation time of the artifacts file has also changed, they are automatically archived.

Helm Chart: ERROR .. Unauthorized

Before uninstalling or upgrading runners deployed with Helm, pause them in GitLab and wait for any jobs to complete.

If you remove a runner pod with helm uninstall or helm upgrade while a job is running, Unauthorized errors like the following may occur when the job completes:

ERROR: Error cleaning up pod: Unauthorized
ERROR: Error cleaning up secrets: Unauthorized
ERROR: Job failed (system failure): Unauthorized

This probably occurs because when the runner is removed, the role bindings are removed. The runner pod continues until the job completes, and then the runner tries to delete it. Without the role binding, the runner pod no longer has access.

See this issue for details.

Elasticsearch service container startup error max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Elasticsearch has a vm.max_map_count requirement that has to be set on the instance on which Elasticsearch is run.

See the Elasticsearch Docs for how to set this value correctly depending on the platform.
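
As a quick check on a Linux host, you can compare the current value with the minimum from the error message. The sketch below uses a hypothetical helper, max_map_count_ok, and takes the value as an argument so that it can be read from wherever your platform exposes it.

```shell
# Succeed when VALUE meets the minimum named in the error message (262144).
# Usage: max_map_count_ok VALUE
max_map_count_ok() {
  [ "$1" -ge 262144 ]
}
```

On Linux the current value is typically available in /proc/sys/vm/max_map_count, and sysctl -w vm.max_map_count=262144 raises it until the next reboot. The Elasticsearch documentation remains the reference for a persistent, platform-specific setting.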

Preparing the "docker+machine" executor ERROR: Preparation failed: exit status 1 Will be retried in 3s

This error can occur when the Docker machine is not able to successfully create the executor virtual machines. To get more information about the error, manually create the virtual machine with the same MachineOptions that you have defined in your config.toml.

For example: docker-machine create --driver=google --google-project=GOOGLE-PROJECT-ID --google-zone=GOOGLE-ZONE ....