Troubleshooting GitLab chart development environment
All steps noted here are for DEVELOPMENT ENVIRONMENTS ONLY. Administrators may find the information insightful, but the outlined fixes are destructive and would have a major negative impact on production systems.
Passwords and secrets failing or unsynchronized
Developers commonly deploy, delete, and re-deploy a release into the same
cluster multiple times. Kubernetes secrets and persistent volume claims created by StatefulSets are
intentionally not removed by helm delete RELEASE_NAME.
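Removing only the Kubernetes secrets causes failures on the next deployment. For example, a new deployment’s migration pod will fail because GitLab Rails has the wrong password and cannot connect to the database: the secrets are generated again, while the retained persistent volume still holds a database initialized with the old ones.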
To completely wipe a release from a development environment including secrets, a developer must remove both the secrets and the persistent volume claims.
# DO NOT run these commands in a production environment. Disaster will strike.
kubectl delete secrets,pvc -lrelease=RELEASE_NAME
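A minimal sketch of a guarded version of the command above, which prints the current kubectl context and the matching resources before deleting them. The function name wipe_release and the KUBECTL override variable are hypothetical and exist only for this illustration; the label selector is the one used above.

```shell
# DEVELOPMENT ENVIRONMENTS ONLY. Hypothetical helper, not part of the chart.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the
# commands without touching a cluster.
wipe_release() {
  wr_release="$1"
  if [ -z "$wr_release" ]; then
    echo "usage: wipe_release RELEASE_NAME" >&2
    return 1
  fi

  # Show which cluster is targeted and which objects match the release label.
  "${KUBECTL:-kubectl}" config current-context
  "${KUBECTL:-kubectl}" get secrets,pvc -l "release=${wr_release}"

  # Remove the secrets and the persistent volume claims together, so that
  # regenerated passwords cannot disagree with data left on old volumes.
  "${KUBECTL:-kubectl}" delete secrets,pvc -l "release=${wr_release}"
}
```

Running KUBECTL=echo wipe_release RELEASE_NAME first prints the three commands instead of executing them, which is a cheap way to check the selector before anything is deleted.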
Database is broken and needs reset
To reset the database in a development environment:
- Delete the PostgreSQL StatefulSet.
- Delete the PostgreSQL PersistentVolumeClaim.
- Deploy GitLab again with helm upgrade --install.
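The three steps above could be scripted roughly as follows. This is a sketch under assumptions: the object names RELEASE_NAME-postgresql and data-RELEASE_NAME-postgresql-0 are guesses at the bundled PostgreSQL naming and should be checked with kubectl get statefulset,pvc first; the function name and the KUBECTL and HELM override variables are hypothetical.

```shell
# DEVELOPMENT ENVIRONMENTS ONLY. Hypothetical helper, not part of the chart.
reset_dev_database() {
  rd_release="$1"
  rd_chart="${2:-.}"   # chart reference; "." assumes a local chart checkout

  if [ -z "$rd_release" ]; then
    echo "usage: reset_dev_database RELEASE_NAME [CHART]" >&2
    return 1
  fi

  # ASSUMPTION: object names follow RELEASE-postgresql and
  # data-RELEASE-postgresql-0. Verify them in your cluster before running.
  rd_sts="${rd_release}-postgresql"
  rd_pvc="data-${rd_release}-postgresql-0"

  # Each step runs only if the previous one succeeded.
  "${KUBECTL:-kubectl}" delete statefulset "$rd_sts" &&
    "${KUBECTL:-kubectl}" delete pvc "$rd_pvc" &&
    "${HELM:-helm}" upgrade --install "$rd_release" "$rd_chart"
}
```

A real re-deploy normally needs the same values files and --set flags as the original installation; they are omitted here because they differ per environment.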
Backup used for testing needs to be updated
Certain jobs in CI use a backup of GitLab during testing. Complete the steps below to update this backup when needed:
- Generate the desired backup by running a CI pipeline for the matching stable branch.
  - For example: run a CI pipeline for branch 5-4-stable if the current release is 5-5-stable, to create a backup of 14.4.
  - Note that this will require the Maintainer role.
- In that pipeline, cancel the QA jobs (but leave the spec tests) so that we don’t get extra data in the backup.
- Let the spec tests finish. They will have installed the old backup, and migrated the instance to the version we want.
- Edit the gitlab-runner Deployment replicas to 0, so the Runner turns off.
- Log in to the UI and delete the Runner from the admin section. This should help avoid cipher errors later.
- Ensure the background migrations all complete, forcing them to complete if needed.
- Delete the toolbox Pod to ensure there is no existing tmp data, keeping the backup small.
- If any manual work is needed to modify the contents of the backup, complete it before moving on to the next step.
- Create a new backup from the new toolbox Pod.
- Download the new backup from the CI instance of MinIO in the gitlab-backups bucket.
- Rename and upload the backup to the proper location in Google Cloud Storage (GCS):
  - Project: cloud-native-182609, path: gitlab-charts-ci/test-backups/
  - Name format: $VERSION_gitlab_backup.tar (example: 14.4.2_gitlab_backup.tar)
  - Edit access and add Entity=Public, Name=allUsers, and Access=Reader.
- Finally, update .variables.TEST_BACKUP_PREFIX in .gitlab-ci.yml to the new version of the backup.
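The cluster-side steps in the list above (turning off the Runner, recreating the toolbox Pod, and creating the backup) might look like this on the command line. This is a sketch, not the documented procedure: the Deployment names RELEASE_NAME-gitlab-runner and RELEASE_NAME-toolbox and the app=toolbox label are assumptions to verify in the CI cluster, and the function names and KUBECTL override are hypothetical.

```shell
# Hypothetical helpers for the CI test instance. KUBECTL=echo previews commands.

# Scale the Runner Deployment to zero replicas so the Runner turns off.
# ASSUMPTION: the Deployment is named RELEASE-gitlab-runner.
stop_runner() {
  "${KUBECTL:-kubectl}" scale deployment "${1}-gitlab-runner" --replicas=0
}

# Delete the toolbox Pod so its replacement starts without leftover tmp data,
# then wait for the Deployment to report the new Pod as available.
# ASSUMPTION: Pods carry app=toolbox and the Deployment is RELEASE-toolbox.
recreate_toolbox() {
  "${KUBECTL:-kubectl}" delete pod -l "app=toolbox,release=${1}" &&
    "${KUBECTL:-kubectl}" rollout status "deployment/${1}-toolbox"
}

# Run backup-utility inside the first toolbox Pod that matches the labels.
create_backup() {
  cb_pod=$("${KUBECTL:-kubectl}" get pods -l "app=toolbox,release=${1}" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  "${KUBECTL:-kubectl}" exec "$cb_pod" -- backup-utility
}
```

Deleting the Runner in the admin UI, checking background migrations, and any manual changes to the backup contents still happen by hand between these calls.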
Future pipelines will now use the new backup artifact during testing.
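For the rename and upload step, a small helper can build the object name from the version. Note that in a shell script the name format must be written as ${VERSION}_gitlab_backup.tar, because $VERSION_gitlab_backup would be read as a single variable name. The sketch below assumes gsutil is used instead of the web console and that gitlab-charts-ci is the bucket name; the function names and the GSUTIL override are hypothetical.

```shell
# Build the object name for a given GitLab version, e.g. 14.4.2.
backup_object_name() {
  case "$1" in
    "" | *[!0-9.]*)
      echo "expected a version such as 14.4.2" >&2
      return 1
      ;;
  esac
  printf '%s_gitlab_backup.tar\n' "$1"
}

# Upload a downloaded backup under the expected name and make it readable
# by allUsers. ASSUMPTION: gitlab-charts-ci is the bucket and gsutil is
# authenticated against the right project. GSUTIL=echo previews the commands.
upload_test_backup() {
  ub_file="$1"
  ub_name=$(backup_object_name "$2") || return 1
  ub_dest="gs://gitlab-charts-ci/test-backups/${ub_name}"

  "${GSUTIL:-gsutil}" cp "$ub_file" "$ub_dest" &&
    "${GSUTIL:-gsutil}" acl ch -u AllUsers:R "$ub_dest"
}
```

For example, GSUTIL=echo upload_test_backup ./backup.tar 14.4.2 prints the copy and ACL commands without uploading anything. The ACL command does not apply to buckets that use uniform bucket-level access, in which case access has to be granted another way.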