Back up and restore large reference architectures
This document describes how to:
- Configure daily backups
- Take a backup now (planned)
- Restore a backup
This document is intended for environments using:
- Linux package (Omnibus) and cloud-native hybrid reference architectures 60 RPS / 3,000 users and up
- Amazon RDS for PostgreSQL data
- Amazon S3 for object storage
- Object storage to store everything possible, including blobs and container registry
Configure daily backups
Configure backup of PostgreSQL data
The backup command uses pg_dump, which is not appropriate for databases over 100 GB. You must choose a PostgreSQL solution that has native, robust backup capabilities.
On AWS:
- Configure AWS Backup to back up RDS (and S3) data. For maximum protection, configure continuous backups as well as snapshot backups.
- Configure AWS Backup to copy backups to a separate region. When AWS takes a backup, the backup can only be restored in the region where it is stored.
- After AWS Backup has run at least one scheduled backup, you can create an on-demand backup as needed.
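The on-demand backup in the last step can also be started from the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and authenticated; the vault name, RDS ARN, and IAM role ARN are hypothetical placeholders. It only prints the command unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Sketch: start an on-demand AWS Backup job for a GitLab resource.
# BACKUP_VAULT, RDS_ARN, and BACKUP_ROLE_ARN are hypothetical placeholders.
set -euo pipefail

BACKUP_VAULT="${BACKUP_VAULT:-gitlab-backup-vault}"
RDS_ARN="${RDS_ARN:-arn:aws:rds:us-east-1:111122223333:db:gitlab-db}"
BACKUP_ROLE_ARN="${BACKUP_ROLE_ARN:-arn:aws:iam::111122223333:role/gitlab-backup-role}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# Start an on-demand backup job for one resource ARN (RDS instance or S3 bucket).
start_on_demand_backup() {
  run aws backup start-backup-job \
    --backup-vault-name "$BACKUP_VAULT" \
    --resource-arn "$1" \
    --iam-role-arn "$BACKUP_ROLE_ARN"
}
```

Source the file and call start_on_demand_backup with the ARN of the RDS instance, or of an S3 bucket such as arn:aws:s3:::gitlab-bucket-artifacts (also hypothetical).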
On Google Cloud, schedule automated daily backups of Google Cloud SQL data. Daily backups can be retained for up to one year, and transaction logs can be retained for 7 days by default for point-in-time recovery.
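These Cloud SQL settings can be applied with the gcloud CLI. This is a sketch, assuming a Cloud SQL for PostgreSQL instance named gitlab-db and a 02:00 UTC backup window (both hypothetical). With daily backups, 365 retained backups covers the one-year maximum, and 7 days of transaction logs matches the default point-in-time recovery window. It prints the command unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: enable daily automated backups and point-in-time recovery on Cloud SQL.
# INSTANCE and the 02:00 UTC start time are hypothetical.
set -euo pipefail

INSTANCE="${INSTANCE:-gitlab-db}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

configure_cloudsql_backups() {
  # --retained-backups-count: number of automated daily backups to keep.
  # --retained-transaction-log-days: days of logs kept for point-in-time recovery.
  run gcloud sql instances patch "$INSTANCE" \
    --backup-start-time=02:00 \
    --enable-point-in-time-recovery \
    --retained-backups-count=365 \
    --retained-transaction-log-days=7
}
```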
Configure backup of object storage data
Object storage (not NFS) is recommended for storing GitLab data, including blobs and container registry.
On AWS, configure AWS Backup to back up S3 data. You can do this at the same time as you configure the backup of PostgreSQL data.
On Google Cloud:
- Create a backup bucket in GCS.
- Create Storage Transfer Service jobs which copy each GitLab object storage bucket to a backup bucket. You can create these jobs once, and schedule them to run daily. However, this mixes new and old object storage data, so files that were deleted in GitLab will still exist in the backup. This wastes storage after restore, but it is otherwise not a problem. These files would be inaccessible to GitLab users because they do not exist in the GitLab database. You can delete some of these orphaned files after restore, but this cleanup Rake task only operates on a subset of files.
  - For When to overwrite, choose Never. GitLab object stored files are intended to be immutable. This selection could be helpful if a malicious actor succeeded at mutating GitLab files.
  - For When to delete, choose Never. If you sync the backup bucket to source, then you cannot recover if files are accidentally or maliciously deleted from source.
- Alternatively, it is possible to back up object storage into buckets or subdirectories segregated by day. This avoids the problem of orphaned files after restore, and supports backup of file versions if needed. However, it greatly increases backup storage costs. This can be done with a Cloud Function triggered by Cloud Scheduler, or with a script run by a cron job. A partial example:
  # Set GCP project so you don't have to specify it in every command
  gcloud config set project example-gcp-project-name
  # The Storage Transfer Service's hidden service account. The integer 123456789012 is the GCP project number.
  sts_account="serviceAccount:project-123456789012@storage-transfer-service.iam.gserviceaccount.com"
  buckets="artifacts ci-secure-files dependency-proxy lfs mr-diffs packages pages registry terraform-state uploads"
  # Grant the service account permission to write to the backup bucket.
  gsutil iam ch "$sts_account:roles/storage.objectAdmin" gs://backup-bucket
  # Grant the service account permission to list and read objects in the source buckets.
  for name in $buckets; do
    gsutil iam ch "$sts_account:roles/storage.legacyBucketReader,roles/storage.objectViewer" "gs://gitlab-bucket-$name"
  done
  # Create transfer jobs for each bucket, targeting a subdirectory in the backup bucket.
  today=$(date +%F)
  for name in $buckets; do
    gcloud transfer jobs create "gs://gitlab-bucket-$name/" "gs://backup-bucket/$today/$name/" --name "$today-backup-$name"
  done
  - These transfer jobs are not automatically deleted after running. You could implement cleanup of old jobs in the script.
  - The example script does not delete old backups. You could implement cleanup of old backups according to your desired retention policy.
  - Ensure that backups are performed at the same time as, or later than, Cloud SQL backups, to reduce data inconsistencies.
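For the dated-subdirectory approach, a retention policy can be enforced by deleting old date prefixes. The following is a minimal sketch, assuming the gs://backup-bucket/YYYY-MM-DD/ layout from the example script, GNU date, and a hypothetical 30-day retention. It prints what it would delete unless DRY_RUN=0. Old transfer jobs could be removed in a similar loop with gcloud transfer jobs delete.

```bash
#!/usr/bin/env bash
# Sketch: delete dated backup prefixes older than a retention window.
# Assumes the gs://backup-bucket/YYYY-MM-DD/ layout and GNU date.
# BACKUP_BUCKET and RETENTION_DAYS are hypothetical settings.
set -euo pipefail

BACKUP_BUCKET="${BACKUP_BUCKET:-gs://backup-bucket}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Read prefixes on stdin; print those whose date is earlier than $1 (YYYY-MM-DD).
# ISO 8601 dates sort lexicographically, so a string comparison is enough.
expired_prefixes() {
  local cutoff="$1" prefix day
  local date_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
  while IFS= read -r prefix; do
    day="${prefix%/}"   # drop the trailing slash
    day="${day##*/}"    # keep the last path component
    if [[ "$day" =~ $date_re && "$day" < "$cutoff" ]]; then
      printf '%s\n' "$prefix"
    fi
  done
}

# List the bucket and remove expired prefixes; only prints them unless DRY_RUN=0.
prune_backups() {
  local cutoff prefix
  cutoff="$(date -u -d "${RETENTION_DAYS} days ago" +%F)"
  gsutil ls "${BACKUP_BUCKET}/" | expired_prefixes "$cutoff" |
    while IFS= read -r prefix; do
      if [[ "${DRY_RUN:-1}" == 0 ]]; then
        gsutil -m rm -r "$prefix"
      else
        echo "would delete: $prefix"
      fi
    done
}
```

Prefixes that are not dates are ignored, so other content in the bucket is never selected.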
Configure backup of Git repositories
For Linux package (Omnibus) installations, set up cron jobs to perform Gitaly server-side backups:
- Configure a server-side backup destination in all Gitaly nodes.
- Configure uploading backups to remote (cloud) storage. Even though Gitaly backs up all Git data to its own object storage bucket, the gitlab-backup command also creates a tar file containing backup metadata. This tar file is required by the restore command.
- Make sure to add both buckets to backups of object storage data.
- SSH into a GitLab Rails node, which is a node that runs Puma or Sidekiq.
- Take a full backup of your Git data. Use the REPOSITORIES_SERVER_SIDE variable, and skip PostgreSQL data:
  sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true SKIP=db
  This causes Gitaly nodes to upload the Git data and some metadata to remote storage. Blobs such as uploads, artifacts, and LFS do not need to be explicitly skipped, because the gitlab-backup command does not back up object storage by default.
- Note the backup ID of the backup, which is needed for the next step. For example, if the backup command outputs:
  2024-02-22 02:17:47 UTC -- Backup 1708568263_2024_02_22_16.9.0-ce is done.
  then the backup ID is 1708568263_2024_02_22_16.9.0-ce.
- Check that the full backup created data in both the Gitaly backup bucket and the regular backup bucket.
- Run the backup command again, this time specifying incremental backup of Git repositories and the backup ID. Using the example ID from the previous step, the command is:
  sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1708568263_2024_02_22_16.9.0-ce
  The value of PREVIOUS_BACKUP is not used by this command, but the command requires it. Issue 429141 proposes removing this unnecessary requirement.
- Check that the incremental backup succeeded and added data to object storage.
- Configure cron to make daily backups. Edit the crontab for the root user:
  sudo su -
  crontab -e
- There, add the following lines to schedule the backup for every day of every month at 2 AM. To limit the number of increments needed to restore a backup, a full backup of Git repositories is taken on the first of each month, and the remaining days take an incremental backup:
  0 2 1 * * /opt/gitlab/bin/gitlab-backup create REPOSITORIES_SERVER_SIDE=true SKIP=db CRON=1
  0 2 2-31 * * /opt/gitlab/bin/gitlab-backup create REPOSITORIES_SERVER_SIDE=true SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1708568263_2024_02_22_16.9.0-ce CRON=1
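When these commands are scripted, copying the backup ID by hand is error prone. A small sketch, assuming the "Backup <ID> is done." line format shown in the example output above, extracts the ID from saved command output; the log path in the usage comments is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: extract the backup ID from gitlab-backup output.
# Assumes the "... -- Backup <ID> is done." line format shown above.
set -euo pipefail

# Read gitlab-backup output on stdin; print the ID from the last "is done." line.
backup_id_from_output() {
  sed -n 's/.*-- Backup \([^ ]*\) is done\..*/\1/p' | tail -n 1
}

# Example usage on a Rails node (the log path is hypothetical):
#   sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true SKIP=db 2>&1 | tee /tmp/gitlab-backup.log
#   backup_id="$(backup_id_from_output < /tmp/gitlab-backup.log)"
#   sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP="$backup_id"
```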
For Helm chart (Kubernetes) installations:
- Configure a server-side backup destination in all Gitaly nodes.
- Configure object storage buckets for backup-utility. Even though Gitaly backs up all Git data to its own object storage bucket, the backup-utility command also creates a tar file containing backup metadata. This tar file is required by the restore command.
- Make sure to add both buckets to backups of object storage data.
- Take a full backup of your Git data. Use the --repositories-server-side option, and skip all other data:
  kubectl exec <Toolbox pod name> -it -- backup-utility --repositories-server-side --skip db,builds,pages,registry,uploads,artifacts,lfs,packages,external_diffs,terraform_state,ci_secure_files
  This causes Gitaly nodes to upload the Git data and some metadata to remote storage. See Toolbox included tools.
- Check that the full backup created data in both the Gitaly backup bucket and the regular backup bucket. Incremental repository backup is not supported by backup-utility with server-side repository backup; see charts issue 3421.
- Configure cron to make daily backups. Specifically, set gitlab.toolbox.backups.cron.extraArgs to include the same data types skipped by the full backup command, without skipping repositories:
  --repositories-server-side --skip db --skip uploads --skip builds --skip artifacts --skip pages --skip lfs --skip terraform_state --skip registry --skip packages --skip external_diffs --skip ci_secure_files
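The cron settings can be applied with Helm. This sketch assumes a release named gitlab installed from the gitlab/gitlab chart in the gitlab namespace, and a daily 02:00 schedule; all of these are hypothetical. It skips the same data types as the full backup command, written as separate --skip flags because helm --set treats commas as list separators. It prints the command unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: enable the Toolbox backup cron job with server-side repository backups.
# RELEASE, NAMESPACE, the gitlab/gitlab chart reference, and SCHEDULE are hypothetical.
set -euo pipefail

RELEASE="${RELEASE:-gitlab}"
NAMESPACE="${NAMESPACE:-gitlab}"
SCHEDULE="${SCHEDULE:-0 2 * * *}"
# Separate --skip flags: helm --set treats commas as list separators.
EXTRA_ARGS="--repositories-server-side --skip db --skip builds --skip pages --skip registry --skip uploads --skip artifacts --skip lfs --skip packages --skip external_diffs --skip terraform_state --skip ci_secure_files"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

enable_backup_cron() {
  run helm upgrade "$RELEASE" gitlab/gitlab --namespace "$NAMESPACE" --reuse-values \
    --set gitlab.toolbox.backups.cron.enabled=true \
    --set-string "gitlab.toolbox.backups.cron.schedule=$SCHEDULE" \
    --set-string "gitlab.toolbox.backups.cron.extraArgs=$EXTRA_ARGS"
}
```

Setting the same keys in your values file instead of on the command line avoids the comma issue entirely.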
Configure backup of configuration files
If your configuration and secrets are defined outside of your deployment and then deployed into it, then the implementation of the backup strategy depends on your specific setup and requirements. As an example, you can store secrets in AWS Secrets Manager with replication to multiple regions and configure a script to back up secrets automatically.
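A minimal sketch of the AWS Secrets Manager example, assuming the AWS CLI and hypothetical secret name, regions, and file path: the first run creates the secret with a replica in a second region, and later runs store a new version, which Secrets Manager propagates to the replica. It prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: keep gitlab-secrets.json in AWS Secrets Manager with a cross-region replica.
# SECRET_NAME, PRIMARY_REGION, REPLICA_REGION, and SECRETS_FILE are hypothetical.
set -euo pipefail

SECRET_NAME="${SECRET_NAME:-gitlab/gitlab-secrets-json}"
PRIMARY_REGION="${PRIMARY_REGION:-us-east-1}"
REPLICA_REGION="${REPLICA_REGION:-us-west-2}"
SECRETS_FILE="${SECRETS_FILE:-/etc/gitlab/gitlab-secrets.json}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# First run: create the secret and its replica in a second region.
create_secret() {
  run aws secretsmanager create-secret --region "$PRIMARY_REGION" \
    --name "$SECRET_NAME" \
    --secret-string "file://$SECRETS_FILE" \
    --add-replica-regions "Region=$REPLICA_REGION"
}

# Later runs: store a new version of the secret.
update_secret() {
  run aws secretsmanager put-secret-value --region "$PRIMARY_REGION" \
    --secret-id "$SECRET_NAME" \
    --secret-string "file://$SECRETS_FILE"
}
```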
If your configuration and secrets are only defined inside your deployment:
- Storing configuration files describes how to extract configuration and secrets files.
- These files should be uploaded to a separate, more restrictive, object storage account.
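On Linux package nodes, the configuration files can be archived with gitlab-ctl backup-etc and copied to the separate bucket. This is a sketch, assuming the AWS CLI, a hypothetical bucket s3://example-gitlab-config-backups, and a hypothetical local directory. Run it as root on each node, because every node in a reference architecture has its own /etc/gitlab. It prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: archive /etc/gitlab and copy the archives to a restricted bucket.
# CONFIG_BUCKET and LOCAL_DIR are hypothetical. Run as root.
set -euo pipefail

CONFIG_BUCKET="${CONFIG_BUCKET:-s3://example-gitlab-config-backups}"
LOCAL_DIR="${LOCAL_DIR:-/var/opt/gitlab-config-backups}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

backup_config() {
  local host
  host="$(uname -n)"
  # The archives contain secrets, so keep the local directory root-only.
  run install -d -m 0700 "$LOCAL_DIR"
  # Write a timestamped tar of /etc/gitlab into LOCAL_DIR.
  run gitlab-ctl backup-etc --backup-path "$LOCAL_DIR"
  # One prefix per node, because each node has its own configuration.
  run aws s3 sync "$LOCAL_DIR/" "$CONFIG_BUCKET/$host/"
}
```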
Restore a backup
Restore a backup of a GitLab instance.
Prerequisites
Before restoring a backup:
- Choose a working destination GitLab instance.
- Ensure the destination GitLab instance is in a region where your AWS backups are stored.
- Check that the destination GitLab instance uses exactly the same version and type (CE or EE) of GitLab on which the backup data was created. For example, CE 15.1.4.
- Restore backed up secrets to the destination GitLab instance.
- Ensure that the destination GitLab instance has the same repository storages configured. Additional storages are fine.
- Ensure that object storage is configured.
- To use new secrets or configuration, and to avoid dealing with any unexpected configuration changes during restore:
  - Linux package installations on all nodes:
    - Reconfigure the destination GitLab instance.
    - Restart the destination GitLab instance.
  - Helm chart (Kubernetes) installations:
    - On all GitLab Linux package nodes, run:
      sudo gitlab-ctl reconfigure
      sudo gitlab-ctl start
    - Make sure you have a running GitLab instance by deploying the charts. Ensure the Toolbox pod is enabled and running by executing the following command:
      kubectl get pods -lrelease=RELEASE_NAME,app=toolbox
    - The Webservice, Sidekiq, and Toolbox pods must be restarted. The safest way to restart those pods is to run:
      kubectl delete pods -lapp=sidekiq,release=<helm release name>
      kubectl delete pods -lapp=webservice,release=<helm release name>
      kubectl delete pods -lapp=toolbox,release=<helm release name>
- Confirm the destination GitLab instance still works. For example:
  - Make requests to the health check endpoints.
  - Run GitLab check Rake tasks.
- Stop GitLab services which connect to the PostgreSQL database.
  - Linux package installations on all nodes running Puma or Sidekiq, run:
    sudo gitlab-ctl stop
  - Helm chart (Kubernetes) installations:
    - Note the current number of replicas for database clients for subsequent restart:
      kubectl get deploy -n <namespace> -lapp=sidekiq,release=<helm release name> -o jsonpath='{.items[].spec.replicas}{"\n"}'
      kubectl get deploy -n <namespace> -lapp=webservice,release=<helm release name> -o jsonpath='{.items[].spec.replicas}{"\n"}'
      kubectl get deploy -n <namespace> -lapp=prometheus,release=<helm release name> -o jsonpath='{.items[].spec.replicas}{"\n"}'
    - Stop the clients of the database to prevent locks interfering with the restore process:
      kubectl scale deploy -lapp=sidekiq,release=<helm release name> -n <namespace> --replicas=0
      kubectl scale deploy -lapp=webservice,release=<helm release name> -n <namespace> --replicas=0
      kubectl scale deploy -lapp=prometheus,release=<helm release name> -n <namespace> --replicas=0
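Noting replica counts by hand is easy to get wrong, and a label such as app=webservice can match several deployments with different counts. The following sketch, assuming a hypothetical release and namespace both named gitlab, records one line per deployment and later scales each deployment back to its own count. It prints kubectl scale commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: save, zero, and restore replica counts of the database clients.
# RELEASE, NAMESPACE, and STATE_FILE are hypothetical.
set -euo pipefail

RELEASE="${RELEASE:-gitlab}"
NAMESPACE="${NAMESPACE:-gitlab}"
STATE_FILE="${STATE_FILE:-./gitlab-replicas.txt}"
APPS=(sidekiq webservice prometheus)

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# Write "<deployment> <replicas>" lines for every matching deployment.
save_replicas() {
  local app
  : > "$STATE_FILE"
  for app in "${APPS[@]}"; do
    kubectl get deploy -n "$NAMESPACE" -l "app=$app,release=$RELEASE" \
      -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{"\n"}{end}' >> "$STATE_FILE"
  done
}

# Scale every saved deployment. Pass 0 to stop them; pass nothing to restore saved counts.
scale_from_state() {
  local override="${1:-}" name replicas
  while read -r name replicas; do
    [[ -n "$name" ]] || continue
    run kubectl scale deploy "$name" -n "$NAMESPACE" --replicas="${override:-$replicas}"
  done < "$STATE_FILE"
}
```

Before the restore, run save_replicas and then scale_from_state 0; after the restore, run scale_from_state with no argument.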
Restore object storage data
On AWS, each bucket exists as a separate backup within AWS Backup, and each backup can be restored to an existing or new bucket.
- To restore buckets, an IAM role with the correct permissions is required:
  - AWSBackupServiceRolePolicyForBackup
  - AWSBackupServiceRolePolicyForRestores
  - AWSBackupServiceRolePolicyForS3Restore
  - AWSBackupServiceRolePolicyForS3Backup
- If existing buckets are being used, they must have Access Control Lists enabled.
- Restore the S3 buckets using built-in tooling.
- You can move on to Restore PostgreSQL data while the restore job is running.
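The S3 restore can also be driven from the AWS CLI. This is a sketch, assuming a hypothetical vault name and restore role ARN: list the recovery points for a bucket, write the default restore metadata of one recovery point to a file, review it (for example, the destination bucket settings), then start the restore job with that file. It prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: restore one S3 recovery point with AWS Backup.
# VAULT and ROLE_ARN are hypothetical placeholders.
set -euo pipefail

VAULT="${VAULT:-gitlab-backup-vault}"
ROLE_ARN="${ROLE_ARN:-arn:aws:iam::111122223333:role/gitlab-restore-role}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# $1: bucket ARN, for example arn:aws:s3:::gitlab-bucket-artifacts.
list_recovery_points() {
  run aws backup list-recovery-points-by-backup-vault \
    --backup-vault-name "$VAULT" --by-resource-arn "$1"
}

# $1: recovery point ARN. Prints the default restore metadata as JSON.
get_restore_metadata() {
  run aws backup get-recovery-point-restore-metadata \
    --backup-vault-name "$VAULT" --recovery-point-arn "$1" \
    --query RestoreMetadata --output json
}

# $1: recovery point ARN, $2: reviewed metadata file.
start_restore() {
  run aws backup start-restore-job \
    --recovery-point-arn "$1" --iam-role-arn "$ROLE_ARN" \
    --metadata "file://$2"
}
```

For example: get_restore_metadata "$RP_ARN" > meta.json, edit meta.json, then start_restore "$RP_ARN" meta.json.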
On Google Cloud:
- Create Storage Transfer Service jobs to transfer backed-up data to the GitLab buckets.
- You can move on to Restore PostgreSQL data while the transfer jobs are running.
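If backups were taken into dated subdirectories, the transfer jobs can mirror the backup example in reverse. This sketch assumes the gs://backup-bucket/<date>/<name>/ layout and bucket names from the backup example, and that the Storage Transfer Service account can read the backup bucket and write the GitLab buckets. It prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: copy one day's backup back into the GitLab buckets.
# Assumes the gs://backup-bucket/<date>/<name>/ layout; bucket names are examples.
set -euo pipefail

BUCKETS=(artifacts ci-secure-files dependency-proxy lfs mr-diffs packages pages registry terraform-state uploads)

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# $1: backup date (YYYY-MM-DD) aligned with the PostgreSQL data you restore.
restore_from_day() {
  local day="$1" name
  for name in "${BUCKETS[@]}"; do
    run gcloud transfer jobs create \
      "gs://backup-bucket/$day/$name/" "gs://gitlab-bucket-$name/" \
      --name "restore-$day-$name"
  done
}
```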
Restore PostgreSQL data
On AWS:
- Restore the AWS RDS database using built-in tooling, which creates a new RDS instance.
- Because the new RDS instance has a different endpoint, you must reconfigure the destination GitLab instance to point to the new database:
  - For Linux package installations, follow Using a non-packaged PostgreSQL database management server.
  - For Helm chart (Kubernetes) installations, follow Configure the GitLab chart with an external database.
- Before moving on, wait until the new RDS instance is created and ready to use.
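One way to perform the RDS restore from the CLI is a point-in-time restore into a new instance. This is a sketch with hypothetical instance identifiers; restoring from an AWS Backup recovery point instead is also possible. Depending on your network setup, the restore command may also need --db-subnet-group-name and --vpc-security-group-ids. The last command prints the endpoint to use in the GitLab database configuration. It prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: point-in-time restore of RDS into a new instance.
# SOURCE_DB and TARGET_DB are hypothetical identifiers.
set -euo pipefail

SOURCE_DB="${SOURCE_DB:-gitlab-db}"
TARGET_DB="${TARGET_DB:-gitlab-db-restored}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# $1: restore time in UTC, ISO 8601, for example 2024-02-22T02:00:00Z.
restore_rds() {
  run aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$SOURCE_DB" \
    --target-db-instance-identifier "$TARGET_DB" \
    --restore-time "$1"
  # Block until the new instance reports "available".
  run aws rds wait db-instance-available --db-instance-identifier "$TARGET_DB"
  # Print the new endpoint for the GitLab database host setting.
  run aws rds describe-db-instances --db-instance-identifier "$TARGET_DB" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}
```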
On Google Cloud:
- Restore the Google Cloud SQL database using built-in tooling.
- If you restore to a new database instance, then reconfigure GitLab to point to the new database:
  - For Linux package installations, follow Using a non-packaged PostgreSQL database management server.
  - For Helm chart (Kubernetes) installations, follow Configure the GitLab chart with an external database.
- Before moving on, wait until the Cloud SQL instance is ready to use.
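The Cloud SQL restore can be run with gcloud. This is a sketch with hypothetical instance names; restoring a backup overwrites all data on the target instance, and a different target instance should already exist. It prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: restore a Cloud SQL backup onto a target instance.
# SOURCE_INSTANCE and TARGET_INSTANCE are hypothetical.
set -euo pipefail

SOURCE_INSTANCE="${SOURCE_INSTANCE:-gitlab-db}"
TARGET_INSTANCE="${TARGET_INSTANCE:-gitlab-db}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# Show available backups and their IDs.
list_backups() {
  run gcloud sql backups list --instance="$SOURCE_INSTANCE"
}

# $1: backup ID from list_backups. Overwrites all data on TARGET_INSTANCE.
restore_backup() {
  run gcloud sql backups restore "$1" \
    --restore-instance="$TARGET_INSTANCE" \
    --backup-instance="$SOURCE_INSTANCE"
}
```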
Restore Git repositories
First, as part of Restore object storage data, you should have already:
- Restored a bucket containing the Gitaly server-side backups of Git repositories.
- Restored a bucket containing the *_gitlab_backup.tar files.
For Linux package (Omnibus) installations:
- SSH into a GitLab Rails node, which is a node that runs Puma or Sidekiq.
- In your backup bucket, choose a *_gitlab_backup.tar file based on its timestamp, aligned with the PostgreSQL and object storage data that you restored.
- Download the tar file to /var/opt/gitlab/backups/.
- Restore the backup, specifying the ID of the backup you wish to restore, omitting _gitlab_backup.tar from the name:
  # This command will overwrite the contents of Gitaly!
  sudo gitlab-backup restore BACKUP=11493107454_2018_04_25_10.6.4-ce SKIP=db
  If there’s a GitLab version mismatch between your backup tar file and the installed version of GitLab, the restore command aborts with an error message. Install the correct GitLab version, and then try again.
- Restart and check GitLab:
  - In all Puma or Sidekiq nodes, run:
    sudo gitlab-ctl restart
  - In one Puma or Sidekiq node, run:
    sudo gitlab-rake gitlab:check SANITIZE=true
- Check that database values can be decrypted, especially if /etc/gitlab/gitlab-secrets.json was restored, or if a different server is the target for the restore. In a Puma or Sidekiq node, run:
  sudo gitlab-rake gitlab:doctor:secrets
- For added assurance, you can perform an integrity check on the uploaded files. In a Puma or Sidekiq node, run:
  sudo gitlab-rake gitlab:artifacts:check
  sudo gitlab-rake gitlab:lfs:check
  sudo gitlab-rake gitlab:uploads:check
If missing or corrupted files are found, it does not always mean the backup and restore process failed. For example, the files might be missing or corrupted on the source GitLab instance. You might need to cross-reference prior backups. If you are migrating GitLab to a new environment, you can run the same checks on the source GitLab instance to determine whether the integrity check result is preexisting or related to the backup and restore process.
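The download and restore steps for Linux package installations can be scripted with an early version check. This sketch assumes the backup bucket is in S3 (use gsutil cp for GCS) with a hypothetical bucket name, and that you pass the installed version in the same form as the backup ID suffix, for example 16.9.0-ce. The gitlab-backup restore command performs its own version check; this one only fails sooner. The chown follows GitLab's guidance that backup files in the backups directory be owned by the git user. Run it as root; it prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: fetch a backup tar, check its version, and restore Git data only.
# BACKUP_BUCKET is hypothetical; the installed version is passed by the caller.
set -euo pipefail

BACKUP_BUCKET="${BACKUP_BUCKET:-s3://example-gitlab-backups}"
BACKUP_DIR=/var/opt/gitlab/backups

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# .../<ID>_gitlab_backup.tar -> <ID>
backup_id_from_file() {
  local file="${1##*/}"
  printf '%s\n' "${file%_gitlab_backup.tar}"
}

# 1708568263_2024_02_22_16.9.0-ce -> 16.9.0-ce (text after the last underscore)
backup_version() {
  printf '%s\n' "${1##*_}"
}

# $1: backup file name, $2: installed version, for example 16.9.0-ce.
prepare_restore() {
  local file="${1##*/}" id
  id="$(backup_id_from_file "$file")"
  if [[ "$(backup_version "$id")" != "$2" ]]; then
    echo "version mismatch: backup $(backup_version "$id"), installed $2" >&2
    return 1
  fi
  run aws s3 cp "$BACKUP_BUCKET/$file" "$BACKUP_DIR/$file"
  run chown git:git "$BACKUP_DIR/$file"
  run gitlab-backup restore "BACKUP=$id" SKIP=db
}
```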
For Helm chart (Kubernetes) installations:
- SSH into a Toolbox pod.
- In your backup bucket, choose a *_gitlab_backup.tar file based on its timestamp, aligned with the PostgreSQL and object storage data that you restored.
- Download the tar file to /var/opt/gitlab/backups/.
- Restore the backup, specifying the ID of the backup you wish to restore, omitting _gitlab_backup.tar from the name:
  # This command will overwrite the contents of Gitaly!
  kubectl exec <Toolbox pod name> -it -- backup-utility --restore -t 11493107454_2018_04_25_10.6.4-ce --skip db,builds,pages,registry,uploads,artifacts,lfs,packages,external_diffs,terraform_state,ci_secure_files
  If there’s a GitLab version mismatch between your backup tar file and the installed version of GitLab, the restore command aborts with an error message. Install the correct GitLab version, and then try again.
- Restart and check GitLab:
  - Start the stopped deployments, using the number of replicas noted in Prerequisites:
    kubectl scale deploy -lapp=sidekiq,release=<helm release name> -n <namespace> --replicas=<original value>
    kubectl scale deploy -lapp=webservice,release=<helm release name> -n <namespace> --replicas=<original value>
    kubectl scale deploy -lapp=prometheus,release=<helm release name> -n <namespace> --replicas=<original value>
  - In the Toolbox pod, run:
    sudo gitlab-rake gitlab:check SANITIZE=true
- Check that database values can be decrypted, especially if /etc/gitlab/gitlab-secrets.json was restored, or if a different server is the target for the restore. In the Toolbox pod, run:
  sudo gitlab-rake gitlab:doctor:secrets
- For added assurance, you can perform an integrity check on the uploaded files. Because these commands iterate over all rows and can take a long time, run them on a GitLab Rails node rather than in a Toolbox pod:
  sudo gitlab-rake gitlab:artifacts:check
  sudo gitlab-rake gitlab:lfs:check
  sudo gitlab-rake gitlab:uploads:check
If missing or corrupted files are found, it does not always mean the backup and restore process failed. For example, the files might be missing or corrupted on the source GitLab instance. You might need to cross-reference prior backups. If you are migrating GitLab to a new environment, you can run the same checks on the source GitLab instance to determine whether the integrity check result is preexisting or related to the backup and restore process.
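The <Toolbox pod name> placeholder can be looked up from the release and app labels used earlier. This is a sketch, assuming a hypothetical release named gitlab in the gitlab namespace; it prints the restore command unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: find the Toolbox pod and run the Git-only restore in it.
# RELEASE and NAMESPACE are hypothetical.
set -euo pipefail

RELEASE="${RELEASE:-gitlab}"
NAMESPACE="${NAMESPACE:-gitlab}"
SKIP="db,builds,pages,registry,uploads,artifacts,lfs,packages,external_diffs,terraform_state,ci_secure_files"

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then "$@"; else printf '%s\n' "$*"; fi
}

# Print the name of the first Toolbox pod of the release.
toolbox_pod() {
  kubectl get pods -n "$NAMESPACE" -l "release=$RELEASE,app=toolbox" \
    -o jsonpath='{.items[0].metadata.name}'
}

# $1: backup ID (file name without _gitlab_backup.tar), $2: pod name.
restore_git_data() {
  run kubectl exec -n "$NAMESPACE" "$2" -it -- \
    backup-utility --restore -t "$1" --skip "$SKIP"
}
```

For example: restore_git_data 11493107454_2018_04_25_10.6.4-ce "$(toolbox_pod)".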
The restoration should be complete.