Geo data types support
A Geo data type is a specific class of data that is required by one or more GitLab features to store relevant information.
To replicate data produced by these features with Geo, we use several strategies to access, transfer, and verify them.
Data types
We currently distinguish between three different data types:
- Git repositories
- Blobs
- Database
The table below lists each feature or component we replicate, along with its data type, replication method, and verification method:
Type | Feature / component | Replication method | Verification method |
---|---|---|---|
Database | Application data in PostgreSQL | Native | Native |
Database | Redis | N/A (1) | N/A |
Database | Elasticsearch | Native | Native |
Database | SSH public keys | PostgreSQL Replication | PostgreSQL Replication |
Git | Project repository | Geo with Gitaly | Gitaly Checksum |
Git | Project wiki repository | Geo with Gitaly | Gitaly Checksum |
Git | Project designs repository | Geo with Gitaly | Gitaly Checksum |
Git | Object pools for forked project deduplication | Geo with Gitaly | Not implemented |
Git | Project Snippets | Geo with Gitaly | Gitaly Checksum |
Git | Personal Snippets | Geo with Gitaly | Gitaly Checksum |
Git | Group wiki repository | Geo with Gitaly | Not implemented |
Blobs | User uploads (file system) | Geo with API | Not implemented |
Blobs | User uploads (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | LFS objects (file system) | Geo with API | Not implemented |
Blobs | LFS objects (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | CI job artifacts (file system) | Geo with API | Not implemented |
Blobs | CI job artifacts (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | Archived CI build traces (file system) | Geo with API | Not implemented |
Blobs | Archived CI build traces (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | Container registry (file system) | Geo with API/Docker API | Not implemented |
Blobs | Container registry (object storage) | Geo with API/Managed/Docker API (2) | Not implemented |
Blobs | Package registry (file system) | Geo with API | SHA256 checksum |
Blobs | Package registry (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | Versioned Terraform State (file system) | Geo with API | SHA256 checksum |
Blobs | Versioned Terraform State (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | External Merge Request Diffs (file system) | Geo with API | Not implemented |
Blobs | External Merge Request Diffs (object storage) | Geo with API/Managed (2) | Not implemented |
Blobs | Pipeline artifacts (file system) | Geo with API | SHA256 checksum |
Blobs | Pipeline artifacts (object storage) | Geo with API/Managed (2) | SHA256 checksum |
- (1): Redis replication can be used as part of HA with Redis Sentinel. It’s not used between Geo sites.
- (2): Object storage replication can be performed by Geo or by the native replication feature of your object storage provider or appliance.
Git repositories
A GitLab instance can have one or more repository shards. Each shard has a Gitaly instance that is responsible for allowing access and operations on the locally stored Git repositories. The repositories can be stored on a single disk, on multiple disks mounted as a single mount point (as with a RAID array), or on LVM.
Gitaly requires no special file system and can work with NFS or a mounted storage appliance, although a remote file system may limit performance.
Communication happens through Gitaly’s own gRPC API. Repositories can be synchronized in three ways:
- Using regular Git clone/fetch from one Geo site to another (with special authentication).
- Using repository snapshots (when the first method fails or the repository is corrupt).
- Manual trigger from the Admin Area (a combination of both of the above).
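As a rough illustration of how the first two methods relate, the sketch below tries a regular fetch first and falls back to a snapshot when the fetch fails. The `sync_repository` method, the `fetch` and `snapshot` callables, and the `SyncError` class are hypothetical stand-ins, not Gitaly or Geo APIs.

```ruby
# Hypothetical error raised by a sync strategy; not a real Geo or Gitaly class.
class SyncError < StandardError; end

# Try a regular Git fetch first. If it fails (for example, because the
# repository on the secondary is corrupt), fall back to a snapshot.
# Both strategies are injected as callables so the sketch is self-contained.
def sync_repository(repo, fetch:, snapshot:)
  fetch.call(repo)
  :fetched
rescue SyncError
  snapshot.call(repo)
  :snapshot
end

ok_fetch = ->(_repo) { true }
broken_fetch = ->(_repo) { raise SyncError, "fetch failed" }
take_snapshot = ->(_repo) { true }

puts sync_repository("my-project", fetch: ok_fetch, snapshot: take_snapshot)     # prints "fetched"
puts sync_repository("my-project", fetch: broken_fetch, snapshot: take_snapshot) # prints "snapshot"
```

If the snapshot also fails, the error propagates to the caller, which is where a manual retry from the Admin Area would come in.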
Each project can have at most 3 different repositories:
- A project repository, where the source code is stored.
- A wiki repository, where the wiki content is stored.
- A design repository, where design artifacts are indexed (assets are actually in LFS).
They all live in the same shard and share the same base name, with a `-wiki` and `-design` suffix for the wiki and design repositories.
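The naming convention can be sketched as a small lookup. The `repository_names` helper and `REPOSITORY_SUFFIXES` constant are hypothetical, and the sketch assumes the suffixes exactly as described above.

```ruby
# Hypothetical mapping from repository kind to the suffix appended to the
# project's shared base name.
REPOSITORY_SUFFIXES = {
  project: "",
  wiki: "-wiki",
  design: "-design"
}.freeze

# Derive the name of each repository a project can have from its base name.
def repository_names(base_name)
  REPOSITORY_SUFFIXES.transform_values { |suffix| "#{base_name}#{suffix}" }
end

names = repository_names("my-project")
puts names[:wiki]   # prints "my-project-wiki"
puts names[:design] # prints "my-project-design"
```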
There are also snippet repositories, which belong either to a project or to a specific user. Both types are synced to secondary sites.
Blobs
GitLab stores files and blobs, such as issue attachments or LFS objects, in either:
- The file system in a specific location.
- An Object Storage solution. Object Storage solutions can be:
  - Cloud-based, like Amazon S3 or Google Cloud Storage.
  - Hosted by you (like MinIO).
  - A Storage Appliance that exposes an Object Storage-compatible API.
If you use file system storage instead of Object Storage and run GitLab on more than one node, you need a network-mounted file system.
With respect to replication and verification:
- We transfer files and blobs using an internal API request.
- With Object Storage, you can either:
  - Use your cloud provider’s replication functionality.
  - Have GitLab replicate it for you.
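Several blob types in the data types table are verified with a SHA256 checksum. The sketch below shows the general idea, assuming a blob counts as verified when the checksum computed on the secondary matches the one recorded on the primary. The `file_sha256` and `replica_verified?` methods are hypothetical and do not reflect Geo’s internal implementation.

```ruby
require "digest"
require "tempfile"

# Compute the SHA256 checksum of a file, reading it in chunks so large
# blobs do not need to fit in memory.
def file_sha256(path)
  digest = Digest::SHA256.new
  File.open(path, "rb") do |io|
    while (chunk = io.read(64 * 1024))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Hypothetical check: the replica is verified when its checksum matches
# the checksum recorded for the primary copy.
def replica_verified?(primary_checksum, replica_path)
  file_sha256(replica_path) == primary_checksum
end

# Usage: write a stand-in replica and compare it against a known checksum.
Tempfile.create("blob") do |f|
  f.write("hello")
  f.flush
  puts replica_verified?(Digest::SHA256.hexdigest("hello"), f.path) # prints "true"
end
```

Reading in fixed-size chunks keeps memory use constant regardless of blob size, which matters for large artifacts.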
Database
GitLab relies on data stored in multiple databases for different use cases. PostgreSQL is the single source of truth for user-generated content in the web interface, such as issue content and comments, as well as permissions and credentials.
PostgreSQL can also hold some cached data, such as Markdown rendered as HTML and cached merge request diffs (which can also be configured to be offloaded to object storage).
We use PostgreSQL’s own replication functionality to replicate data from the primary site to secondary sites.
We use Redis both as a cache store and to hold persistent data for our background jobs system. Because the data for both use cases is specific to each Geo site, we don’t replicate it between sites.
Elasticsearch is an optional database that enables Advanced Search across source code and user-generated content in issues, merge requests, and discussions. It’s currently not supported in Geo.
Limitations on replication/verification
The following table lists the GitLab features along with their replication and verification status on a secondary site.
You can track progress on implementing the missing items in these epics and issues:
- Geo: Build a scalable, self-service Geo replication and verification framework
- Geo: Improve the self-service Geo replication framework
- Geo: Move existing blobs to framework
- Geo: Add unreplicated data types
- Geo: Support GitLab Pages
Replicated data types behind a feature flag
Replication for some data types is behind a corresponding feature flag:
- They’re deployed behind a feature flag, enabled by default.
- They’re enabled on GitLab.com.
- They can’t be enabled or disabled per-project.
- They are recommended for production use.
- For GitLab self-managed instances, GitLab administrators can opt to disable them.
Enable or disable replication (for some data types)
Replication for some data types is released behind feature flags that are enabled by default. GitLab administrators with access to the GitLab Rails console can opt to disable it for their instance. The feature flag name for each of those data types is listed in the Notes column of the table below.
To disable replication, for example for package files:
```ruby
Feature.disable(:geo_package_file_replication)
```
To enable replication, for example for package files:
```ruby
Feature.enable(:geo_package_file_replication)
```
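The flag names in the Notes column can be read as a mapping from data type to feature flag. The sketch below models that mapping and the rule that a flagged data type is replicated only while its flag is enabled. The `REPLICATION_FLAGS` keys, the `replication_enabled?` method, and the `enabled_flags` set are hypothetical stand-ins; in a real instance, flag state is managed with the `Feature` commands shown above.

```ruby
require "set"

# Flag names taken from the Notes column of the table below; the keys
# are hypothetical labels for the data types.
REPLICATION_FLAGS = {
  package_file: :geo_package_file_replication,
  lfs_object: :geo_lfs_object_replication,
  group_wiki_repository: :geo_group_wiki_repository_replication,
  terraform_state_version: :geo_terraform_state_version_replication,
  merge_request_diff: :geo_merge_request_diff_replication
}.freeze

# A data type is replicated unless it has a flag and that flag is disabled.
# Data types without a flag are not gated.
def replication_enabled?(data_type, enabled_flags)
  flag = REPLICATION_FLAGS[data_type]
  flag.nil? || enabled_flags.include?(flag)
end

# All flags start enabled; simulate disabling package file replication.
enabled = Set.new(REPLICATION_FLAGS.values)
enabled.delete(:geo_package_file_replication)

puts replication_enabled?(:package_file, enabled) # prints "false"
puts replication_enabled?(:lfs_object, enabled)   # prints "true"
puts replication_enabled?(:uploads, enabled)      # prints "true"
```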
Feature | Replicated (added in GitLab version) | Verified (added in GitLab version) | Object Storage replication (see Geo with Object Storage) | Notes |
---|---|---|---|---|
Application data in PostgreSQL | Yes (10.2) | Yes (10.2) | No | |
Project repository | Yes (10.2) | Yes (10.7) | No | |
Project wiki repository | Yes (10.2) | Yes (10.7) | No | |
Group wiki repository | Yes (13.10) | No | No | Behind feature flag `geo_group_wiki_repository_replication`, enabled by default. |
Uploads | Yes (10.2) | No | No | Verified only on transfer or manually using Integrity Check Rake Task on both sites and comparing the output between them. |
LFS objects | Yes (10.2) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only on transfer or manually using Integrity Check Rake Task on both sites and comparing the output between them. GitLab versions 11.11.x and 12.0.x are affected by a bug that prevents any new LFS objects from replicating. Behind feature flag `geo_lfs_object_replication`, enabled by default. |
Personal snippets | Yes (10.2) | Yes (10.2) | No | |
Project snippets | Yes (10.2) | Yes (10.2) | No | |
CI job artifacts (other than Job Logs) | Yes (10.4) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only manually using Integrity Check Rake Task on both sites and comparing the output between them. |
CI Pipeline Artifacts | Yes (13.11) | Yes (13.11) | Via Object Storage provider if supported. Native Geo support (Beta). | Persists additional artifacts after a pipeline completes |
Job logs | Yes (10.4) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only on transfer or manually using Integrity Check Rake Task on both sites and comparing the output between them. |
Object pools for forked project deduplication | Yes | No | No | |
Container Registry | Yes (12.3) | No | No | Disabled by default. See instructions to enable. |
Content in object storage (beta) | Yes (12.4) | No | No | |
Project designs repository | Yes (12.7) | No | No | Designs also require replication of LFS objects and Uploads. |
Package Registry for npm | Yes (13.2) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Package Registry for Maven | Yes (13.2) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Package Registry for Conan | Yes (13.2) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Package Registry for NuGet | Yes (13.2) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Package Registry for PyPI | Yes (13.2) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Package Registry for Composer | Yes (13.2) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Package Registry for generic packages | Yes (13.5) | Yes (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
Versioned Terraform State | Yes (13.5) | Yes (13.12) | Via Object Storage provider if supported. Native Geo support (Beta). | Replication is behind the feature flag `geo_terraform_state_version_replication`, enabled by default. Verification was behind the feature flag `geo_terraform_state_version_verification`, which was removed in 14.0. |
External merge request diffs | Yes (13.5) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Replication is behind the feature flag `geo_merge_request_diff_replication`, enabled by default. Verification is under development, behind the feature flag `geo_merge_request_diff_verification`, introduced in 14.0. |
Versioned snippets | Yes (13.7) | No | No | |
Server-side Git hooks | No | No | No | |
Elasticsearch integration | No | No | No | |
GitLab Pages | No | No | Via Object Storage provider if supported. No native Geo support (Beta). | |
Dependency proxy images | No | No | No | Blocked on Geo: Secondary Mimicry. Note that replication of this cache is not needed for Disaster Recovery purposes because it can be recreated from external sources. |
Vulnerability Export | Not planned | No | No | Not planned because they are ephemeral and sensitive. They can be regenerated on demand. |
Limitation of verification for files in Object Storage
GitLab-managed Object Storage replication support is in beta.
Locally stored files are verified, but files stored remotely in object storage are not.
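This limitation amounts to a simple rule: a verification pass checks files on local storage and leaves object storage files unverified. The sketch below assumes a hypothetical `Blob` struct whose `store` attribute is `:local` or `:object_storage`; it is not Geo’s data model.

```ruby
# Hypothetical record describing a replicated file and where it is stored.
Blob = Struct.new(:id, :store, keyword_init: true)

# Split blobs into those a verification pass would check (local storage)
# and those it skips (object storage).
def partition_for_verification(blobs)
  local, remote = blobs.partition { |blob| blob.store == :local }
  { verify: local.map(&:id), skipped: remote.map(&:id) }
end

blobs = [
  Blob.new(id: 1, store: :local),
  Blob.new(id: 2, store: :object_storage),
  Blob.new(id: 3, store: :local)
]
result = partition_for_verification(blobs)
puts "verify: #{result[:verify].inspect}, skipped: #{result[:skipped].inspect}"
```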