Supported GitLab Duo Self-Hosted models and hardware requirements

Tier: Ultimate with GitLab Duo Enterprise
Offering: GitLab Self-Managed
History
  • Introduced in GitLab 17.1 with a flag named ai_custom_model. Disabled by default.
  • Enabled on GitLab Self-Managed in GitLab 17.6.
  • Changed to require GitLab Duo add-on in GitLab 17.6 and later.
  • Feature flag ai_custom_model removed in GitLab 17.8.
  • Generally available in GitLab 17.9.

The following tables show the supported models, the features each model is compatible with, and the hardware required to run them, so you can select the model that best fits your infrastructure.

Supported models

The following GitLab-supported large language models (LLMs) are generally available.

  • Fully compatible: The model can likely handle the feature without any loss of quality.
  • Largely compatible: The model supports the feature, but there might be compromises or limitations.
  • Not compatible: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues.
| Model family | Model | Supported platforms | Code completion | Code generation | GitLab Duo Chat |
|--------------|-------|---------------------|-----------------|-----------------|-----------------|
| Mistral | Codestral 22B v0.1 | vLLM | Fully compatible | Fully compatible | N/A |
| Mistral | Mistral 7B-it v0.3 | vLLM | Fully compatible | Fully compatible | Not compatible |
| Mistral | Mixtral 8x7B-it v0.1 | vLLM, AWS Bedrock | Fully compatible | Fully compatible | Largely compatible |
| Mistral | Mixtral 8x22B-it v0.1 | vLLM | Fully compatible | Fully compatible | Largely compatible |
| Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Fully compatible | Fully compatible | Fully compatible |
| GPT | GPT-4 Turbo | Azure OpenAI | Fully compatible | Fully compatible | Largely compatible |
| GPT | GPT-4o | Azure OpenAI | Fully compatible | Fully compatible | Fully compatible |
| GPT | GPT-4o-mini | Azure OpenAI | Fully compatible | Fully compatible | Largely compatible |
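
If you serve a model on vLLM, you might want to smoke-test it on your hardware before connecting it to GitLab. The following is a minimal sketch using vLLM's offline Python API; the model name, parallelism, and sampling settings are illustrative examples, not GitLab requirements:

```python
from vllm import LLM, SamplingParams

# Illustrative example: load one of the supported models listed above.
# Set tensor_parallel_size to the number of GPUs to shard across.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    tensor_parallel_size=1,
)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(
    ["Write a Python function that reverses a string."], params
)
print(outputs[0].outputs[0].text)
```

If this generates sensible output without exhausting GPU memory, the machine is a reasonable candidate for serving that model.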

Experimental and beta models

The following models can be configured for the features marked below. They are in experimental or beta status, are still under evaluation, and are excluded from the “Customer Integrated Models” definition in the AI Functionality Terms:

| Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
|--------------|-------|---------------------|--------|-----------------|-----------------|-----------------|
| CodeGemma | CodeGemma 2b | vLLM | Beta | Yes | No | No |
| CodeGemma | CodeGemma 7b-it | vLLM | Beta | No | Yes | No |
| CodeGemma | CodeGemma 7b-code | vLLM | Beta | Yes | No | No |
| Code Llama | Code-Llama 13b | vLLM | Beta | No | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Beta | Yes | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Beta | Yes | No | No |
| Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Beta | Yes | Yes | Yes |

Hardware requirements

The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on model size and intended usage:

Base system requirements

  • CPU:
    • Minimum: 8 cores (16 threads)
    • Recommended: 16+ cores for production environments
  • RAM:
    • Minimum: 32 GB
    • Recommended: 64 GB for most models
  • Storage:
    • SSD with sufficient space for model weights and data.

GPU requirements by model size

| Model size | Minimum GPU configuration | Minimum VRAM required |
|------------|---------------------------|-----------------------|
| 7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40 GB) | 35 GB |
| 22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80 GB) | 110 GB |
| Mixtral 8x7B | 4x NVIDIA A100 (80 GB) | 220 GB |
| Mixtral 8x22B | 8x NVIDIA A100 (80 GB) | 526 GB |

Use Hugging Face’s memory utility to verify memory requirements.
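
As a rough cross-check of the figures above, you can also estimate serving memory from the parameter count. This back-of-the-envelope sketch (not the Hugging Face utility itself) assumes fp16/bf16 weights at 2 bytes per parameter; the overhead factor standing in for KV cache, activations, and runtime buffers is an assumption that varies with context length and batch size:

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 2.5) -> float:
    """Rough VRAM estimate: weight memory times a serving-overhead factor.

    bytes_per_param=2.0 assumes fp16/bf16 weights; `overhead` is an
    assumed multiplier for KV cache, activations, and CUDA buffers.
    """
    weights_gb = params_billion * bytes_per_param
    return weights_gb * overhead

# Mistral 7B: ~14 GB of fp16 weights; with a 2.5x overhead factor this
# lands at ~35 GB, in line with the 7B row of the table above.
print(f"{estimate_vram_gb(7):.0f} GB")   # 35 GB
print(f"{estimate_vram_gb(22):.0f} GB")  # 110 GB, matching the 22B row
```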

Response time by model size and GPU

Small machine

With an a2-highgpu-2g (2x NVIDIA A100 40 GB - 80 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
| Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
| Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
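
The columns in these benchmark tables can be derived from per-request measurements. This sketch shows the arithmetic, assuming the requests in a batch run concurrently (the names here are illustrative, not GitLab tooling):

```python
from dataclasses import dataclass

@dataclass
class RequestResult:
    duration_sec: float  # wall time of a single request
    tokens: int          # tokens in that request's response

def summarize(results: list[RequestResult], batch_wall_time_sec: float):
    """Aggregate per-request results into the columns reported above."""
    n = len(results)
    avg_time = sum(r.duration_sec for r in results) / n
    avg_tokens = sum(r.tokens for r in results) / n
    # Throughput of each request on its own, averaged across requests.
    avg_tps_per_request = sum(r.tokens / r.duration_sec for r in results) / n
    # Total TPS divides all generated tokens by the batch's wall time;
    # because requests overlap, this exceeds the per-request figure.
    total_tps = sum(r.tokens for r in results) / batch_wall_time_sec
    return avg_time, avg_tokens, avg_tps_per_request, total_tps

# For example, in the 10-request Mistral-7B row above, roughly
# 10 * 764.2 tokens over a 13.70-second batch give ~557.8 total TPS.
```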

Medium machine

With an a2-ultragpu-4g (4x NVIDIA A100 80 GB - 340 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
| Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
| Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |

Large machine

With an a2-ultragpu-8g (8x NVIDIA A100 80 GB - 680 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
| Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
| Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
| Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
| Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
| Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |

AI gateway hardware requirements

For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.