Supported GitLab Duo Self-Hosted models and hardware requirements

Tier: Ultimate with GitLab Duo Enterprise
Offering: GitLab Self-Managed
History
  • Introduced in GitLab 17.1 with a flag named ai_custom_model. Disabled by default.
  • Enabled on GitLab Self-Managed in GitLab 17.6.
  • Changed to require GitLab Duo add-on in GitLab 17.6 and later.
  • Feature flag ai_custom_model removed in GitLab 17.8.
  • Generally available in GitLab 17.9.

The following tables show the supported models, the features each model is compatible with, and the hardware required to run them, so you can select the model that best fits your infrastructure.

Supported models

The following GitLab-supported large language models (LLMs) are generally available.

  • Fully compatible: The model can likely handle the feature without any loss of quality.
  • Largely compatible: The model supports the feature, but there might be compromises or limitations.
  • Not compatible: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues.
| Model family | Model | Supported platforms | Code completion | Code generation | GitLab Duo Chat |
|--------------|-------|---------------------|-----------------|-----------------|-----------------|
| Mistral | Codestral 22B v0.1 | vLLM | Fully compatible | Fully compatible | N/A |
| Mistral | Mistral 7B-it v0.3 | vLLM | Fully compatible | Fully compatible | Not compatible |
| Mistral | Mixtral 8x7B-it v0.1 | vLLM, AWS Bedrock | Fully compatible | Fully compatible | Largely compatible |
| Mistral | Mixtral 8x22B-it v0.1 | vLLM | Fully compatible | Fully compatible | Largely compatible |
| Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | Fully compatible | Fully compatible | Fully compatible |
| GPT | GPT-4 Turbo | Azure OpenAI | Fully compatible | Fully compatible | Largely compatible |
| GPT | GPT-4o | Azure OpenAI | Fully compatible | Fully compatible | Fully compatible |
| GPT | GPT-4o-mini | Azure OpenAI | Fully compatible | Fully compatible | Largely compatible |
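
If you serve a model on vLLM, you might want to smoke-test it on your hardware before connecting it to GitLab. The following is a minimal sketch using vLLM's offline Python API; the model name, parallelism, and sampling settings are illustrative examples, not GitLab requirements:

```python
from vllm import LLM, SamplingParams

# Illustrative example: load one of the supported models listed above.
# Set tensor_parallel_size to the number of GPUs to shard across.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    tensor_parallel_size=1,
)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(
    ["Write a Python function that reverses a string."], params
)
print(outputs[0].outputs[0].text)
```

If this generates sensible output without exhausting GPU memory, the machine is a reasonable candidate for serving that model.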

Experimental and beta models

The following models can be configured for the features marked below. They are in experimental or beta status, are still under evaluation, and are excluded from the “Customer Integrated Models” definition in the AI Functionality Terms:

| Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
|--------------|-------|---------------------|--------|-----------------|-----------------|-----------------|
| CodeGemma | CodeGemma 2b | vLLM | Beta | Yes | No | No |
| CodeGemma | CodeGemma 7b-it | vLLM | Beta | No | Yes | No |
| CodeGemma | CodeGemma 7b-code | vLLM | Beta | Yes | No | No |
| Code Llama | Code-Llama 13b | vLLM | Beta | No | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Beta | Yes | Yes | No |
| DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Beta | Yes | No | No |
| Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Beta | Yes | Yes | Yes |

Hardware requirements

The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on model size and intended usage:

Base system requirements

  • CPU:
    • Minimum: 8 cores (16 threads)
    • Recommended: 16+ cores for production environments
  • RAM:
    • Minimum: 32 GB
    • Recommended: 64 GB for most models
  • Storage:
    • SSD with sufficient space for model weights and data.

GPU requirements by model size

| Model size | Minimum GPU configuration | Minimum VRAM required |
|------------|---------------------------|-----------------------|
| 7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40 GB) | 35 GB |
| 22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80 GB) | 110 GB |
| Mixtral 8x7B | 4x NVIDIA A100 (80 GB) | 220 GB |
| Mixtral 8x22B | 8x NVIDIA A100 (80 GB) | 526 GB |

Use Hugging Face’s memory utility to verify memory requirements.
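
As a rough cross-check of the figures above, you can also estimate serving memory from the parameter count. This back-of-the-envelope sketch (not the Hugging Face utility itself) assumes fp16/bf16 weights at 2 bytes per parameter; the overhead factor standing in for KV cache, activations, and runtime buffers is an assumption that varies with context length and batch size:

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 2.5) -> float:
    """Rough VRAM estimate: weight memory times a serving-overhead factor.

    bytes_per_param=2.0 assumes fp16/bf16 weights; `overhead` is an
    assumed multiplier for KV cache, activations, and CUDA buffers.
    """
    weights_gb = params_billion * bytes_per_param
    return weights_gb * overhead

# Mistral 7B: ~14 GB of fp16 weights; with a 2.5x overhead factor this
# lands at ~35 GB, in line with the 7B row of the table above.
print(f"{estimate_vram_gb(7):.0f} GB")   # 35 GB
print(f"{estimate_vram_gb(22):.0f} GB")  # 110 GB, matching the 22B row
```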

Response time by model size and GPU

Small machine

With an a2-highgpu-2g (2x NVIDIA A100 40 GB - 80 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
| Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
| Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
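
The columns in these benchmark tables can be derived from per-request measurements. This sketch shows the arithmetic, assuming the requests in a batch run concurrently (the names here are illustrative, not GitLab tooling):

```python
from dataclasses import dataclass

@dataclass
class RequestResult:
    duration_sec: float  # wall time of a single request
    tokens: int          # tokens in that request's response

def summarize(results: list[RequestResult], batch_wall_time_sec: float):
    """Aggregate per-request results into the columns reported above."""
    n = len(results)
    avg_time = sum(r.duration_sec for r in results) / n
    avg_tokens = sum(r.tokens for r in results) / n
    # Throughput of each request on its own, averaged across requests.
    avg_tps_per_request = sum(r.tokens / r.duration_sec for r in results) / n
    # Total TPS divides all generated tokens by the batch's wall time;
    # because requests overlap, this exceeds the per-request figure.
    total_tps = sum(r.tokens for r in results) / batch_wall_time_sec
    return avg_time, avg_tokens, avg_tps_per_request, total_tps

# For example, in the 10-request Mistral-7B row above, roughly
# 10 * 764.2 tokens over a 13.70-second batch give ~557.8 total TPS.
```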

Medium machine

With an a2-ultragpu-4g (4x NVIDIA A100 80 GB - 340 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
| Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
| Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |

Large machine

With an a2-ultragpu-8g (8x NVIDIA A100 80 GB - 680 GB vRAM) machine on GCP or equivalent:

| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
| Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
| Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
| Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
| Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
| Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |

AI gateway hardware requirements

For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.