Set up your self-hosted model infrastructure

Tier: Ultimate with GitLab Duo Enterprise
Offering: Self-managed
Status: Beta

The availability of this feature is controlled by a feature flag.

By self-hosting the model, the AI Gateway, and the GitLab instance, no requests are sent to external services: all data stays within your own infrastructure, ensuring maximum levels of security.

To set up your self-hosted model infrastructure:

  1. Install the latest version of GitLab.
  2. Install the GitLab AI Gateway.
  3. Install the large language model (LLM) serving infrastructure.
  4. Configure your GitLab instance.

Install large language model serving infrastructure

Install one of the following GitLab-approved large language models (LLMs):

| Model family | Model                                  | Code completion | Code generation | GitLab Duo Chat |
|--------------|----------------------------------------|-----------------|-----------------|-----------------|
| Mistral      | Codestral 22B (see setup instructions) | Yes             | Yes             | No              |
| Mistral      | Mistral 7B                             | Yes             | No              | No              |
| Mistral      | Mistral 7B-it                          | Yes             | Yes             | Yes             |
| Mistral      | Mixtral 8x7B                           | Yes             | No              | No              |
| Mistral      | Mixtral 8x7B-it                        | Yes             | Yes             | Yes             |
| Mistral      | Mixtral 8x22B                          | Yes             | No              | No              |
| Mistral      | Mixtral 8x22B-it                       | Yes             | Yes             | Yes             |
| Claude 3     | Claude 3.5 Sonnet                      | Yes             | Yes             | Yes             |

The following models are under evaluation, and support is limited:

| Model family  | Model                         | Code completion | Code generation | GitLab Duo Chat |
|---------------|-------------------------------|-----------------|-----------------|-----------------|
| CodeGemma     | CodeGemma 2b                  | Yes             | No              | No              |
| CodeGemma     | CodeGemma 7b-it (Instruction) | No              | Yes             | No              |
| CodeGemma     | CodeGemma 7b-code (Code)      | Yes             | No              | No              |
| CodeLlama     | Code-Llama 13b-code           | Yes             | No              | No              |
| CodeLlama     | Code-Llama 13b                | No              | Yes             | No              |
| DeepSeekCoder | DeepSeek Coder 33b Instruct   | Yes             | Yes             | No              |
| DeepSeekCoder | DeepSeek Coder 33b Base       | Yes             | No              | No              |
| GPT           | GPT-3.5-Turbo                 | Yes             | Yes             | No              |
| GPT           | GPT-4                         | Yes             | Yes             | No              |
| GPT           | GPT-4 Turbo                   | Yes             | Yes             | No              |
| GPT           | GPT-4o                        | Yes             | Yes             | No              |
| GPT           | GPT-4o-mini                   | Yes             | Yes             | No              |

Use a serving architecture

To host your models, you should use:

  • For non-cloud, on-premise model deployments, vLLM.
  • For cloud-hosted model deployments, AWS Bedrock or Azure OpenAI (the currently supported cloud providers).
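
For example, the following is a minimal sketch of serving one of the approved Mistral models with vLLM's OpenAI-compatible server. The checkpoint name and port are illustrative assumptions; choose the model and serving options that match your deployment:

# Install vLLM and serve a Mistral instruct checkpoint behind an OpenAI-compatible API
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --port 8000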

GitLab AI Gateway

Install the GitLab AI Gateway.
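
If you run the AI Gateway as a Docker container, a minimal sketch of starting it is shown below. The image name and published port are placeholders and assumptions; use the image, tag, and port from the AI Gateway installation documentation:

# Minimal sketch: run the AI Gateway container (image name and port are assumptions)
docker run -p 5052:5052 \
 -e AIGW_GITLAB_URL=https://<your_gitlab_domain> \
 -e AIGW_GITLAB_API_URL=https://<your_gitlab_domain>/api/v4/ \
 <ai_gateway_image>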

Configure your GitLab instance

Prerequisites:

  • Upgrade to the latest version of GitLab.
The GitLab instance must be able to access the AI Gateway. To configure this access:

    1. Where your GitLab instance is installed, update the /etc/gitlab/gitlab.rb file.

      sudo vim /etc/gitlab/gitlab.rb
      
    2. Add the following environment variables and save the file.

      gitlab_rails['env'] = {
        'GITLAB_LICENSE_MODE' => 'production',
        'CUSTOMER_PORTAL_URL' => 'https://customers.gitlab.com',
        'AI_GATEWAY_URL' => '<path_to_your_ai_gateway>:<port>'
      }
    3. Run reconfigure:

      sudo gitlab-ctl reconfigure
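
Optionally, verify from the GitLab host that the AI Gateway is reachable. The health endpoint path shown is an assumption; adjust the URL to match your AI Gateway deployment:

# Basic reachability check from the GitLab host (endpoint path is an assumption)
curl --silent --show-error "<path_to_your_ai_gateway>:<port>/monitoring/healthz"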
      

Enable logging

Prerequisites:

  • You must be an administrator for your self-managed instance.

To enable logging and access the logs, enable the feature flag:

Feature.enable(:expanded_ai_logging)

Disabling the feature flag stops logs from being written.
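
On a Linux package installation, one way to toggle the flag without opening an interactive Rails console is gitlab-rails runner, shown here as a sketch to run on the GitLab host:

sudo gitlab-rails runner "Feature.enable(:expanded_ai_logging)"
# To stop writing the expanded logs again:
sudo gitlab-rails runner "Feature.disable(:expanded_ai_logging)"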

Logs in your GitLab installation

The logging setup is designed to protect sensitive information while maintaining transparency about system operations, and is made up of the following components:

  • Logs that capture requests to the GitLab instance.
  • Feature flag and logging control.
  • The llm.log file.

Logs that capture requests to the GitLab instance

Logs in the application_json.log, production_json.log, and production.log files, among others, capture requests to the GitLab instance:

  • Filtered Requests: We log the requests in these files but ensure that sensitive data (such as input parameters) is filtered. This means that while the request metadata is captured (for example, the request type, endpoint, and response status), the actual input data (for example, the query parameters, variables, and content) is not logged to prevent the exposure of sensitive information.
  • Example 1: For a GitLab Duo Chat request made through the GraphQL API, the logs capture the request details while filtering sensitive information:

    {
      "method": "POST",
      "path": "/api/graphql",
      "controller": "GraphqlController",
      "action": "execute",
      "status": 500,
      "params": [
        {"key": "query", "value": "[FILTERED]"},
        {"key": "variables", "value": "[FILTERED]"},
        {"key": "operationName", "value": "chat"}
      ],
      "exception": {
        "class": "NoMethodError",
        "message": "undefined method `id` for {:skip=>true}:Hash"
      },
      "time": "2024-08-28T14:13:50.328Z"
    }
    

    As shown, while the error information and general structure of the request are logged, the sensitive input parameters are marked as [FILTERED].

  • Example 2: For a code suggestions completion request, the logs also capture the request details while filtering sensitive information:

    {
      "method": "POST",
      "path": "/api/v4/code_suggestions/completions",
      "status": 200,
      "params": [
        {"key": "prompt_version", "value": 1},
        {"key": "current_file", "value": {"file_name": "/test.rb", "language_identifier": "ruby", "content_above_cursor": "[FILTERED]", "content_below_cursor": "[FILTERED]"}},
        {"key": "telemetry", "value": []}
      ],
      "time": "2024-10-15T06:51:09.004Z"
    }
    

    As shown, while the general structure of the request is logged, the sensitive input parameters such as content_above_cursor and content_below_cursor are marked as [FILTERED].

Feature Flag and Logging Control

Feature Flag Dependency: You can control a subset of these logs by enabling or disabling the expanded_ai_logging feature flag. Disabling the feature flag disables logging for specific operations. For more information, see the Feature Flag section under Privacy Considerations.

The llm.log file

When the :expanded_ai_logging feature flag is enabled, the llm.log file in your GitLab instance captures code generation and Chat events that occur through your instance. The log file does not capture anything when the feature flag is not enabled. Code completion logs are captured directly in the AI Gateway.
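
On a Linux package installation, llm.log is written to the GitLab Rails log directory. Assuming the default path, you can watch code generation and Chat events as they are logged:

# Default path on Linux package installations; adjust if your log directory differs
sudo tail -f /var/log/gitlab/gitlab-rails/llm.log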

Logs in your AI Gateway container

To specify the location of logs generated by AI Gateway, run:

docker run -e AIGW_GITLAB_URL=<your_gitlab_instance> \
 -e AIGW_GITLAB_API_URL=https://<your_gitlab_domain>/api/v4/ \
 -e AIGW_LOGGING__TO_FILE="aigateway.log" \
 -v <your_file_path>:"aigateway.log" \
 <image>

If you do not specify a filename, logs are streamed to standard output and can also be managed using Docker logs. For more information, see the Docker Logs documentation.

Additionally, the outputs of the AI Gateway execution can help with debugging issues. To access them:

  • When using Docker:

    docker logs <container-id>
    
  • When using Kubernetes:

    kubectl logs <pod-name>
    

To ingest these logs into your logging solution, see your logging provider's documentation.

Logs structure

When a POST request is made (for example, to the /chat/completions endpoint), the server logs the request:

  • Payload
  • Headers
  • Metadata

1. Request payload

The JSON payload typically includes the following fields:

  • messages: An array of message objects.
    • Each message object contains:
      • content: A string representing the user’s input or query.
      • role: Indicates the role of the message sender (for example, user).
  • model: A string specifying the model to be used (for example, mistral).
  • max_tokens: An integer specifying the maximum number of tokens to generate in the response.
  • n: An integer indicating the number of completions to generate.
  • stop: An array of strings denoting stop sequences for the generated text.
  • stream: A boolean indicating whether the response should be streamed.
  • temperature: A float controlling the randomness of the output.
Example request
{
    "messages": [
        {
            "content": "<s>[SUFFIX]None[PREFIX]# # build a hello world ruby method\n def say_goodbye\n    puts \"Goodbye, World!\"\n  end\n\ndef main\n  say_hello\n  say_goodbye\nend\n\nmain",
            "role": "user"
        }
    ],
    "model": "mistral",
    "max_tokens": 128,
    "n": 1,
    "stop": ["[INST]", "[/INST]", "[PREFIX]", "[MIDDLE]", "[SUFFIX]"],
    "stream": false,
    "temperature": 0.0
}

2. Request headers

The request headers provide additional context about the client making the request. Key headers might include:

  • Authorization: Contains the Bearer token for API access.
  • Content-Type: Indicates the media type of the resource (for example, JSON).
  • User-Agent: Information about the client software making the request.
  • X-Stainless-* headers: Various headers providing additional metadata about the client environment.
Example request headers
{
    "host": "0.0.0.0:4000",
    "accept-encoding": "gzip, deflate",
    "connection": "keep-alive",
    "accept": "application/json",
    "content-type": "application/json",
    "user-agent": "AsyncOpenAI/Python 1.51.0",
    "authorization": "Bearer <TOKEN>",
    "content-length": "364"
}

3. Request metadata

The metadata includes various fields that describe the context of the request:

  • requester_metadata: Additional metadata about the requester.
  • user_api_key: The API key used for the request (anonymized).
  • api_version: The version of the API being used.
  • request_timeout: The timeout duration for the request.
  • call_id: A unique identifier for the call.
Example metadata
{
    "user_api_key": "<ANONYMIZED_KEY>",
    "api_version": "1.48.18",
    "request_timeout": 600,
    "call_id": "e1aaa316-221c-498c-96ce-5bc1e7cb63af"
}

Example response

The server responds with a structured model response. For example:

Response: ModelResponse(
    id='chatcmpl-5d16ad41-c130-4e33-a71e-1c392741bcb9',
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content=' Here is the corrected Ruby code for your function:\n\n```ruby\ndef say_hello\n  puts "Hello, World!"\nend\n\ndef say_goodbye\n    puts "Goodbye, World!"\nend\n\ndef main\n  say_hello\n  say_goodbye\nend\n\nmain\n```\n\nIn your original code, the method names were misspelled as `say_hell` and `say_gobdye`. I corrected them to `say_hello` and `say_goodbye`, respectively. Also, there was no need for the prefix',
                role='assistant',
                tool_calls=None,
                function_call=None
            )
        )
    ],
    created=1728983827,
    model='mistral',
    object='chat.completion',
    system_fingerprint=None,
    usage=Usage(
        completion_tokens=128,
        prompt_tokens=69,
        total_tokens=197,
        completion_tokens_details=None,
        prompt_tokens_details=None
    )
)

Logs in your inference service provider

GitLab does not manage logs generated by your inference service provider. See the documentation of your inference service provider on how to use their logs.

Logging behavior in GitLab and AI Gateway environments

GitLab provides logging functionality for AI-related activities through llm.log, which captures inputs, outputs, and other relevant information. However, the logging behavior differs depending on whether the GitLab instance and AI Gateway are self-hosted or cloud-connected.

By default, the log does not contain LLM prompt input and response output to support data retention policies of AI feature data.

Logging Scenarios

GitLab self-managed and self-hosted AI Gateway

In this configuration, both GitLab and the AI Gateway are hosted by the customer.

  • Logging Behavior: Full logging is enabled, and all prompts, inputs, and outputs are logged to llm.log on the GitLab self-managed instance.
  • Expanded Logging: When the :expanded_ai_logging feature flag is activated, extra debugging information is logged, including:
    • Preprocessed prompts.
    • Final prompts.
    • Additional context.
  • Privacy: Because both GitLab and AI Gateway are self-hosted:
    • The customer has full control over data handling.
    • Logging of sensitive information can be enabled or disabled at the customer’s discretion.

GitLab self-managed and GitLab-managed AI Gateway (cloud-connected)

In this scenario, the customer hosts GitLab but relies on the GitLab-managed AI Gateway for AI processing.

  • Logging Behavior: Prompts and inputs sent to the AI Gateway are not logged in the cloud-connected AI Gateway to prevent exposure of sensitive information such as personally identifiable information (PII).
  • Expanded Logging: Even if the :expanded_ai_logging feature flag is enabled, no detailed logs are generated in the GitLab-managed AI Gateway to avoid unintended leaks of sensitive information.
    • Logging remains minimal in this setup, and the expanded logging features are disabled by default.
  • Privacy: This configuration is designed to ensure that sensitive data is not logged in a cloud environment.

Feature Flag: :expanded_ai_logging

The :expanded_ai_logging feature flag controls whether additional debugging information, including prompts and inputs, is logged. This flag is essential for monitoring and debugging AI-related activities.

Behavior by Deployment Setup

  • GitLab self-managed and self-hosted AI Gateway: The feature flag enables detailed logging to llm.log on the self-hosted instance, capturing inputs and outputs for AI models.
  • GitLab self-managed and GitLab-managed AI Gateway: The feature flag enables logging on your self-managed instance. However, the flag does not activate expanded logging on the GitLab-managed AI Gateway side. Logging remains disabled for the cloud-connected AI Gateway to protect sensitive data. For more information, see the Feature Flag section in the Privacy Considerations documentation.

Logging in cloud-connected AI Gateways

To prevent potential data leakage of sensitive information, expanded logging (including prompts and inputs) is intentionally disabled when using a cloud-connected AI Gateway. Preventing the exposure of PII is a priority.

Cross-referencing logs between AI Gateway and GitLab

The property correlation_id is assigned to every request and is carried across different components that respond to a request. For more information, see the documentation on finding logs with a correlation ID.

The Correlation ID can be found in your AI Gateway and GitLab logs. However, it is not present in your model provider logs.
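
For example, the following is a minimal sketch of following a single request across components by its correlation ID, assuming a Linux package GitLab installation and a Docker-based AI Gateway (the log paths and container ID are illustrative):

# Find the entries in the GitLab Rails logs (default Linux package log paths assumed)
sudo grep '<correlation_id>' /var/log/gitlab/gitlab-rails/production_json.log /var/log/gitlab/gitlab-rails/llm.log

# Find the matching entries in the AI Gateway container logs
docker logs <container-id> 2>&1 | grep '<correlation_id>'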

Troubleshooting

First, run the debugging scripts to verify your self-hosted model setup.

For more information on other actions to take, see the troubleshooting documentation.