极狐GitLab Runner monitoring

You can monitor 极狐GitLab Runner with Prometheus.

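Embedded Prometheus metrics

The embedded HTTP statistics server with Prometheus metrics was introduced in 极狐GitLab Runner 1.8.0.

极狐GitLab Runner is instrumented with native Prometheus metrics, which can be exposed through an embedded HTTP server on the /metrics path. If the server is enabled, it can be scraped by the Prometheus monitoring system or accessed with any other HTTP client.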

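The exposed information includes:

Runner business logic metrics (for example, the number of jobs currently running).
Go-specific process metrics (garbage collection statistics, goroutines, memstats, and so on).
General process metrics (memory usage, CPU usage, file descriptor usage, and so on).
Build version information.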

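The metrics format is documented in the Prometheus exposition formats documentation.

These metrics give operators a way to monitor and understand your runners. For example, you might want to know whether an increase in load average on the runner host is related to an increase in processed jobs. Or you might be running a cluster of machines and want to track build trends so that you can make changes to your infrastructure.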

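Learning more about Prometheus

To learn how to set up a Prometheus server to scrape this HTTP endpoint and make use of the collected metrics, see the Prometheus getting started guide. For how to configure Prometheus, see its configuration documentation. For how to send alert notifications, see the documentation on alerting rules and on setting up Alertmanager.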

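Available metrics

To see the full list of available metrics, curl the metrics endpoint after it has been configured and enabled. For example, for a local runner configured with listening port 9252: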

$ curl -s "http://localhost:9252/metrics" | grep -E "# HELP"

# HELP gitlab_runner_api_request_statuses_total The total number of api requests, partitioned by runner, endpoint and status.
# HELP gitlab_runner_autoscaling_machine_creation_duration_seconds Histogram of machine creation time.
# HELP gitlab_runner_autoscaling_machine_states The current number of machines per state in this provider.
# HELP gitlab_runner_concurrent The current value of concurrent setting
# HELP gitlab_runner_errors_total The number of caught errors.
# HELP gitlab_runner_limit The current value of limit setting
# HELP gitlab_runner_request_concurrency The current number of concurrent requests for a new job
# HELP gitlab_runner_request_concurrency_exceeded_total Count of excess requests above the configured request_concurrency limit
# HELP gitlab_runner_version_info A metric with a constant '1' value labeled by different build stats fields.
...

For the complete list of available metrics, see Monitoring Runner.
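The curl and grep pipeline above can also be sketched in Python. This is a minimal illustration, not part of the Runner tooling: the function names `fetch_metrics` and `help_by_metric` are hypothetical, and the default URL assumes a local runner listening on port 9252 as in the example above.

```python
from typing import Dict
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:9252/metrics",
                  timeout: float = 5.0) -> str:
    """Download the raw metrics text (assumes the metrics server is enabled)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def help_by_metric(exposition_text: str) -> Dict[str, str]:
    """Map each metric name to the text of its '# HELP' line."""
    prefix = "# HELP "
    result: Dict[str, str] = {}
    for line in exposition_text.splitlines():
        if not line.startswith(prefix):
            continue  # skip '# TYPE' lines, samples, and blank lines
        # The first token after the prefix is the metric name,
        # the remainder of the line is the help text.
        name, _, text = line[len(prefix):].partition(" ")
        result[name] = text
    return result
```

Calling `help_by_metric(fetch_metrics())` against a running runner should return one entry per metric, similar to the grep output shown above.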

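`pprof` HTTP endpoints

pprof integration was introduced in 极狐GitLab Runner 1.9.0.

Metrics about the internal state of the 极狐GitLab Runner process are useful, but we found that in some cases it is more effective to inspect the running process in real time. That is why we introduced the pprof HTTP endpoints.

The pprof endpoints are available through the embedded HTTP server on the /debug/pprof/ path.

You can read more about using pprof in its documentation.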

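Configuration of the metrics HTTP server

The metrics server exports data about the internal state of the 极狐GitLab Runner process and should not be publicly available.

The metrics HTTP server can be configured in the following ways:

With the listen_address global configuration option in the config.toml file.
With the --listen-address command line option for the run command.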
For runners installed with the Helm chart, in the values.yaml file:
1. Configure the metrics option:
```yaml
## Configure integrated Prometheus metrics exporter
##
## ref: https://docs.gitlab.com/runner/monitoring/#configuration-of-the-metrics-http-server
##
metrics:
  enabled: true

  ## Define a name for the metrics port
  ##
  portName: metrics

  ## Provide a port number for the integrated Prometheus metrics exporter
  ##
  port: 9252

  ## Configure a prometheus-operator serviceMonitor to allow autodetection of
  ## the scraping target. Requires enabling the service resource below.
  ##
  serviceMonitor:
    enabled: true

    ...
```
2. Configure the service monitor to retrieve the configured metrics:
```yaml
## Configure a service resource to allow scraping metrics by using
## prometheus-operator serviceMonitor
service:
  enabled: true

  ## Provide additional labels for the service
  ##
  labels: {}

  ## Provide additional annotations for the service
  ##
  annotations: {}

  ...
```

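If you add the address to the config.toml file to start the metrics HTTP server, you must restart the Runner process.

In both cases, the option accepts a string in the format [host]:<port>, where:

host can be an IP address or a hostname.
port is a valid TCP port or a symbolic service name (such as http). We recommend using port 9252, which is already allocated in Prometheus.

If the listen address does not contain a port, it defaults to 9252.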

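Examples of addresses:

:9252 listens on all IP addresses of all interfaces on port 9252.
localhost:9252 listens only on the loopback interface on port 9252.
[2001:db8::1]:http listens on the IPv6 address [2001:db8::1] on the HTTP port 80.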
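To make the [host]:<port> format concrete, here is a minimal Python sketch of how such a string could be split into its parts. It is only an illustration of the format described above, not the parsing code that 极狐GitLab Runner itself uses. The function name `split_listen_address` is hypothetical, and symbolic service names such as http are returned unresolved.

```python
from typing import Tuple

DEFAULT_PORT = "9252"  # default from the text above when no port is given


def split_listen_address(address: str) -> Tuple[str, str]:
    """Split '[host]:<port>' into (host, port).

    An empty host means "all interfaces". The port is returned as a string
    because it may be a symbolic service name such as 'http'.
    Assumption: IPv6 literals are always written in square brackets.
    """
    if address.startswith("["):
        # Bracketed IPv6 literal, for example "[2001:db8::1]:http".
        end = address.find("]")
        if end == -1:
            raise ValueError(f"missing ']' in address: {address!r}")
        host = address[1:end]
        rest = address[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError(f"unexpected text after ']': {address!r}")
        return host, rest[1:] or DEFAULT_PORT

    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the host.
        return address, DEFAULT_PORT
    return host, port or DEFAULT_PORT
```

For the three example addresses above, this sketch yields an empty host with port 9252, the host localhost with port 9252, and the host 2001:db8::1 with the service name http.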

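Remember that to listen on ports below 1024, at least on Linux/Unix systems, you need root/administrator privileges.

The HTTP server is started on the selected host:port without any authorization. If you plan to bind the metrics server to a public interface, consider using a firewall to restrict access to it, or add an HTTP proxy that provides the authorization and access control layer.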
