Linux 备忘单

这是极狐GitLab 支持团队关于 Linux 的信息集合,他们有时在排查问题时会使用这些信息。它在这里列出是为了透明化,并供有 Linux 经验的用户使用。如果您当前遇到极狐GitLab 的问题,建议您首先查看您的支持选项,然后再尝试使用这些信息。

caution
这超出了极狐GitLab 支持在系统管理方面的帮助范围。极狐GitLab 管理员应该了解其所选发行版的这些命令。如果您是极狐GitLab 支持工程师,请将此作为交叉参考,以便将 yum 转换为 apt-get 等等。

下面的大多数命令都没有标明它们适用于哪个发行版。欢迎贡献以帮助添加这些信息。

系统命令

发行信息

# Debian/Ubuntu
uname -a
lsb_release -a

# CentOS/RedHat
cat /etc/centos-release
cat /etc/redhat-release

# This will provide a lot more information
cat /etc/os-release

关机或重启

shutdown -h now
reboot

权限

# change the user:group ownership of a file/dir
chown root:git <file_or_dir>

# make a file executable
chmod u+x <file>

文件和目录

# create a new directory and all subdirectories
mkdir -p dir/dir2/dir3

# Send a command's output to file.txt, no STDOUT
ls > file.txt

# Send a command's output to file.txt AND see it in STDOUT
ls | tee /tmp/file.txt

# Search and Replace within a file
sed -i 's/original-text/new-text/g' <filename>

查看所有已设置的环境变量

env

搜索

文件名

# search for a file in a filesystem
find . -name 'filename.rb' -print

# locate a file
locate <filename>

# see command history
history

# search CLI history
<ctrl>-R

文件内容

# -B/A = show 2 lines before/after search_term
grep -B 2 -A 2 search_term <filename>

# -<number> shows both before and after
grep -2 search_term <filename>

# Search on all files in directory (recursively)
grep -r search_term <directory>

# Grep namespace/project/name of a GitLab repository
grep 'fullpath' /var/opt/gitlab/git-data/repositories/@hashed/<repo hash>/.git/config

# search through *.gz files is the same except with zgrep
zgrep search_term <filename>

# Fast grep printing lines containing a string pattern
fgrep -R string_pattern <filename or directory>

CLI

# View command history
history

# Run last command that started with 'his' (3 letters min)
!his

# Search through command history
<ctrl>-R

# Execute last command with sudo
sudo !!

资源管理

内存、磁盘和 CPU 使用

# disk space info. The '-h' gives the data in human-readable values
df -h

# size of each file/dir and its contents in the current dir
du -hd 1

# or alternative
du -h --max-depth=1

# find files greater than certain size(k, M, G) and list them in order
# get rid of the + for exact, - for less than
find / -type f -size +100M -print0 | xargs -0 du -hs | sort -h

# Find free memory on a system
free -m

# Find what processes are using memory/CPU and organize by it
# Load average is 1/CPU for 1, 5, and 15 minutes
top -o %MEM
top -o %CPU

Strace

# strace a process
strace -tt -T -f -y -yy -s 1024 -p <pid>

# -tt   print timestamps with microsecond accuracy

# -T    print the time spent in each syscall

# -f    also trace any child processes that forked

# -y    print the path associated with file handles

# -yy    print socket and device file handle details

# -s    max string length to print for an event

# -o    output file

# run strace on all puma processes
ps auwx | grep puma | awk '{ print " -p " $2}' | xargs strace -tt -T -f -y -yy -s 1024 -o /tmp/puma.txt

请注意,strace 在运行时可能会对系统性能产生重大影响。

Strace Parser 工具

我们的 strace-parser 工具可用于提供 strace 输出的高级摘要。它类似于 strace -C,但提供了更详细的统计信息。

MacOS 和 Linux 二进制文件可用,或者如果您有 Rust 编译器,也可以从源代码构建。

如何使用该工具

首先使用 summary 标志运行工具,以获得按任务活动时间排序的顶级进程摘要。您还可以使用 -s--sort 标志基于总时间、系统调用次数、PID 号和子进程数量进行排序。结果数量默认为 25 个进程,但可以使用 -c/--count 选项更改。有关完整详细信息,请参阅 --help

$ ./strace-parser sidekiq_trace.txt summary -c15 -s=pid

Top 15 PIDs by PID #
-----------

  pid         actv (ms)     wait (ms)     user (ms)    total (ms)    % of actv     syscalls     children
  -------    ----------    ----------    ----------    ----------    ---------    ---------    ---------
  16706           0.000         0.000         0.000         0.000        0.00%            0            0
  16708           0.000         0.000         0.000         0.000        0.00%            0            0
  16716           0.000         0.000         0.000         0.000        0.00%            0            0
  16717           0.000         0.000         0.000         0.000        0.00%            0            0
  16718           0.000         0.000         0.000         0.000        0.00%            0            0
  16719           0.000         0.000         0.000         0.000        0.00%            0            0
  16720           0.389      9796.434         1.090      9797.912        0.02%           16            0
  16721           0.000         0.000         0.000         0.000        0.00%            0            0
  16722           0.000         0.000         0.000         0.000        0.00%            0            0
  16723           0.000         0.000         0.000         0.000        0.00%            0            0
  16804           0.218     11099.535         1.881     11101.634        0.01%           36            0
  16813           0.000         0.000         0.000         0.000        0.00%            0            0
  16814           1.740     11825.640         4.616     11831.996        0.10%           57            0
  16815           2.364     12039.993         7.669     12050.026        0.14%           80            0
  16816           0.000         0.000         0.000         0.000        0.00%            0            0

PIDs   93
real   0m12.287s
user   0m1.474s
sys    0m1.686s

根据摘要,您可以使用 -p/--pid 查看一个或多个进程的系统调用详细信息,或使用 -s/--stats 标志查看排序列表。--stats 接受与摘要相同的排序和计数选项。

./strace-parser sidekiq_trace.txt p 16815

PID 16815

  80 syscalls, active time: 2.364ms, user time: 7.669ms, total time: 12050.026ms
  start time: 22:46:14.830267    end time: 22:46:26.880293

  syscall                 count    total (ms)      max (ms)      avg (ms)      min (ms)    errors
  -----------------    --------    ----------    ----------    ----------    ----------    --------
  futex                       5     10100.229      5400.106      2020.046         0.022    ETIMEDOUT: 2
  restart_syscall             1      1939.764      1939.764      1939.764      1939.764    ETIMEDOUT: 1
  getpid                     33         1.020         0.046         0.031         0.018
  clock_gettime              14         0.420         0.038         0.030         0.021
  stat                        6         0.277         0.072         0.046         0.031
  read                        6         0.170         0.036         0.028         0.020
  openat                      3         0.126         0.045         0.042         0.038
  close                       3         0.099         0.034         0.033         0.031
  lseek                       3         0.089         0.035         0.030         0.021
  ioctl                       3         0.082         0.033         0.027         0.023    ENOTTY: 3
  fstat                       3         0.081         0.034         0.027         0.022
  ---------------

  Slowest file open times for PID 16815:

    dur (ms)       timestamp            error         filename
  ----------    ---------------    ---------------    ---------
       0.045    22:46:16.771318           -           /opt/gitlab/embedded/service/gitlab-rails/config/database.yml
       0.043    22:46:26.877954           -           /opt/gitlab/embedded/service/gitlab-rails/config/database.yml
       0.038    22:46:22.174610           -           /opt/gitlab/embedded/service/gitlab-rails/config/database.yml

在上面的示例中,我们可以看到 PID 16815 的哪个文件打开时间较长。

当结果中没有明显的内容时,获取更多上下文的一个好方法是,在执行客户执行的操作时,在您自己的极狐GitLab 实例上运行 strace,然后比较两个结果的摘要,并深入研究差异。

open 系统调用的统计信息

在各种配置下对 openopenat(用于访问文件)的调用的粗略数字。慢速存储可能导致 Gitaly 中可怕的 DeadlineExceeded 错误。

另请参阅手册中的此条目,以获取客户可以执行的快速测试来检查其文件系统性能。

请记住,来自 strace 的计时信息通常不太准确,因此不应将小差异视为重要。

Setup access times
EFS 10 - 30 ms
Local Storage 0.01 - 1 ms

网络

端口

# Find the programs that are listening on ports
netstat -plnt
ss -plnt
lsof -i -P | grep <port>

互联网/DNS

# Show domain IP address
dig +short example.com
nslookup example.com

# Check DNS using specific nameserver
# 8.8.8.8 = google, 1.1.1.1 = cloudflare, 208.67.222.222 = opendns
dig @8.8.8.8 example.com
nslookup example.com 1.1.1.1

# Find host provider
whois <ip_address> | grep -i "orgname\|netname"

# Curl headers with redirect
curl --head --location "https://example.com"

# Test if a host is reachable on the network. `ping6` works on IPv6 networks.
ping example.com

# Show the route taken to a host. `traceroute6` works on IPv6 networks.
traceroute example.com
mtr example.com

# List details of network interfaces
ip address

# Check local DNS settings
cat /etc/hosts
cat /etc/resolv.conf
systemd-resolve --status

# Capture traffic to/from a host
sudo tcpdump host www.example.com

软件包管理

# Debian/Ubuntu

# List packages
dpkg -l
apt list --installed

# Find an installed package
dpkg -l | grep <package>
apt list --installed | grep <package>

# Install a package
dpkg -i <package_name>.deb
apt-get install <package>
apt install <package>

# CentOS/RedHat

# Install a package
yum install <package>
dnf install <package> # RHEL/CentOS 8+

rpm -ivh <package_name>.rpm

# Find an installed package
rpm -qa | grep <package>

日志

```shell # Print last lines in log file where ‘n’ # is the number of lines to print tail -n /path/to/log/file