Linux 备忘单
这是极狐GitLab 支持团队关于 Linux 的信息集合,他们有时在排查问题时会使用这些信息。它在这里列出是为了透明化,并供有 Linux 经验的用户使用。如果您当前遇到极狐GitLab 的问题,建议您首先查看您的支持选项,然后再尝试使用这些信息。
这超出了极狐GitLab 支持在系统管理方面的帮助范围。极狐GitLab 管理员应该了解其所选发行版的这些命令。如果您是极狐GitLab 支持工程师,请将此作为交叉参考,以便将
yum
转换为 apt-get
等等。下面的大多数命令都没有标明它们适用于哪个发行版。欢迎贡献以帮助添加这些信息。
系统命令
发行信息
# Debian/Ubuntu
uname -a
lsb_release -a
# CentOS/RedHat
cat /etc/centos-release
cat /etc/redhat-release
# This will provide a lot more information
cat /etc/os-release
关机或重启
shutdown -h now
reboot
权限
# change the user:group ownership of a file/dir
chown root:git <file_or_dir>
# make a file executable
chmod u+x <file>
文件和目录
# create a new directory and all subdirectories
mkdir -p dir/dir2/dir3
# Send a command's output to file.txt, no STDOUT
ls > file.txt
# Send a command's output to file.txt AND see it in STDOUT
ls | tee /tmp/file.txt
# Search and Replace within a file
sed -i 's/original-text/new-text/g' <filename>
查看所有已设置的环境变量
env
搜索
文件名
# search for a file in a filesystem
find . -name 'filename.rb' -print
# locate a file
locate <filename>
# see command history
history
# search CLI history
<ctrl>-R
文件内容
# -B/A = show 2 lines before/after search_term
grep -B 2 -A 2 search_term <filename>
# -<number> shows both before and after
grep -2 search_term <filename>
# Search on all files in directory (recursively)
grep -r search_term <directory>
# Grep namespace/project/name of a GitLab repository
grep 'fullpath' /var/opt/gitlab/git-data/repositories/@hashed/<repo hash>/.git/config
# search through *.gz files is the same except with zgrep
zgrep search_term <filename>
# Fast grep printing lines containing a string pattern
fgrep -R string_pattern <filename or directory>
CLI
# View command history
history
# Run last command that started with 'his' (3 letters min)
!his
# Search through command history
<ctrl>-R
# Execute last command with sudo
sudo !!
资源管理
内存、磁盘和 CPU 使用
# disk space info. The '-h' gives the data in human-readable values
df -h
# size of each file/dir and its contents in the current dir
du -hd 1
# or alternative
du -h --max-depth=1
# find files greater than certain size(k, M, G) and list them in order
# get rid of the + for exact, - for less than
find / -type f -size +100M -print0 | xargs -0 du -hs | sort -h
# Find free memory on a system
free -m
# Find what processes are using memory/CPU and organize by it
# Load average is 1/CPU for 1, 5, and 15 minutes
top -o %MEM
top -o %CPU
Strace
# strace a process
strace -tt -T -f -y -yy -s 1024 -p <pid>
# -tt print timestamps with microsecond accuracy
# -T print the time spent in each syscall
# -f also trace any child processes that forked
# -y print the path associated with file handles
# -yy print socket and device file handle details
# -s max string length to print for an event
# -o output file
# run strace on all puma processes
ps auwx | grep puma | awk '{ print " -p " $2}' | xargs strace -tt -T -f -y -yy -s 1024 -o /tmp/puma.txt
请注意,strace 在运行时可能会对系统性能产生重大影响。
Strace Parser 工具
我们的 strace-parser 工具可用于提供 strace
输出的高级摘要。它类似于 strace -C
,但提供了更详细的统计信息。
MacOS 和 Linux 二进制文件可用,或者如果您有 Rust 编译器,也可以从源代码构建。
如何使用该工具
首先使用 summary
标志运行工具,以获得按任务活动时间排序的顶级进程摘要。您还可以使用 -s
或 --sort
标志基于总时间、系统调用次数、PID 号和子进程数量进行排序。结果数量默认为 25 个进程,但可以使用 -c
/--count
选项更改。有关完整详细信息,请参阅 --help
。
$ ./strace-parser sidekiq_trace.txt summary -c15 -s=pid
Top 15 PIDs by PID #
-----------
pid actv (ms) wait (ms) user (ms) total (ms) % of actv syscalls children
------- ---------- ---------- ---------- ---------- --------- --------- ---------
16706 0.000 0.000 0.000 0.000 0.00% 0 0
16708 0.000 0.000 0.000 0.000 0.00% 0 0
16716 0.000 0.000 0.000 0.000 0.00% 0 0
16717 0.000 0.000 0.000 0.000 0.00% 0 0
16718 0.000 0.000 0.000 0.000 0.00% 0 0
16719 0.000 0.000 0.000 0.000 0.00% 0 0
16720 0.389 9796.434 1.090 9797.912 0.02% 16 0
16721 0.000 0.000 0.000 0.000 0.00% 0 0
16722 0.000 0.000 0.000 0.000 0.00% 0 0
16723 0.000 0.000 0.000 0.000 0.00% 0 0
16804 0.218 11099.535 1.881 11101.634 0.01% 36 0
16813 0.000 0.000 0.000 0.000 0.00% 0 0
16814 1.740 11825.640 4.616 11831.996 0.10% 57 0
16815 2.364 12039.993 7.669 12050.026 0.14% 80 0
16816 0.000 0.000 0.000 0.000 0.00% 0 0
PIDs 93
real 0m12.287s
user 0m1.474s
sys 0m1.686s
根据摘要,您可以使用 -p
/--pid
查看一个或多个进程的系统调用详细信息,或使用 -s
/--stats
标志查看排序列表。--stats
接受与摘要相同的排序和计数选项。
./strace-parser sidekiq_trace.txt p 16815
PID 16815
80 syscalls, active time: 2.364ms, user time: 7.669ms, total time: 12050.026ms
start time: 22:46:14.830267 end time: 22:46:26.880293
syscall count total (ms) max (ms) avg (ms) min (ms) errors
----------------- -------- ---------- ---------- ---------- ---------- --------
futex 5 10100.229 5400.106 2020.046 0.022 ETIMEDOUT: 2
restart_syscall 1 1939.764 1939.764 1939.764 1939.764 ETIMEDOUT: 1
getpid 33 1.020 0.046 0.031 0.018
clock_gettime 14 0.420 0.038 0.030 0.021
stat 6 0.277 0.072 0.046 0.031
read 6 0.170 0.036 0.028 0.020
openat 3 0.126 0.045 0.042 0.038
close 3 0.099 0.034 0.033 0.031
lseek 3 0.089 0.035 0.030 0.021
ioctl 3 0.082 0.033 0.027 0.023 ENOTTY: 3
fstat 3 0.081 0.034 0.027 0.022
---------------
Slowest file open times for PID 16815:
dur (ms) timestamp error filename
---------- --------------- --------------- ---------
0.045 22:46:16.771318 - /opt/gitlab/embedded/service/gitlab-rails/config/database.yml
0.043 22:46:26.877954 - /opt/gitlab/embedded/service/gitlab-rails/config/database.yml
0.038 22:46:22.174610 - /opt/gitlab/embedded/service/gitlab-rails/config/database.yml
在上面的示例中,我们可以看到 PID 16815
的哪个文件打开时间较长。
当结果中没有明显的内容时,获取更多上下文的一个好方法是,在执行客户执行的操作时,在您自己的极狐GitLab 实例上运行 strace
,然后比较两个结果的摘要,并深入研究差异。
open 系统调用的统计信息
在各种配置下对 open
和 openat
(用于访问文件)的调用的粗略数字。慢速存储可能导致 Gitaly 中可怕的 DeadlineExceeded
错误。
另请参阅手册中的此条目,以获取客户可以执行的快速测试来检查其文件系统性能。
请记住,来自 strace
的计时信息通常不太准确,因此不应将小差异视为重要。
Setup | access times |
---|---|
EFS | 10 - 30 ms |
Local Storage | 0.01 - 1 ms |
网络
端口
# Find the programs that are listening on ports
netstat -plnt
ss -plnt
lsof -i -P | grep <port>
互联网/DNS
# Show domain IP address
dig +short example.com
nslookup example.com
# Check DNS using specific nameserver
# 8.8.8.8 = google, 1.1.1.1 = cloudflare, 208.67.222.222 = opendns
dig @8.8.8.8 example.com
nslookup example.com 1.1.1.1
# Find host provider
whois <ip_address> | grep -i "orgname\|netname"
# Curl headers with redirect
curl --head --location "https://example.com"
# Test if a host is reachable on the network. `ping6` works on IPv6 networks.
ping example.com
# Show the route taken to a host. `traceroute6` works on IPv6 networks.
traceroute example.com
mtr example.com
# List details of network interfaces
ip address
# Check local DNS settings
cat /etc/hosts
cat /etc/resolv.conf
systemd-resolve --status
# Capture traffic to/from a host
sudo tcpdump host www.example.com
软件包管理
# Debian/Ubuntu
# List packages
dpkg -l
apt list --installed
# Find an installed package
dpkg -l | grep <package>
apt list --installed | grep <package>
# Install a package
dpkg -i <package_name>.deb
apt-get install <package>
apt install <package>
# CentOS/RedHat
# Install a package
yum install <package>
dnf install <package> # RHEL/CentOS 8+
rpm -ivh <package_name>.rpm
# Find an installed package
rpm -qa | grep <package>
日志
```shell # Print last lines in log file where ‘n’ # is the number of lines to print tail -n /path/to/log/file