1. Environment installation
The tool normally runs on the deploy node. Package location:
gitlab.ctyun.cn/vnet/cicd/vnet-deploy/-/tree/release-v20221230/datapanel/dpos-debug
cd /usr/local
tar -zxvf dpos-debug0330.gz
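Installation directory: /usr/local/dpos-debug
Post-installation environment adaptation:
1.1 Set the controller master IP
All other environment variables can keep their default values.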
[dpos-debug]# vi env.rc
export ANSIBLE_CONFIG=./ansible.cfg
export LOG_PATH=./logs
export debugfile=$LOG_PATH/debug.log
export FAILED_STOP=true
export hosts=hosts
export ctrl=xx.xx.xx.xx #controller master IP; it differs per resource pool. Replace xx.xx.xx.xx with the master node IP. This is the only setting that must be changed.
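Because ctrl is the only value that must be edited, a quick sanity check can catch a forgotten placeholder before the other scripts run. The function below is a minimal sketch and is not part of dpos-debug; the name check_ctrl_ip is hypothetical. It only checks that the value is a dotted-quad IPv4 address, not that the controller is reachable.

```shell
# Hypothetical helper (not part of dpos-debug): check that ctrl in env.rc was
# changed from the xx.xx.xx.xx placeholder to a dotted-quad IPv4 address.
check_ctrl_ip() {
    ip=$1
    # Reject empty values, non-numeric text (such as the xx.xx.xx.xx
    # placeholder), and leading, trailing or doubled dots.
    case $ip in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $ip                      # split into dot-separated fields
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1        # exactly four octets
    for octet in "$@"; do
        # Each octet must be a number no larger than 255.
        [ "$octet" -le 255 ] 2>/dev/null || return 1
    done
    return 0
}
```

Usage: . ./env.rc && check_ctrl_ip "$ctrl" || echo "ctrl in env.rc is not a valid IP"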
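1.2 Generate the hosts file in one step
hosts_gen.sh reads the devices managed by the controller and generates the hosts file automatically.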
[dpos-debug]# ./hosts_gen.sh
To support a new network element type, edit the ./status.sh script. Currently supported network element types: agw, sgw, mcgw, vgw, natgw, appgw, bmgw.
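How hosts_gen.sh queries the controller is not described here. Since the scripts call ansible with group names such as appgw (ansible appgw -i /usr/local/dpos-debug/hosts ..., printed in 9.2), the hosts file is presumably an ansible inventory grouped by gateway type. The sketch below assumes an INI inventory and turns rows in the Type ManagementIp AvailabilityZone layout printed by ./shell.sh hosts into one group per type; rows_to_inventory is a hypothetical name and this is not the actual hosts_gen.sh logic.

```shell
# Hypothetical sketch (not hosts_gen.sh): convert "Type ManagementIp
# AvailabilityZone" rows into an INI inventory with one group per type.
rows_to_inventory() {
    awk 'NR > 1 && NF >= 2 {                  # skip the header row
        grp = tolower($1)                     # AGW -> agw
        if (!(grp in seen)) { seen[grp] = 1; order[++n] = grp }
        members[grp] = members[grp] $2 "\n"   # collect management IPs
    }
    END {
        for (i = 1; i <= n; i++) {
            printf "[%s]\n%s", order[i], members[order[i]]
            if (i < n) printf "\n"            # blank line between groups
        }
    }'
}
```

Usage: ./shell.sh hosts | rows_to_inventory > hosts.sketch (written to a separate file so the generated hosts file is not overwritten).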
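1.3 rpm-build packaging tool
After modifying the scripts, you can rebuild the RPM package and install it in other environments.
The rpm-build packaging tool must be installed on the build host: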
[dpos-debug]# ./tools/build_rpm.sh
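build_rpm.sh needs the rpmbuild command, which the rpm-build package provides (on yum-based systems: yum install rpm-build). A minimal pre-check, using a hypothetical helper named require_cmd:

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing command: $1" >&2
    return 1
}
```

Usage: require_cmd rpmbuild && ./tools/build_rpm.sh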
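2. dpos one-click diagnostic tool overview
The tool consists mainly of shell scripts that use ansible to collect information from remote hosts and apply condition checks to decide whether there is a problem, so that faults can be located with a single command.
If a check finds a problem, the script pauses for manual handling; press Enter to continue.
All collected output is saved under logs/ for later reference and is also printed on screen.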
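The pause-on-problem behavior can be sketched as below. This assumes that FAILED_STOP=true in env.rc is what enables the pause; check_or_pause is a hypothetical name, and the OK/Failed wording only mirrors the check lines the scripts print (for example Check Trace Drop OK.).

```shell
# Hypothetical sketch: run one check command; on failure, pause for manual
# handling when FAILED_STOP is "true" (assumed meaning of the env.rc setting).
check_or_pause() {
    desc=$1
    shift
    if "$@"; then
        echo "$desc OK."
        return 0
    fi
    echo "$desc Failed."
    if [ "${FAILED_STOP:-true}" = "true" ]; then
        printf '%s' "Problem found; handle it manually, then press Enter to continue..."
        read -r _answer || true        # EOF (no terminal) also continues
        echo
    fi
    return 1
}
```

Usage, assuming ping.sh exits non-zero when a host is unreachable: . ./env.rc; check_or_pause "Check ping" ./ping.sh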
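Common files:
env.rc: environment variable settings; apart from ctrl (see 1.1), the defaults need no change.
hosts: remote host list, generated by ./hosts_gen.sh (see 1.2); normally no change is needed, but newly added hosts must be in this list.
shell.sh: runs dpc command lines on remote hosts; run ./shell.sh -h for usage, e.g. ./shell.sh 10.8.73.65 dpc sys-get-app-stats agw 0
dpos_check.sh: one-click automatic diagnosis script. It mainly calls the other diagnostic scripts in this directory, so its coverage grows as more single-point check scripts are added.
dpos_collect.sh: one-click information collection script. Compared with dpos_check.sh, it collects more information and does not stop on errors; it always runs to the end.
ping.sh: pings the management interfaces of all dpos hosts to check reachability.
status.sh: calls the management and control API to check that all dpos hosts are online.
bgp_status.sh: checks whether the agw BGP neighbors are established.
app_stats.sh: shows app statistics.
eip_stats.sh: shows EIP statistics.
eip_table.sh: shows EIP table entries.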
Deploy node: x.x.x.x
sudo -i
cd /usr/local/dpos-debug
Directory structure:
[dpos-debug]# tree
.
├── ansible.cfg
├── app_stats.sh
├── bgp_status.sh
├── dpos_check.sh
├── dpos_collect.sh
├── eip_stats.sh
├── eip_table.sh
├── env.rc
├── facts
│ └── readme.txt
├── gwtype
├── hosts
├── hosts_gen.sh
├── logs
│ ├── bgp_status.txt
│ ├── debug.log
│ ├── log.log
│ ├── ping.txt
│ ├── readme.txt
│ └── status.txt
├── ping.sh
├── port_stats.sh
├── shell.sh
├── status.sh
├── tools
│ ├── build_rpm.sh
│ └── SPECS
│ └── dpos-debug.spec
├── trace.sh
├── uploads
│ └── readme.txt
└── upload.sh
3. One-click information collection
./dpos_collect.sh
Output: written to logs/debug.log and printed on screen.
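Unlike dpos_check.sh, the collect script keeps going after a failed step. A minimal sketch of that pattern, assuming debugfile is set by sourcing env.rc; collect_all is a hypothetical name and this is not the actual dpos_collect.sh code.

```shell
# Hypothetical sketch of collect mode: run every step, append its output to
# $debugfile and the screen via tee, and never stop on a failed step.
collect_all() {
    failed=0
    for step in "$@"; do
        echo "==== $step" | tee -a "$debugfile"
        rc=0
        # Capture the step output first, then print it; failures are counted only.
        out=$(sh -c "$step" 2>&1) || rc=$?
        printf '%s\n' "$out" | tee -a "$debugfile"
        [ "$rc" -eq 0 ] || failed=$((failed + 1))
    done
    echo "steps failed: $failed" | tee -a "$debugfile"
}
```

Usage: . ./env.rc; mkdir -p "$LOG_PATH"; collect_all ./ping.sh ./status.sh ./bgp_status.sh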
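4. Remote dpc command script shell.sh
./shell.sh x.x.x.x dpc sys-get-app-stats agw 0
Specify the remote host followed by the dpc command line; this is as convenient as logging in to the remote host and running dpc there.
4.1 shell.sh help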
[dpos-debug]# ./shell.sh -h
sh shell.sh gateway dpc-cmd
gateway:
ip address of gateway
agw
igw
sgw
vgw
hosts [-v] : get all gatway ip address
-h: --help get help info
dpc-cmd:
dpc command line
example:
sh shell.sh hosts
sh shell.sh igw dpc sys-get-app-stats igw 0
sh shell.sh xgw dpc sys-get-thread-usage
sh shell.sh x.x.x.x dpc sys-get-app-stats agw 0
4.2 Show all dpos hosts
[dpos-debug]# ./shell.sh hosts
Type ManagementIp AvailabilityZone
AGW x.x.x.x tc2
AGW x.x.x.xx tc
4.3 Run a dpc command on gateway hosts
sh shell.sh gateway dpc-cmd
sh shell.sh 10.8.73.65 dpc sys-get-app-stats agw 0
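The gateway argument is either a single host IP or a gateway type. The sketch below shows one way to map it to the ansible command form that app_trace.sh prints in 9.2. Treating xgw as all gateways is an assumption taken from the eip_stats.sh help text (6.1), and build_remote_cmd and INVENTORY are hypothetical names. It only prints the command and does not run it; dpc arguments containing double quotes are not escaped.

```shell
# Hypothetical sketch: map a gateway argument to an ansible host pattern and
# print the ansible ad-hoc command that would run the dpc command line.
build_remote_cmd() {
    gw=$1
    shift
    case $gw in
        xgw|'') pattern=all ;;   # assumption: xgw or "" means every gateway
        *) pattern=$gw ;;        # a group name (agw, sgw, ...) or a host IP
    esac
    printf 'ansible %s -i %s -m shell -a "%s"\n' \
        "$pattern" "${INVENTORY:-/usr/local/dpos-debug/hosts}" "$*"
}
```

Usage: build_remote_cmd 10.8.73.65 dpc sys-get-app-stats agw 0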
5. One-click trace
sh trace.sh [gw_type] dpc_trace_cmd [clear]
./trace.sh x.x.x.x dpc sys-trace-pkt 1 100 --overlay_dip 77.8.47.58 clear
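The script sets packet match conditions, traces the matching packets through the forwarding path, and reads back the trace counters. It mainly checks for trace drops and whether the RX and TX counts match.
With clear, the counters are cleared before the trace conditions are set; without clear, the counts accumulate.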
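The two checks can be reproduced offline from saved output. In this sketch (hypothetical name check_trace_counters), a host counts as dropping when TRACE_TOTAL_DROP_PKTS is non-zero, and RX and TX are treated as consistent when RX equals TX plus drops. That second rule is an assumption chosen to match the VGW example in 11.2, where 13 received, 11 sent and 2 dropped still reports Check Trace Rx and Tx OK.; the real trace.sh logic may differ.

```shell
# Hypothetical re-implementation of the two trace checks on saved output:
# single-field lines name the current host, "NAME VALUE" lines are counters.
check_trace_counters() {
    awk '
        NF == 1 { host = $1; next }
        $1 == "TRACE_TOTAL_RX_PKTS"   { rx[host] = $2; seen[host] = 1 }
        $1 == "TRACE_TOTAL_TX_PKTS"   { tx[host] = $2; seen[host] = 1 }
        $1 == "TRACE_TOTAL_DROP_PKTS" { dr[host] = $2; seen[host] = 1 }
        END {
            drops = 0; mism = 0
            for (h in seen) {
                if (dr[h] + 0 > 0) drops++                 # host with drops
                if (rx[h] + 0 != tx[h] + dr[h]) mism++     # assumed RX = TX + drops
            }
            print "cnt is " drops
            print "Check Trace Drop " (drops ? "Failed." : "OK.")
            print "cnt is " mism
            print "Check Trace Rx and Tx " (mism ? "Failed." : "OK.")
        }'
}
```

Usage: ./trace.sh sgw dpc sys-trace-pkt 1 100 --overlay_dip 77.8.47.58 > trace.txt; check_trace_counters < trace.txt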
5.1 trace.sh help
[dpos-debug]# ./trace.sh -h
sh trace.sh [gw_type] dpc_trace_cmd [clear]
gw_type:
agw
igw
xgw
xgw or "" is all agw gateway.
dpc_trace_cmd:
dpc sys-trace-pkt 1 100 --overlay_sip 10.25.144.18 --overlay_dip 100.124.10.77
clear: clear before trace
example:
sh trace.sh igw dpc sys-trace-pkt 1 100 --overlay_sip 10.25.144.18 --overlay_dip 100.124.10.77 --overlay_proto icmp
sh trace.sh xgw dpc sys-trace-pkt 1 100 --overlay_sip 10.25.144.18 --overlay_dip 100.124.10.77 --overlay_proto icmp
sh trace.sh dpc sys-trace-pkt 1 100 --underlay_sip 10.25.144.18 --underlay_dip 100.124.10.77 clear
sh trace.sh xgw dpc sys-trace-pkt 1 100 --overlay_sip 10.25.144.18 --overlay_dip 100.124.10.77 --underlay_dport 4789
dpc sys-trace-pkt [options]
Flags:
--overlay_dip string overlay destination eip
--overlay_dport uint32 overlay destination port
--overlay_proto string overlay packet protocal
--overlay_sip string overlay source eip
--overlay_sport uint32 overlay source port
--underlay_dip string underlay destination eip
--underlay_dport uint32 underlay destination port
--underlay_proto string underlay packet protocal
--underlay_sip string underlay source eip
--underlay_sport uint32 underlay source port
--vni uint32 vni(0 - 16777216)
5.2 trace.sh examples
[dpos-debug]# ./trace.sh 10.8.92.79 dpc sys-trace-pkt 1 100 --overlay_dip 77.8.47.58
dpc sys-trace-pkt 1 100 --overlay_dip 77.8.47.58
x.x.x.x
TRACE_TOTAL_RX_PKTS 3
TRACE_TOTAL_TX_PKTS 3
cnt is 0
Check Trace Drop OK. // no drops counted
cnt is 0
Check Trace Rx and Tx OK. // RX and TX counts match
[dpos-debug]# ./trace.sh sgw dpc sys-trace-pkt 1 100 --overlay_dip 77.8.47.58
dpc sys-trace-pkt 1 100 --overlay_dip 77.8.47.58
sgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 16
TRACE_TOTAL_TX_PKTS 16
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
6. EIP statistics
./eip_stats.sh 1.1.1.1
Reads the EIP statistics on the devices twice and prints the difference between the two reads.
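The read-twice-and-subtract idea is sketched below on two saved snapshots. The snapshot format (one "gw ip counter value" row per counter) and the name eip_delta are assumptions, and the commands that would produce the snapshots are not shown here. The help text in 6.1 lists a sleeptime argument, which is presumably the wait between the two reads. Only counters that changed are printed, as in the summary at the end of 6.2.

```shell
# Hypothetical sketch: diff two snapshot files of "gw ip counter value" rows
# and print only the counters whose value changed between the reads.
eip_delta() {
    awk 'FILENAME == ARGV[1] { old[$1 " " $2 " " $3] = $4; next }   # first read
         {
             key = $1 " " $2 " " $3
             d = $4 - old[key]              # second read minus first read
             if (d != 0) rows[++n] = key " " d
         }
         END {
             print "cnt is " n + 0
             for (i = 1; i <= n; i++) print rows[i]
         }' "$1" "$2"
}
```

Usage: eip_delta first_read.txt second_read.txt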
6.1 eip_stats.sh usage and help
[dpos-debug]# ./eip_stats.sh -h
sh eip_stats.sh gateway eip sleeptime
gateway:
agw
sgw
igw
vgw
xgw or "" is all gateway.
eip: ip address
example:
sh eip_stats.sh agw 100.124.9.46
sh eip_stats.sh xgw 100.124.9.46
sh eip_stats.sh 100.124.9.46
6.2 eip_stats.sh example
[dpos-debug]# ./eip_stats.sh 77.8.47.58
./eip_stats.sh 77.8.47.58
agw x.x.x.x DownHitPkts 1014387 1014381 6
sgw x.x.x.x InPkts 147174 147170 4
sgw x.x.x.x InPkts 439 439 0
sgw x.x.x.x OutPkts 5 5 0
cnt is 2
gw ip stats pkts
agw x.x.x.x DownHitPkts 6
sgw x.x.x.x InPkts 4
7. One-click EIP trace
sh eip_trace.sh [gw_type] five-tuple [get]
./eip_trace.sh 77.8.47.58
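The script sets packet match conditions, traces the matching packets through the forwarding path, and reads back the trace counters. It mainly checks for trace drops and whether the RX and TX counts match.
With get, no conditions are set and the counters are simply read again; this suits very low-traffic scenarios.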
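eip_trace.sh accepts a partial five-tuple, and fields set to 0 are ignored. The sketch below (hypothetical name tuple_directions) pads missing fields with 0 and derives the two directions the script reports. Mapping out2in to dst-ip -> vm-eip and in2out to vm-eip -> dst-ip is an assumption based on the direction labels agw->sgw->igw->vm and vm->igw->sgw.

```shell
# Hypothetical helper: pad an eip_trace-style five-tuple
# (vm-eip dst-ip protocol vm-port dst-port) with 0 and print both directions.
tuple_directions() {
    eip=${1:-0} dst=${2:-0} proto=${3:-0} vport=${4:-0} dport=${5:-0}
    echo "out2in $dst:$dport->$eip:$vport proto $proto"   # toward the VM
    echo "in2out $eip:$vport->$dst:$dport proto $proto"   # from the VM
}
```

Usage: tuple_directions 221.229.4.29 101.227.54.8 6 31238 443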
7.1 eip_trace.sh help
[dpos-debug]# ./eip_trace.sh -h
sh eip_trace.sh [gw_type] five-tuple [clear]
gw_type:
agw
sgw
igw
xgw or "" is all gateway.
five-tuple:
vm-eip dst-ip protocol vm-port dst-port [set 0 for to be ignored]
clear: clear before trace
example:
sh eip_trace.sh igw 221.229.4.29 101.227.54.8 6 31238 443
sh eip_trace.sh xgw 221.229.4.29 0 0 0 0
sh eip_trace.sh 221.229.4.29
sh eip_trace.sh xgw 221.229.4.29 101.227.54.8 6 31238 443
sh eip_trace.sh xgw get
7.2 eip_trace.sh examples
[dpos-debug]# sh eip_trace.sh 100.124.21.77 100.124.21.26 1 0 3615
[dpos-debug]# sh eip_trace.sh 100.124.8.195
Direction out2in agw->sgw->igw->vm
agw
x.x.x.x
TRACE_TOTAL_RX_PKTS 4
TRACE_TOTAL_TX_PKTS 4
y.y.y.y
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
sgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
igw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
cnt is 0
Check Trace Drop OK. // no drops counted
cnt is 0
Check Trace Rx and Tx OK. // RX and TX counts match
Direction in2Out vm->igw->sgw
agw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
sgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
igw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
Check Trace Drop OK. // no drops counted
cnt is 0
Check Trace Rx and Tx OK. // RX and TX counts match
[dpos-debug]# sh eip_trace.sh agw 100.124.8.195
Direction out2in agw->sgw->igw->vm
agw
x.x.x.x
TRACE_TOTAL_RX_PKTS 89
TRACE_TOTAL_TX_PKTS 89
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
Direction in2Out vm->igw->sgw
agw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
8. One-click NAT trace
sh nat_trace.sh [gw_type] five-tuple [get]
./nat_trace.sh 100.124.21.77 100.124.21.26
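The script sets packet match conditions, traces the matching packets through the forwarding path, and reads back the trace counters. It mainly checks for trace drops and whether the RX and TX counts match.
With get, no conditions are set and the counters are simply read again; this suits very low-traffic scenarios.
8.1 nat_trace.sh help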
[dpos-debug]# ./nat_trace.sh -h
sh nat_trace.sh [gw_type] five-tuple [clear]
gw_type:
agw
sgw
igw
"" is all gateway.
five-tuple:
vm-eip dst-ip protocol vm-port dst-port [set 0 for to be ignored]
clear: clear before trace
example:
sh nat_trace.sh igw 221.229.4.29 101.227.54.8 6 31238 443
sh nat_trace.sh 221.229.4.29 0 0 0 0
sh nat_trace.sh 221.229.4.29
sh nat_trace.sh 221.229.4.29 101.227.54.8 6 31238 443
sh nat_trace.sh get
8.2 nat_trace.sh example
[dpos-debug]# sh nat_trace.sh 100.124.21.77 100.124.21.26
Direction out2in agw->sgw->natgw->vm
agw
x.x.x.x
TRACE_TOTAL_RX_PKTS 5
TRACE_TOTAL_TX_PKTS 5
y.y.y.y
sgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 8
TRACE_TOTAL_TX_PKTS 8
y.y.y.y
natgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 4
TRACE_TOTAL_TX_PKTS 4
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
Direction in2Out vm->natgw->sgw
agw
x.x.x.x
TRACE_TOTAL_RX_PKTS 5
TRACE_TOTAL_TX_PKTS 5
y.y.y.y
sgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 5
TRACE_TOTAL_TX_PKTS 5
y.y.y.y
natgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 4
TRACE_TOTAL_TX_PKTS 4
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
9. One-click APPGW trace
sh app_trace.sh [gw_type] five-tuple [get]
./app_trace.sh 100.124.21.77 100.124.21.26
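The script sets packet match conditions, traces the matching packets through the forwarding path, and reads back the trace counters. It mainly checks for trace drops and whether the RX and TX counts match.
Without port numbers, ports are ignored and only the source and destination IPs are matched; with port numbers, the full five-tuple is matched.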
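The ansible line printed in 9.2 shows the dpc command that app_trace.sh sends. The sketch below (hypothetical name build_trace_cmd) rebuilds that command from a five-tuple: fields that are 0 or omitted are left out, so without ports only the IPs (and the protocol, if given) are matched. The protocol numbers 1, 6 and 17 are the IANA numbers for ICMP, TCP and UDP; the "1 100" arguments are copied unchanged from the examples. In 9.2 only two IPs are passed yet the printed command contains --overlay_proto tcp, so the real script apparently applies a default protocol that this sketch does not.

```shell
# Hypothetical sketch: build the clear-then-trace dpc command from a
# five-tuple (src-ip dst-ip protocol src-port dst-port); 0 means "ignore".
build_trace_cmd() {
    sip=${1:-0} dip=${2:-0} proto=${3:-0} sport=${4:-0} dport=${5:-0}
    case $proto in                 # IANA protocol numbers to dpc names
        1) proto=icmp ;;
        6) proto=tcp ;;
        17) proto=udp ;;
    esac
    cmd="dpc sys-clear-trace-stats; dpc sys-trace-pkt 1 100"
    [ "$dip" = 0 ] || cmd="$cmd --overlay_dip $dip"
    [ "$sip" = 0 ] || cmd="$cmd --overlay_sip $sip"
    [ "$proto" = 0 ] || cmd="$cmd --overlay_proto $proto"
    [ "$sport" = 0 ] || cmd="$cmd --overlay_sport $sport"
    [ "$dport" = 0 ] || cmd="$cmd --overlay_dport $dport"
    echo "$cmd"
}
```

Usage: build_trace_cmd 10.1.0.3 198.19.128.128 6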
9.1 app_trace.sh help
[dpos-debug]# ./app_trace.sh -h
sh app_trace.sh [gw_type] five-tuple [get]
gw_type:
appgw
"" is app gateway.
five-tuple:
src-ip dst-ip protocol src-port dst-port [set 0 for to be ignored]
get: get trace counter
example:
sh app_trace.sh appgw 221.229.4.29 101.227.54.8 6 31238 443
sh app_trace.sh 172.16.0.3 198.19.128.141 0 0 0
sh app_trace.sh 221.229.4.29
sh app_trace.sh 221.229.4.29 101.227.54.8 6 31238 443
sh app_trace.sh get
9.2 app_trace.sh example
[dpos-debug]# ./app_trace.sh 10.1.0.3 198.19.128.128
Direction down 10.1.0.3:0---[tcp]--->198.19.128.128:0
ansible appgw -i /usr/local/dpos-debug/hosts -m shell -a "dpc sys-clear-trace-stats; dpc sys-trace-pkt 1 100 --overlay_dip 198.19.128.128 --overlay_sip 10.1.0.3 --overlay_proto tcp"
appgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 100
TRACE_TOTAL_TX_PKTS 100
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
Direction up 192.168.0.3:0---[tcp]->192.168.0.4:0
ansible appgw -i /usr/local/dpos-debug/hosts -m shell -a "dpc sys-clear-trace-stats; dpc sys-trace-pkt 1 100 --overlay_dip 192.168.0.4 --overlay_sip 192.168.0.3 --overlay_proto tcp"
appgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 39
TRACE_TOTAL_TX_PKTS 39
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
down port cnt is 0
Check Port status OK.
10. Session query
./session.sh 10.0.0.4 100.124.21.26
Reads session entries from the devices.
10.1 session.sh help
[dpos-debug]# sh session.sh -h
sh session.sh [gw_type] five-tuple [clear]
gw_type:
natgw
appgw
"" is natgw gateway.
five-tuple:
sip dip protocol sport dport vni [set 0 for to be ignored]
example:
sh session.sh natgw 100.124.21.26 100.124.21.77 6 31238 443 100
sh session.sh natgw 221.229.4.29 0 0 0 0
sh session.sh 0 100.124.21.77
sh session.sh 10.0.4 100.124.21.26
sh session.sh appgw 100.124.21.26 100.124.21.77 6 31238 443 100
sh session.sh xgw 100.124.21.26 100.124.21.77 6 31238 443 100
10.2 session.sh example
[dpos-debug]# sh session.sh 10.0.0.4 100.124.21.26
ansible natgw -i /usr/local/dpos-debug/hosts -m shell -a "dpc sys-get-tuple-more-session 2 10.0.0.4 100.124.21.26 0 0 0 0 0 0 0 0 -f vertical "
x.x.x.x | CHANGED | rc=0 >>
1 Rows:
*************************** 1.row ***************************
CreateTime: 1629768714
CreateType: 1
State: 1
ThreadId: 0
Up: {"sip":"10.0.0.4","dip":"100.124.21.26","sport":15619,"dport":0,"protol":1,"vni":385923,"remote_ip":"0.0.0.0","tunnel_ip":"0.0.0.0","total_pkts":0,"total_bytes":0}
Down: {"sip":"100.124.21.26","dip":"100.124.21.77","sport":0,"dport":4181,"protol":1,"vni":16777215,"remote_ip":"0.0.0.0","tunnel_ip":"0.0.0.0","total_pkts":0,"total_bytes":0}
RemainTime: 0
x.x.x.x | CHANGED | rc=0 >>
1 Rows:
*************************** 1.row ***************************
CreateTime: 1637265488
CreateType: 0
State: 1
ThreadId: 0
Up: {"sip":"10.0.0.4","dip":"100.124.21.26","sport":15619,"dport":0,"protol":1,"vni":385923,"remote_ip":"0.0.0.0","tunnel_ip":"0.0.0.0","total_pkts":0,"total_bytes":0}
Down: {"sip":"100.124.21.26","dip":"100.124.21.77","sport":0,"dport":4181,"protol":1,"vni":16777215,"remote_ip":"0.0.0.0","tunnel_ip":"0.0.0.0","total_pkts":0,"total_bytes":0}
RemainTime: 0
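To make the session output easier to read, the sketch below converts CreateTime to a UTC date and extracts total_pkts from the Up and Down JSON. It assumes CreateTime is a Unix timestamp in seconds and relies on GNU date (-d @SECONDS); summarize_sessions is a hypothetical name.

```shell
# Hypothetical post-processing of session.sh output (vertical format).
# Assumes CreateTime is a Unix timestamp; needs GNU date for -d @SECONDS.
summarize_sessions() {
    while IFS= read -r line; do
        case $line in
            CreateTime:*)
                ts=${line#CreateTime: }
                echo "CreateTime $(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S') UTC" ;;
            Up:*|Down:*)
                dir=${line%%:*}    # "Up" or "Down"
                pkts=$(printf '%s\n' "$line" | sed -n 's/.*"total_pkts":\([0-9]*\).*/\1/p')
                echo "$dir total_pkts $pkts" ;;
        esac
    done
}
```

Usage: sh session.sh 10.0.0.4 100.124.21.26 | summarize_sessions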
11. One-click VGW trace
sh vgw_trace.sh [gw_type] five-tuple [get]
./vgw_trace.sh 100.124.21.77 100.124.21.26
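The script sets packet match conditions, traces the matching packets through the forwarding path, and reads back the trace counters. It mainly checks for trace drops and whether the RX and TX counts match.
Without port numbers, ports are ignored.
11.1 vgw_trace.sh help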
[dpos-debug]# ./vgw_trace.sh -h
sh vgw_trace.sh [gw_type] five-tuple [clear]
gw_type:
vgw
"" is vgw.
five-tuple:
src-ip dst-ip protocol src-port dst-port [set 0 for to be ignored]
clear: clear before trace
example:
sh vgw_trace.sh vgw 221.229.4.29 101.227.54.8 6 31238 443
sh vgw_trace.sh 221.229.4.29 0 0 0 0
sh vgw_trace.sh 221.229.4.29
sh vgw_trace.sh get
11.2 vgw_trace.sh example
[dpos-debug]# sh vgw_trace.sh 221.229.4.29
Direction vgw 221.229.4.29:0->0:0
vgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 13
TRACE_TOTAL_TX_PKTS 13
y.y.y.y
cnt is 0
Check Trace Drop OK.
cnt is 0
Check Trace Rx and Tx OK.
Direction vgw 0:0->221.229.4.29:0
vgw
x.x.x.x
TRACE_TOTAL_RX_PKTS 13
TRACE_TOTAL_TX_PKTS 11
TRACE_TOTAL_DROP_PKTS 2
y.y.y.y
cnt is 1
Check Trace Drop Failed.
vgw
x.x.x.x
DROP_ICMP_UNKNOWN_OP 2
y.y.y.y
cnt is 0
Check Trace Rx and Tx OK.
down port cnt is 0
Check Port status OK.
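In the second direction of the VGW example, TRACE_TOTAL_DROP_PKTS is 2 and the script then prints the drop-reason counter DROP_ICMP_UNKNOWN_OP 2. When several hosts and reasons are involved, a filter like the one below lists only the non-zero reasons per host from saved output. drop_reasons is a hypothetical name, and it assumes reason counters are always named DROP_*.

```shell
# Hypothetical helper: from saved trace output, print "host reason count" for
# every non-zero counter whose name starts with DROP_ (assumed naming).
drop_reasons() {
    awk 'NF == 1 { host = $1; next }                       # host header line
         $1 ~ /^DROP_/ && $2 > 0 { print host, $1, $2 }'
}
```

Usage: sh vgw_trace.sh 221.229.4.29 > vgw_trace.txt; drop_reasons < vgw_trace.txt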