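Environment preparation
1. Make sure VT-d and SR-IOV are enabled in the BIOS.
2. Enable the IOMMU in the operating system.
vim /etc/default/grub
Add the following to the GRUB_CMDLINE_LINUX field:
intel_iommu=on iommu=pt
Regenerate the GRUB configuration and reboot so that the change takes effect:
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
reboot
After the reboot, dmesg shows that the IOMMU is enabled:
[root@localhost ~]# dmesg | grep IOMMU
[ 0.125709] DMAR: IOMMU enabled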
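Besides grepping dmesg, the boot parameters themselves can be checked. The following is a minimal sketch, not part of the original procedure: `has_iommu_args` is a hypothetical helper name, and it only tests whether both parameters appear as separate words in a kernel command line string.

```shell
# Hypothetical helper: succeeds if both IOMMU parameters from the step above
# appear as separate words in a kernel command line string.
has_iommu_args() {
    cmdline=$1
    for want in intel_iommu=on iommu=pt; do
        case " $cmdline " in
            *" $want "*) ;;      # parameter present, keep checking
            *) return 1 ;;       # parameter missing
        esac
    done
    return 0
}

# Usage on a live system (reads the running kernel's command line):
#   has_iommu_args "$(cat /proc/cmdline)" && echo "IOMMU parameters present"
```

If the check fails after a reboot, the usual suspect is that grub.cfg was regenerated at a different path from the one the system actually boots from.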
Enabling SR-IOV in firmware and configuring NUM_OF_VFS
1. Start the Mellanox Software Tools (MST) driver
This prepares for the firmware configuration that follows.
[root@localhost ~]# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
-W- Missing "lsusb" command, skipping MTUSB devices detection
Unloading MST PCI module (unused) - Success
Query the MST devices and choose the one to use. The following steps use mt4125_pciconf1 as the example.
[root@localhost ~]# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4125_pciconf0 - PCI configuration cycles access.
    domain:bus:dev.fn=0000:32:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
    Chip revision is: 00
/dev/mst/mt4125_pciconf1 - PCI configuration cycles access.
    domain:bus:dev.fn=0000:98:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
    Chip revision is: 00
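To pick the right MST device, it helps to see each device node next to its PCI address. The sketch below assumes the `mst status` layout shown above (a `/dev/mst/...` line followed by a `domain:bus:dev.fn=` line); `list_mst_devices` is a hypothetical helper name, and the layout may differ between MFT versions.

```shell
# Hypothetical helper: reads `mst status` text on stdin and prints
# "<mst device> <PCI address>" for every device block it finds.
list_mst_devices() {
    awk '
        /^\/dev\/mst\// { dev = $1 }      # remember the current device node
        /domain:bus:dev\.fn=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^domain:bus:dev\.fn=/) {
                    sub(/^domain:bus:dev\.fn=/, "", $i)
                    print dev, $i
                }
            }
        }
    '
}

# Usage: mst status | list_mst_devices
```

The PCI address printed here is the same one that lspci and ibdev2netdev report, which is how mt4125_pciconf1 can be tied to 0000:98:00.0.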
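2. Querying and modifying the SR-IOV configuration
Query the current SR-IOV configuration:
mlxconfig -d /dev/mst/mt4125_pciconf1 q
Enable SR-IOV:
mlxconfig -d /dev/mst/mt4125_pciconf1 set SRIOV_EN=1
Set the firmware VF count NUM_OF_VFS, for example to 5. Note that this parameter is the upper limit on the number of VFs, not the number of VFs actually allocated.
mlxconfig -d /dev/mst/mt4125_pciconf1 set NUM_OF_VFS=5
Reboot the machine after changing these settings.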
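The two `set` commands can be combined into one invocation. The following is a sketch under assumptions: `enable_sriov_fw` and the `DRY_RUN` variable are hypothetical names introduced here, and the `-y` option (answer yes to the confirmation prompt) is added for unattended use and is not part of the original commands.

```shell
# Hypothetical wrapper around the mlxconfig commands above.
# With DRY_RUN=1 it only prints the command instead of touching firmware.
enable_sriov_fw() {
    dev=$1       # MST device, e.g. /dev/mst/mt4125_pciconf1
    num_vfs=$2   # upper limit of VFs to expose, e.g. 5
    case $num_vfs in
        ''|*[!0-9]*)
            echo "NUM_OF_VFS must be a non-negative integer" >&2
            return 1 ;;
    esac
    set -- mlxconfig -d "$dev" -y set SRIOV_EN=1 "NUM_OF_VFS=$num_vfs"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: enable_sriov_fw /dev/mst/mt4125_pciconf1 5   (then reboot)
```

Running it once with `DRY_RUN=1` first makes it easy to confirm the device path before the firmware is changed.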
Configuring VFs
1. View the PCI devices and how they map to network interfaces
[root@localhost ~]# lspci -D | grep Mellanox
0000:32:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:32:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:98:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:98:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
[root@localhost ~]# ibdev2netdev -v
0000:32:00.0 mlx5_0 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (DOWN ) ==> ens3f0np0 (Down)
0000:32:00.1 mlx5_1 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (DOWN ) ==> ens3f1np1 (Down)
0000:98:00.0 mlx5_2 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0np0 (Up)
0000:98:00.1 mlx5_3 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (DOWN ) ==> ens6f1np1 (Down)
2. View the firmware VF limit
[root@localhost ~]# cat /sys/class/net/ens6f0np0/device/sriov_totalvfs
5
3. Set the number of VFs
Using the PCI address, interface name, or device name of the target NIC obtained above, set the number of VFs. The following four methods are equivalent; each one creates 5 VFs.
Method 1:
[root@localhost ~]# echo 5 > /sys/bus/pci/devices/0000:98:00.0/sriov_numvfs
[root@localhost ~]# cat /sys/bus/pci/devices/0000\:98\:00.0/sriov_numvfs
5
Method 2:
[root@localhost ~]# echo 5 > /sys/class/net/ens6f0np0/device/sriov_numvfs
[root@localhost ~]# cat /sys/class/net/ens6f0np0/device/sriov_numvfs
5
Method 3:
[root@localhost ~]# echo 5 > /sys/class/infiniband/mlx5_2/device/mlx5_num_vfs
[root@localhost ~]# cat /sys/class/infiniband/mlx5_2/device/mlx5_num_vfs
5
Method 4:
[root@localhost ~]# echo 5 > /sys/class/net/ens6f0np0/device/mlx5_num_vfs
[root@localhost ~]# cat /sys/class/net/ens6f0np0/device/mlx5_num_vfs
5
Note: the VF count does not persist across reboots and must be set again after each boot.
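Writing to sriov_numvfs fails if the requested count exceeds sriov_totalvfs, and the kernel also refuses to change one non-zero count directly into another. The sketch below wraps both checks; `set_numvfs` is a hypothetical helper name, and it takes the sysfs device directory as an argument so it can be tried against any of the sriov_numvfs paths above.

```shell
# Hypothetical helper: set the number of VFs through a device's sysfs directory.
# dev_dir is e.g. /sys/bus/pci/devices/0000:98:00.0 (taken from the example above).
set_numvfs() {
    dev_dir=$1
    want=$2
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    current=$(cat "$dev_dir/sriov_numvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but the firmware limit is $total" >&2
        return 1
    fi
    # Nothing to do if the requested count is already active.
    [ "$current" -eq "$want" ] && return 0
    # The kernel rejects changing one non-zero VF count to another,
    # so drop to 0 first when VFs already exist.
    if [ "$current" -ne 0 ] && [ "$want" -ne 0 ]; then
        echo 0 > "$dev_dir/sriov_numvfs" || return 1
    fi
    echo "$want" > "$dev_dir/sriov_numvfs"
}

# Usage: set_numvfs /sys/bus/pci/devices/0000:98:00.0 5
```

Because the setting is lost on reboot, a helper like this would have to be called again at boot time, for example from a startup script (an assumption; the original does not prescribe a persistence mechanism).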
4. Check that the VFs were created
[root@localhost ~]# lspci -D | grep Mellanox
0000:32:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:32:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:98:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:98:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0000:98:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0000:98:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0000:98:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0000:98:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0000:98:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
[root@localhost ~]# ibdev2netdev -v
0000:32:00.0 mlx5_0 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (DOWN ) ==> ens3f0np0 (Down)
0000:32:00.1 mlx5_1 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (DOWN ) ==> ens3f1np1 (Down)
0000:98:00.0 mlx5_2 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0np0 (Up)
0000:98:00.1 mlx5_3 (MT4125 - MCX621102AN-ADAT) ConnectX-6 Dx EN adapter card, 25GbE, Dual-port SFP28, PCIe 4.0 x8, No Crypto fw 22.34.1002 port 1 (DOWN ) ==> ens6f1np1 (Down)
0000:98:00.2 mlx5_4 (MT4126 - NA) fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0v0 (Up)
0000:98:00.3 mlx5_5 (MT4126 - NA) fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0v1 (Up)
0000:98:00.4 mlx5_6 (MT4126 - NA) fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0v2 (Up)
0000:98:00.5 mlx5_7 (MT4126 - NA) fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0v3 (Up)
0000:98:00.6 mlx5_8 (MT4126 - NA) fw 22.34.1002 port 1 (ACTIVE) ==> ens6f0v4 (Up)
[root@localhost ~]# ip link show ens6f0np0
6: ens6f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 08:c0:eb:4b:90:e8 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 link/ether 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 2 link/ether 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 3 link/ether 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 4 link/ether 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
Five VFs have now been created: mlx5_4 through mlx5_8, with VF indexes 0 to 4 and PCI functions 0000:98:00.2 through 0000:98:00.6.
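The mapping from VF index to PCI function does not have to be read off by eye: the PF's sysfs device directory contains one `virtfnN` symlink per VF. The sketch below prints that mapping; `list_vfs` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<vf index> <PCI address>" for each VF of a PF.
# pf_dir is the PF's sysfs device directory, e.g. /sys/class/net/ens6f0np0/device.
list_vfs() {
    pf_dir=$1
    i=0
    # The kernel creates one virtfnN symlink per VF, numbered from 0.
    while [ -L "$pf_dir/virtfn$i" ]; do
        target=$(readlink "$pf_dir/virtfn$i")
        echo "$i ${target##*/}"     # keep only the last path component
        i=$((i + 1))
    done
}

# Usage: list_vfs /sys/class/net/ens6f0np0/device
```

This is useful later on, because `ip link set ... vf N` takes the VF index while the unbind and bind files take the PCI address.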
Detailed information for each device, such as its state and GUIDs, can also be viewed with the following commands.
[root@localhost ~]# ibstatus mlx5_2
Infiniband device 'mlx5_2' port 1 status:
    default gid: fe80:0000:0000:0000:0ac0:ebff:fe4b:90e8
    base lid: 0x0
    sm lid: 0x0
    state: 4: ACTIVE
    phys state: 5: LinkUp
    rate: 25 Gb/sec (1X EDR)
    link_layer: Ethernet
[root@localhost ~]# ibstat -d mlx5_2
CA 'mlx5_2'
    CA type: MT4125
    Number of ports: 1
    Firmware version: 22.34.1002
    Hardware version: 0
    Node GUID: 0x08c0eb03004b90e8
    System image GUID: 0x08c0eb03004b90e8
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 25
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x0ac0ebfffe4b90e8
        Link layer: Ethernet
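5. Configure the VFs
Based on the VF state queried in the previous step, configure whatever information each VF is missing, as needed.
All VFs are configured the same way; the following steps use mlx5_4 as the example.
5.1 Configure the MAC address
echo 0000:98:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
ip link set ens6f0np0 vf 0 mac 00:22:33:44:55:66
echo 0000:98:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
Where:
a. The VF must be unbound before the change and bound again afterwards.
b. In the ip link set command, "ens6f0np0" is the interface name of the PF, "vf 0" selects the VF with index 0 under that PF, and "00:22:33:44:55:66" is the MAC address being assigned.
Once configured, the MAC address is visible in the output of ip link show.
After the MAC address is set, the GUID (globally unique identifier) should be configured automatically and can be queried with ibstat, so no separate configuration is needed. If the query returns nothing, or the default GUID needs to be changed, see 5.2. (The sample output below embeds 44:55:77, so it appears to have been captured with a MAC address ending in 77 rather than 66.)
[root@localhost ~]# ibstat -d mlx5_4 |grep GUID
    Node GUID: 0x002233fffe445577
    System image GUID: 0x08c0eb03004b90e8
    Port GUID: 0x022233fffe445577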
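The Port GUIDs in the outputs above follow the modified EUI-64 pattern: `ff:fe` is inserted in the middle of the MAC address and the second-lowest bit of the first byte is flipped. The PF's MAC 08:c0:eb:4b:90:e8 and its Port GUID 0x0ac0ebfffe4b90e8 fit this pattern. The sketch below reproduces the derivation; `mac_to_port_guid` is a hypothetical helper name, and the pattern is an observation from these outputs rather than a documented guarantee of the driver.

```shell
# Hypothetical helper: derive a modified EUI-64 value from a MAC address.
mac_to_port_guid() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    # Split the MAC on ':' into the positional parameters $1..$6.
    old_ifs=$IFS; IFS=:
    set -- $mac
    IFS=$old_ifs
    [ "$#" -eq 6 ] || return 1
    # Flip the universal/local bit (0x02) of the first byte.
    first=$(printf '%02x' "$(( 0x$1 ^ 2 ))")
    # Insert ff:fe between the third and fourth bytes.
    printf '0x%s%s%sfffe%s%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

mac_to_port_guid 08:c0:eb:4b:90:e8   # prints 0x0ac0ebfffe4b90e8
```

Comparing this value with the `Port GUID` line of ibstat is a quick way to confirm that a VF picked up the MAC address that was just assigned.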
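5.2 Configure the GUID
For example, to reconfigure the Node GUID:
echo 08:c0:eb:03:00:4b:90:e9 > /sys/class/infiniband/mlx5_2/device/sriov/0/node
echo 0000:98:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:98:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
Where:
a. After the change, the VF must be unbound and then bound again.
b. "08:c0:eb:03:00:4b:90:e9" is the Node GUID being assigned. It is usually derived from the PF's Node GUID; here it is the PF's Node GUID plus 1. In the path, "mlx5_2" is the device name of the PF, "0" selects the VF with index 0 under that PF, and "node" indicates that the Node GUID is being configured.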
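The three commands can be bundled so that the GUID write and the rebind always happen together. This is a sketch only: `set_vf_node_guid` and the `SYSFS_ROOT` override are hypothetical names introduced here, and the override exists solely so the function can be exercised against a scratch directory instead of the real /sys.

```shell
# Hypothetical helper combining the three commands above.
# SYSFS_ROOT defaults to /sys; overriding it is only meant for dry testing.
set_vf_node_guid() {
    pf_ibdev=$1   # PF RDMA device name, e.g. mlx5_2
    vf_index=$2   # VF index under that PF, e.g. 0
    guid=$3       # colon-separated Node GUID, e.g. 08:c0:eb:03:00:4b:90:e9
    vf_pci=$4     # PCI address of the VF, e.g. 0000:98:00.2
    root=${SYSFS_ROOT:-/sys}
    drv=$root/bus/pci/drivers/mlx5_core

    echo "$guid" > "$root/class/infiniband/$pf_ibdev/device/sriov/$vf_index/node" || return 1
    # Rebind the VF so the driver picks up the new GUID.
    echo "$vf_pci" > "$drv/unbind" || return 1
    echo "$vf_pci" > "$drv/bind"
}

# Usage: set_vf_node_guid mlx5_2 0 08:c0:eb:03:00:4b:90:e9 0000:98:00.2
```

Note that the VF index (0) and the VF's PCI address (0000:98:00.2) are passed separately, since the sriov directory is keyed by index and the driver files by address.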
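5.3 Add a namespace and configure the IP address
For multiple VFs on the same machine to communicate with each other, both of the following must hold:
a. Each VF is isolated in its own namespace.
b. The VFs are assigned IP addresses in the same subnet.
If the VFs only need to communicate with other machines, assign each VF an IP address in a different subnet instead; no namespace isolation is needed in that case.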
5.3.1 Create a namespace
Add a namespace named ns1.
ip netns add ns1
5.3.2 Move the interface ens6f0v0 into namespace ns1.
ip link set ens6f0v0 netns ns1
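ens6f0v0 is now removed from the original namespace, so commands run there, such as ip link, no longer list this interface, and RDMA tools may likewise stop reporting its interface name. The PCI function itself should still appear in lspci, since PCI enumeration is not tied to network namespaces.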
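5.3.3 Configure the IP address
Start a process inside namespace ns1; leave it with the exit command.
ip netns exec ns1 bash
This opens a bash shell inside ns1, where the IP address and other settings of the interface can be configured. For example, to assign 200.1.1.93 to the VF device ens6f0v0:
ifconfig ens6f0v0 200.1.1.93 netmask 255.255.255.0
The VF can now send and receive packets normally. The other VFs are configured in the same way.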
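The namespace steps in 5.3 can also be run non-interactively. The sketch below is an alternative to the interactive bash session, not the original method: it uses `ip addr add` from iproute2 in place of ifconfig (a /24 prefix is the same as netmask 255.255.255.0), and `run`, `setup_vf_netns`, and `DRY_RUN` are hypothetical names introduced here.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: move a VF interface into its own namespace
# and give it an address there.
setup_vf_netns() {
    ns=$1     # namespace name, e.g. ns1
    vf=$2     # VF interface name, e.g. ens6f0v0
    cidr=$3   # address with prefix length, e.g. 200.1.1.93/24
    run ip netns add "$ns" &&
    run ip link set "$vf" netns "$ns" &&
    run ip netns exec "$ns" ip addr add "$cidr" dev "$vf" &&
    # Unlike ifconfig, ip addr add does not bring the link up by itself.
    run ip netns exec "$ns" ip link set "$vf" up
}

# Usage: setup_vf_netns ns1 ens6f0v0 200.1.1.93/24
```

Calling it once per VF, each with its own namespace name and an address in the same subnet, gives the isolation described in 5.3.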