Installing and deploying GPU passthrough
1) Enable the IOMMU on the KVM host
    vi /etc/default/grub

    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet amd_iommu=on"
    GRUB_DISABLE_RECOVERY="true"
On an AMD CPU, append amd_iommu=on to GRUB_CMDLINE_LINUX; on an Intel CPU, append intel_iommu=on instead.
2) Disable the nouveau driver
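The choice between the two parameters can be derived from the CPU vendor string instead of being made by hand. The following is a minimal sketch, assuming the vendor_id field of /proc/cpuinfo reads AuthenticAMD or GenuineIntel; the function names iommu_param_for_vendor and cpu_vendor are hypothetical helpers, not part of any existing tool.

```shell
# Map a CPU vendor string (the vendor_id field of /proc/cpuinfo)
# to the kernel parameter that enables the IOMMU.
iommu_param_for_vendor() {
    case "$1" in
        AuthenticAMD) echo "amd_iommu=on" ;;
        GenuineIntel) echo "intel_iommu=on" ;;
        *) echo "unsupported CPU vendor: $1" >&2; return 1 ;;
    esac
}

# Print the vendor of the first CPU listed in a cpuinfo-style file.
# The file path is an optional argument and defaults to /proc/cpuinfo.
cpu_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

Running iommu_param_for_vendor "$(cpu_vendor)" on the host prints the parameter to append to GRUB_CMDLINE_LINUX.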
    vi /etc/modprobe.d/blacklist-nouveau.conf

    blacklist nouveau
    options nouveau modeset=0

3) Regenerate the GRUB configuration and reboot for the changes to take effect
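Before rebooting, it can be worth confirming that the blacklist entry is really present in the modprobe configuration. A small sketch under the assumption that the entry is written as a plain "blacklist <module>" line; module_blacklisted is a hypothetical helper name.

```shell
# Return success if the module named in $1 is blacklisted in any of the
# modprobe configuration files given as the remaining arguments.
module_blacklisted() {
    module=$1
    shift
    # Without at least one file, grep would wait on stdin, so bail out.
    [ "$#" -gt 0 ] || return 2
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" "$@"
}
```

For example, module_blacklisted nouveau /etc/modprobe.d/*.conf returns success once the file above has been saved.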
    grub2-mkconfig -o /boot/grub2/grub.cfg
    reboot

Check whether the IOMMU is enabled:

    dmesg | grep -E "DMAR|IOMMU"

Check whether nouveau is disabled:

    dmesg | grep -i nouveau

4) Load the vfio-pci driver and bind it to the devices
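Besides grepping dmesg, an active IOMMU also shows up as numbered directories under /sys/kernel/iommu_groups. The sketch below counts them; count_iommu_groups is a hypothetical helper, and the optional directory argument exists only so the function can be tried against a test directory.

```shell
# Count the IOMMU groups found under a sysfs-style directory
# (default: /sys/kernel/iommu_groups). Prints 0 if there are none,
# which suggests the IOMMU is not active.
count_iommu_groups() {
    dir=${1:-/sys/kernel/iommu_groups}
    count=0
    for group in "$dir"/*/; do
        # An unmatched glob stays literal, so test that it is a directory.
        if [ -d "$group" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

A result of 0 after the reboot suggests that the kernel parameter was not applied or that the IOMMU is disabled in the firmware settings.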
    modprobe vfio-pci

Every device in the GPU's IOMMU group has to be added to /etc/modprobe.d/vfio.conf. List the groups and their devices with:

    for iommu_group in $(ls -dv /sys/kernel/iommu_groups/*/); do
        echo "IOMMU group $(basename "$iommu_group")"
        for device in $(ls -1 "$iommu_group"/devices/); do
            echo -n $'\t'
            lspci -nns "$device"
        done
    done

Find the relevant devices in the output and add their Vendor ID and Device ID to /etc/modprobe.d/vfio.conf:

    ...
    IOMMU group 2
        00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
        00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
        07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:2187] (rev a1)
        07:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
        07:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
        07:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
    ...

    vi /etc/modprobe.d/vfio.conf

    options vfio-pci ids=10de:2187,10de:1aeb,10de:1aec,10de:1aed,1022:1482,1022:1483

Then rebuild the initramfs, reboot, and look for vfio messages in the kernel log:

    dracut --force
    reboot
    dmesg | grep -i vfio

Check that the devices are bound to vfio-pci:

    [root@dev /]# lspci -nnk -d 10de:
    07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:2187] (rev a1)
            Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
            Kernel driver in use: vfio-pci
            Kernel modules: nouveau
    07:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
            Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
            Kernel driver in use: vfio-pci
            Kernel modules: snd_hda_intel
    07:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
            Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
            Kernel driver in use: vfio-pci
    07:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
            Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
            Kernel driver in use: vfio-pci
            Kernel modules: i2c_nvidia_gpu

Some devices may fail to bind automatically and have to be bound by hand. For example, if the USB controller is not bound, run:

    echo -n "0000:07:00.2" > /sys/bus/pci/drivers/xhci_hcd/unbind
    echo -n "0000:07:00.2" > /sys/bus/pci/drivers/vfio-pci/bind
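Two parts of step 4 are easy to get wrong by hand: copying the vendor:device pairs into vfio.conf, and unbinding a device from whichever driver currently holds it. The sketch below is one possible way to script both, not a definitive implementation. It assumes lspci -nn prints the ID pair as the last [xxxx:xxxx] bracket on each line, and that the device ID is already listed in the ids= option so that vfio-pci accepts the bind. The names vfio_options_line and rebind_to_vfio are hypothetical helpers, and the optional sysfs root argument exists only for trying the function outside /sys.

```shell
# Read `lspci -nn` output on stdin and print an "options vfio-pci ids=..."
# line. Lines without a [vendor:device] pair (such as "IOMMU group 2")
# are ignored. Returns non-zero if no ID was found.
vfio_options_line() {
    ids=$(sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -sd, -)
    [ -n "$ids" ] && echo "options vfio-pci ids=$ids"
}

# Unbind a PCI device (full address, e.g. 0000:07:00.2) from its current
# driver, if it has one, and bind it to vfio-pci.
# $2 is the sysfs root and defaults to /sys.
rebind_to_vfio() {
    addr=$1
    sysfs=${2:-/sys}
    dev="$sysfs/bus/pci/devices/$addr"
    # The "driver" symlink only exists while a driver is bound.
    if [ -e "$dev/driver" ]; then
        printf '%s' "$addr" > "$dev/driver/unbind"
    fi
    printf '%s' "$addr" > "$sysfs/bus/pci/drivers/vfio-pci/bind"
}
```

For example, lspci -nn -d 10de: | vfio_options_line prints the line for the four NVIDIA functions shown above, and rebind_to_vfio 0000:07:00.2 performs the same two writes as the manual commands without hard-coding xhci_hcd.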