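Once the cooling problem of a passively cooled professional card is solved with a DIY approach, cards such as the NVIDIA T4 run fine on consumer-grade hardware and can be split into vGPU instances to serve VDI scenarios. This post records the adaptation process and how the virtual machine was brought up.
The hardware and software environment used in this post is shown below. Because most production environments run CentOS, the corresponding CentOS release is used here. For everyday use, Ubuntu or another more comfortable distribution works just as well for the adaptation.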
```
[root@text ~]# lspci | grep -i controller | grep -i 3d
01:00.0 3D controller: NVIDIA Corporation Device 1eb8 (rev a1)
[root@text ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[root@text ~]# lscpu | grep -i "model name"
Model name: 12th Gen Intel(R) Core(TM) i9-12900K
[root@text ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          15572         232       14540           9         799       14981
Swap:          7903           0        7903
[root@text ~]# uname -r
3.10.0-957.el7.x86_64
```
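Disable nouveau by blacklisting it on the kernel command line, then regenerate the GRUB config and reboot (note that the kernel had also been updated to 3.10.0-1160.92.1 by this point):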
```
[root@text ~]# cat /etc/default/grub | grep GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet modprobe.blacklist=nouveau"
[root@text ~]# uname -r
3.10.0-1160.92.1.el7.x86_64
[root@text ~]# history | grep grub2-mk
60 grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
```
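After the reboot it is worth confirming that the blacklist parameter actually reached the kernel. The sketch below is one way to do that; `has_nouveau_blacklist` is a hypothetical helper name, not something shipped with CentOS or the NVIDIA driver, and it only inspects the command-line string it is given.

```shell
# has_nouveau_blacklist CMDLINE
# Hypothetical helper: succeeds if CMDLINE contains a modprobe.blacklist=
# parameter whose comma-separated module list includes nouveau.
has_nouveau_blacklist() {
    for word in $1; do
        case "$word" in
            modprobe.blacklist=*)
                case ",${word#modprobe.blacklist=}," in
                    *,nouveau,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Example check against the running kernel (host only):
#   has_nouveau_blacklist "$(cat /proc/cmdline)" && echo "nouveau is blacklisted"
```

Running `lsmod | grep nouveau` on the host is a complementary check: it should print nothing once the blacklist is in effect.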
Install the vGPU host (KVM) driver, then reboot:
```
# ./NVIDIA-Linux-x86_64-470.182.02-vgpu-kvm.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 470.182.02..........
# reboot
```
Check the GPU temperature with nvidia-smi:
```
[root@text ~]# nvidia-smi
Sun Jul 2 22:36:43 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.02   Driver Version: 470.182.02   CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:01:00.0 Off |                    0 |
| N/A   43C    P0    30W /  70W |     82MiB / 15359MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
Create the vGPU (mdev) instance and install the virtualization stack:
```
# Run from /sys/class/mdev_bus/0000:01:00.0/mdev_supported_types
echo "638916c5-46ad-4a2f-9248-27b0164efc02" > nvidia-232/create
nvidia-smi
lspci | grep -i 3d
# The new mdev device appears under the parent PCI device
cd ..
ls
cd 638916c5-46ad-4a2f-9248-27b0164efc02/
ls
# Install and start the virtualization stack
yum install -y virt-manager
yum search qemu
yum install -y qemu-kvm
systemctl enable libvirtd
systemctl start libvirtd
yum groupinstall "Virtualization Host"
# Also disable firewalld and SELinux
```
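The mdev creation step above boils down to writing a UUID into the `create` file of the chosen vGPU type. A minimal sketch of that step as a reusable function follows; `create_mdev` is a hypothetical name, the parent directory is passed in explicitly, and a fresh UUID from `uuidgen` is assumed when none is given.

```shell
# create_mdev PARENT_DIR TYPE [UUID]
# Hypothetical helper: writes UUID into
#   PARENT_DIR/mdev_supported_types/TYPE/create
# and prints the UUID that was used. UUID defaults to a new one from uuidgen.
create_mdev() {
    parent="$1"
    type="$2"
    uuid="${3:-$(uuidgen)}"
    type_dir="$parent/mdev_supported_types/$type"
    if [ ! -e "$type_dir/create" ]; then
        echo "mdev type $type not found under $parent" >&2
        return 1
    fi
    printf '%s\n' "$uuid" > "$type_dir/create"
    echo "$uuid"
}

# Example on the host used in this post (requires root and the vGPU driver):
#   create_mdev /sys/class/mdev_bus/0000:01:00.0 nvidia-232
```

Passing the parent directory as an argument keeps the function usable for other PCI addresses and makes it easy to dry-run against a scratch directory.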
Define the VM from its XML file. This required upgrading qemu-kvm to version 2.12 (via qemu-kvm-ev):
```
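# Check the stock qemu-kvm version and make a first attempt at defining the VM
/usr/libexec/qemu-kvm --version
virsh define win10-rtx2080.xml
# Upgrade to qemu-kvm-ev from the CentOS Virtualization SIG repository
yum install -y centos-release-qemu-ev
sudo yum install -y qemu-kvm-ev
reboot
# Define the VM again after the upgrade
virsh define win10-rtx2080.xml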
```
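Whether the upgrade is needed can be decided from the version string before touching the repositories. The sketch below compares a version against the 2.12 threshold mentioned above; `version_ge` and `qemu_version_from` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the sample version line in the usage comment is an assumed format rather than output captured for this post.

```shell
# version_ge A B
# Succeeds if version A is greater than or equal to version B (GNU sort -V order).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# qemu_version_from LINE
# Prints the first dotted version number found in LINE.
qemu_version_from() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Example on the host:
#   line=$(/usr/libexec/qemu-kvm --version | head -n 1)
#   # e.g. "QEMU emulator version 1.5.3 (...)" (assumed format)
#   if version_ge "$(qemu_version_from "$line")" 2.12; then
#       echo "qemu-kvm is new enough"
#   else
#       echo "upgrade to qemu-kvm-ev first"
#   fi
```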
Check the status of the mdev device:
```
[root@text 638916c5-46ad-4a2f-9248-27b0164efc02]# pwd
/sys/class/mdev_bus/0000:01:00.0/638916c5-46ad-4a2f-9248-27b0164efc02
[root@text 638916c5-46ad-4a2f-9248-27b0164efc02]# cat mdev_type/description
num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=4
```
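The description string is a comma-separated list of key=value pairs, so individual fields are easy to pull out in a script. A small sketch, where `mdev_field` is a hypothetical helper name:

```shell
# mdev_field DESCRIPTION KEY
# Hypothetical helper: prints the value of KEY from an mdev_type/description
# string of the form "k1=v1, k2=v2, ...".
mdev_field() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed 's/^ *//' \
        | awk -F= -v k="$2" '$1 == k { print $2 }'
}

# Example on the host, from inside the mdev device directory:
#   desc=$(cat mdev_type/description)
#   mdev_field "$desc" framebuffer
#   mdev_field "$desc" max_instance
```

For the nvidia-232 type shown above, framebuffer=4096M and max_instance=4 multiply out to 16384M, which lines up with the T4's 16 GB of memory: the card can host at most four such instances.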
Use virsh edit to add the mdev device to the VM definition (lines marked + are the additions):
```
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
+   <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
+     <source>
+       <address uuid='638916c5-46ad-4a2f-9248-27b0164efc02'/>
+     </source>
+     <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
+   </hostdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
```
```
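When several VMs each need their own vGPU, generating the hostdev element from the UUID avoids hand-editing mistakes. The following is a sketch under a few assumptions: `mdev_hostdev_xml` is a hypothetical helper name, the guest PCI address is left out so that libvirt assigns one, and the attribute values simply mirror the snippet above.

```shell
# mdev_hostdev_xml UUID
# Hypothetical helper: prints a libvirt <hostdev> element for the given mdev
# UUID. Fails if UUID is not in the usual 8-4-4-4-12 hexadecimal form.
mdev_hostdev_xml() {
    uuid="$1"
    if ! printf '%s' "$uuid" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "invalid UUID: $uuid" >&2
        return 1
    fi
    cat <<EOF
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='$uuid'/>
  </source>
</hostdev>
EOF
}

# Example: write the element to a file and attach it to a defined domain
# (the domain name is a placeholder):
#   mdev_hostdev_xml 638916c5-46ad-4a2f-9248-27b0164efc02 > vgpu.xml
#   virsh attach-device <domain> vgpu.xml --config
```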
Remove the SELinux-related configuration, then start the VM.
nvidia-smi output on the host once the VM is running:
```
[root@text ~]# nvidia-smi
Mon Jul 3 03:27:52 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.02   Driver Version: 470.182.02   CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:01:00.0 Off |                    0 |
| N/A   34C    P8    16W /  70W |   3859MiB / 15359MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12613    C+G   vgpu                             3776MiB |
+-----------------------------------------------------------------------------+
```
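At this point the virtual machine backed by a vGPU slice has started and is working normally: the vgpu process appears in the process list and holds 3776MiB of GPU memory.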