1. 环境准备
1.1. 环境配置
宿主机 | 操作系统 | 5.19.0-45-generic #46~22.04.1-Ubuntu / win10 双系统 |
---|---|---|
规格 | 18 核 36 线程 64G 内存,主频 2.3Ghz,睿频已关 | |
拓扑 |
|
|
客户机1 |
操作系统 |
5.19.0-45-generic #46~22.04.1-Ubuntu |
规格 | 8 核 8G 内存 | |
客户机2 | 操作系统 | win10 1909 |
规格 | 8 核 8G 内存 |
1.2. 测试&观测工具
宿主机 | sysbench / perf |
---|---|
客户机1 | sysbench |
客户机2 | cpuz |
1.3. 客户机配置文件
1.3.1. ubuntu
<domain type='kvm'> <name>ubuntu</name> <uuid>8280445b-28ae-48ff-8f1c-ceb79b3f8858</uuid> <metadata> <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/> </libosinfo:libosinfo> </metadata> <memory unit='KiB'>8388608</memory> <currentMemory unit='KiB'>8388608</currentMemory> <vcpu placement='static'>8</vcpu> <cputune> <vcpupin vcpu='0' cpuset='1'/> <vcpupin vcpu='1' cpuset='19'/> <vcpupin vcpu='2' cpuset='2'/> <vcpupin vcpu='3' cpuset='20'/> <vcpupin vcpu='4' cpuset='3'/> <vcpupin vcpu='5' cpuset='21'/> <vcpupin vcpu='6' cpuset='4'/> <vcpupin vcpu='7' cpuset='22'/> </cputune> <os> <type arch='x86_64' machine='pc-q35-6.2'>hvm</type> <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader> <nvram>/var/lib/libvirt/qemu/nvram/ubuntu_VARS.fd</nvram> </os> <features> <acpi/> </features> <cpu mode='host-passthrough' check='none' migratable='on'> <topology sockets='1' dies='1' cores='4' threads='2'/> </cpu> <clock offset='localtime'> <timer name='rtc' tickpolicy='catchup'/> <timer name='pit' tickpolicy='delay'/> <timer name='hpet' present='yes'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <pm> <suspend-to-mem enabled='no'/> <suspend-to-disk enabled='no'/> </pm> <devices>...</devices> </domain> |
1.3.2. win10
<domain type='kvm'> <name>win10-2</name> <uuid>a5884a23-73f0-44d2-8ca9-45872223f844</uuid> <metadata> <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> <libosinfo:os id="http://microsoft.com/win/10"/> </libosinfo:libosinfo> </metadata> <memory unit='KiB'>8388608</memory> <currentMemory unit='KiB'>8388608</currentMemory> <vcpu placement='static'>8</vcpu> <cputune> <vcpupin vcpu='0' cpuset='1'/> <vcpupin vcpu='1' cpuset='19'/> <vcpupin vcpu='2' cpuset='2'/> <vcpupin vcpu='3' cpuset='20'/> <vcpupin vcpu='4' cpuset='3'/> <vcpupin vcpu='5' cpuset='21'/> <vcpupin vcpu='6' cpuset='4'/> <vcpupin vcpu='7' cpuset='22'/> </cputune> <os> <type arch='x86_64' machine='pc-q35-6.2'>hvm</type> <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader> <nvram>/var/lib/libvirt/qemu/nvram/win10-2_VARS.fd</nvram> </os> <features> <acpi/> <apic/> </features> <cpu mode='host-passthrough' check='none' migratable='on'> <topology sockets='1' dies='1' cores='4' threads='2'/> </cpu> <clock offset='localtime'> <timer name='rtc' tickpolicy='catchup'/> <timer name='pit' tickpolicy='delay'/> <timer name='hpet' present='no'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <pm> <suspend-to-mem enabled='no'/> <suspend-to-disk enabled='no'/> </pm> <devices>...</devices> </domain> |
2. 性能分析
2.1. 问题描述
在宿主机通过 perf 抓取 windows 客户机的 kvm event 的时候,可以看到大量的 vmexit 事件,其中一部分怀疑是由于时钟中断导致的,需要解决。
2.2. 问题排查
2.2.1. ubuntu clocksource perf
由于 windows 相关的资料不多,首先尝试在 ubuntu 客户机中定位问题,基于 libvirt 的默认配置,可以看到 ubuntu 中支持的 clocksource 包含 kvm-clock tsc hpet acpi_pm四种类型,其默认值为 kvm-clock:
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource kvm-clock tsc hpet acpi_pm $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource kvm-clock |
既然可用的 clocksource 包含4种类型,那为什么默认使用 kvm-clock?我们在客户机通过 sysbench 对单核进行压测,在宿主机通过 perf 采集下 vmexit 和 ioport 的 event 数据,对比一下差异。
2.2.1.1. hpet
客户机 events per secord 在280.83:
$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量 CPU speed: events per second: 280.83 General statistics: total time: 10.0037s total number of events: 2810 Latency (ms): min: 3.47 avg: 3.54 max: 6.91 95th percentile: 3.55 sum: 9944.81 Threads fairness: events (avg/stddev): 2810.0000/0.00 execution time (avg/stddev): 9.9448/0.00 |
宿主机有大量的 EPT_MISCONFIG 的 vmexit,ioport 事件数 0:
$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束 $ sudo perf kvm stat report --event=vmexit # vmexit Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EPT_MISCONFIG 22691 58.48% 0.22% 3.28us 1909.04us 8.15us ( +- 1.10% ) EXTERNAL_INTERRUPT 7027 18.11% 0.02% 1.50us 44.85us 2.78us ( +- 0.87% ) MSR_WRITE 4236 10.92% 0.01% 1.34us 10.93us 2.39us ( +- 0.72% ) PREEMPTION_TIMER 2533 6.53% 0.00% 0.75us 3.39us 0.92us ( +- 0.54% ) PAUSE_INSTRUCTION 1012 2.61% 0.00% 1.22us 11.43us 3.07us ( +- 1.20% ) HLT 953 2.46% 99.73% 2.68us 2015769.11us 87358.64us ( +- 7.90% ) EPT_VIOLATION 202 0.52% 0.00% 1.31us 37.57us 5.36us ( +- 6.12% ) CPUID 81 0.21% 0.00% 1.52us 5.60us 1.82us ( +- 4.20% ) MSR_READ 67 0.17% 0.00% 2.48us 12.72us 4.21us ( +- 3.93% ) EXCEPTION_NMI 1 0.00% 0.00% 8.61us 8.61us 8.61us ( +- 0.00% ) EOI_INDUCED 1 0.00% 0.00% 3.21us 3.21us 3.21us ( +- 0.00% ) Total Samples:38804, Total events handled time:83474382.31us. $ sudo perf kvm stat report --event=ioport #ioport Analyze events for all VMs, all VCPUs: IO Port Access Samples Samples% Time% Min Time Max Time Avg time Total Samples:0, Total events handled time:0.00us. |
2.2.1.2. acpi_pm
客户机 events per secord 在 282.44:
$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量 CPU speed: events per second: 282.44 General statistics: total time: 10.0021s total number of events: 2826 Latency (ms): min: 3.47 avg: 3.52 max: 6.83 95th percentile: 3.55 sum: 9953.45 Threads fairness: events (avg/stddev): 2826.0000/0.00 execution time (avg/stddev): 9.9535/0.00 |
宿主机有大量的 IO_INSTRUCTION 的 vmexit,从 ioport 中可以看到是 0x608 设备导致的 vmexit:
$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束 $ sudo perf kvm stat report --event=vmexit # vmexit Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time IO_INSTRUCTION 24618 65.05% 0.19% 3.91us 90.62us 5.96us ( +- 0.43% ) MSR_WRITE 4519 11.94% 0.01% 1.34us 8.61us 2.39us ( +- 0.68% ) EXTERNAL_INTERRUPT 3081 8.14% 0.02% 1.26us 32.63us 4.32us ( +- 0.90% ) PREEMPTION_TIMER 2534 6.70% 0.00% 0.74us 3.02us 0.95us ( +- 0.55% ) EPT_VIOLATION 1834 4.85% 0.01% 1.25us 39.92us 3.91us ( +- 1.98% ) HLT 1084 2.86% 99.77% 46.27us 1791869.86us 72652.91us ( +- 7.38% ) CPUID 81 0.21% 0.00% 1.52us 3.71us 1.76us ( +- 2.78% ) MSR_READ 71 0.19% 0.00% 1.94us 16.67us 3.93us ( +- 4.98% ) EPT_MISCONFIG 17 0.04% 0.00% 4.62us 9.64us 5.94us ( +- 4.26% ) PAUSE_INSTRUCTION 7 0.02% 0.00% 2.33us 5.13us 2.98us ( +- 13.56% ) EXCEPTION_NMI 1 0.00% 0.00% 5.96us 5.96us 5.96us ( +- 0.00% ) Total Samples:37847, Total events handled time:78936674.73us. $ sudo perf kvm stat report --event=ioport #ioport Analyze events for all VMs, all VCPUs: IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x608:PIN 24618 100.00% 100.00% 1.03us 6.61us 1.44us ( +- 0.28% ) Total Samples:24618, Total events handled time:35370.40us. |
通过 mtree 查看导致中断的具体设备是 acpi-tmr:
$ virsh qemu-monitor-command ubuntu --hmp info mtree | grep 608 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr |
2.2.1.3. tsc
客户机 events per secord 在 286.59:
$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量 Threads started! CPU speed: events per second: 286.59 General statistics: total time: 10.0013s total number of events: 2867 Latency (ms): min: 3.46 avg: 3.49 max: 6.84 95th percentile: 3.49 sum: 9998.94 Threads fairness: events (avg/stddev): 2867.0000/0.00 execution time (avg/stddev): 9.9989/0.00 |
宿主机没有明显升高的 vmexit & ioport:
$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束 $ sudo perf kvm stat report --event=vmexit # vmexit Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time MSR_WRITE 4584 35.64% 0.02% 1.25us 23.83us 2.50us ( +- 0.90% ) EXTERNAL_INTERRUPT 3001 23.33% 0.02% 1.46us 35.68us 3.83us ( +- 1.03% ) PREEMPTION_TIMER 2522 19.61% 0.00% 0.68us 5.41us 0.85us ( +- 0.67% ) EPT_VIOLATION 1628 12.66% 0.01% 1.35us 38.79us 5.07us ( +- 2.09% ) HLT 964 7.49% 99.95% 2.10us 1948664.43us 75545.46us ( +- 7.88% ) CPUID 81 0.63% 0.00% 1.53us 3.88us 1.78us ( +- 3.06% ) MSR_READ 64 0.50% 0.00% 2.13us 7.19us 4.43us ( +- 2.86% ) PAUSE_INSTRUCTION 8 0.06% 0.00% 2.22us 7.03us 3.40us ( +- 17.21% ) EPT_MISCONFIG 8 0.06% 0.00% 5.11us 8.38us 7.32us ( +- 4.98% ) EXCEPTION_NMI 1 0.01% 0.00% 7.84us 7.84us 7.84us ( +- 0.00% ) VMCALL 1 0.01% 0.00% 9.00us 9.00us 9.00us ( +- 0.00% ) IO_INSTRUCTION 1 0.01% 0.00% 50.60us 50.60us 50.60us ( +- 0.00% ) Total Samples:12863, Total events handled time:72859743.23us. $ sudo perf kvm stat report --event=ioport #ioport Analyze events for all VMs, all VCPUs: IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x5040:POUT 1 100.00% 100.00% 48.51us 48.51us 48.51us ( +- 0.00% ) Total Samples:1, Total events handled time:48.51us. |
2.2.1.4. kvm-clock
客户机 events per secord 在287.60:
$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量 CPU speed: events per second: 287.60 General statistics: total time: 10.0006s total number of events: 2877 Latency (ms): min: 3.45 avg: 3.48 max: 3.65 95th percentile: 3.49 sum: 9998.43 Threads fairness: events (avg/stddev): 2877.0000/0.00 execution time (avg/stddev): 9.9984/0.00 |
宿主机没有明显升高的 vmexit & ioport:
$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束 $ sudo perf kvm stat report --event=vmexit # vmexit Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time MSR_WRITE 3941 37.82% 0.01% 1.17us 8.88us 2.23us ( +- 0.93% ) EXTERNAL_INTERRUPT 2641 25.34% 0.01% 1.48us 33.32us 3.80us ( +- 0.83% ) PREEMPTION_TIMER 2510 24.09% 0.00% 0.64us 3.67us 0.74us ( +- 0.47% ) HLT 779 7.48% 99.97% 4.74us 1960580.40us 96299.80us ( +- 7.69% ) EPT_VIOLATION 386 3.70% 0.00% 1.34us 31.19us 3.85us ( +- 5.50% ) CPUID 81 0.78% 0.00% 0.80us 3.56us 1.24us ( +- 4.87% ) MSR_READ 68 0.65% 0.00% 2.25us 6.72us 4.59us ( +- 2.33% ) EPT_MISCONFIG 8 0.08% 0.00% 4.24us 69.00us 19.27us ( +- 40.67% ) PAUSE_INSTRUCTION 5 0.05% 0.00% 1.28us 5.25us 2.71us ( +- 26.51% ) EXCEPTION_NMI 1 0.01% 0.00% 7.67us 7.67us 7.67us ( +- 0.00% ) EOI_INDUCED 1 0.01% 0.00% 3.32us 3.32us 3.32us ( +- 0.00% ) Total Samples:10421, Total events handled time:75040337.72us. $ sudo perf kvm stat report --event=ioport #ioport Analyze events for all VMs, all VCPUs: IO Port Access Samples Samples% Time% Min Time Max Time Avg time Total Samples:0, Total events handled time:0.00us. |
2.2.1.5. 结果对比
客户机中 sysbench 压测的 vcpu1,对应 pcpu19(参考前面libvirt的vcpupin配置),我们在宿主机上直接压测 pcpu19 单核看下性能数据:
$ taskset -c 19 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 CPU speed: events per second: 288.49 General statistics: total time: 10.0014s total number of events: 2886 Latency (ms): min: 3.45 avg: 3.47 max: 4.86 95th percentile: 3.49 sum: 10000.20 Threads fairness: events (avg/stddev): 2886.0000/0.00 execution time (avg/stddev): 10.0002/0.00 |
可以把我们关心的数据汇总一下:
clocksource | sysbench(events per second) | vmexit(Total Samples) | |
---|---|---|---|
宿主机 | tsc |
288.49 |
|
客户机 | hpet | 280.83 | 38804 |
acpi_pm | 282.44 | 37847 | |
tsc | 286.59 | 12863 | |
kvm-clock | 287.60 | 10421 |
这里 sysbench 的跑分仅做参考使用,其中 sysbench 的 cpu 的测试集在 non-root 模式下本身并不会导致 vmexit,所以上面表格里的 vmexit 并不是 sysbench 引起的,可以粗略的认为是时钟本身导致的,而 sysbench 跑分的差异,可以认为是不同的时钟实现导致的对客户机性能的影响。所以在 ubuntu 的客户机上,默认使用 kvm-clock 是合理的。
2.2.2. windows timer
先在宿主机捞取下 vmexit 的数据,抓取了大概 10s 左右的数据。
$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` $ sudo perf kvm stat report --event=vmexit Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time IO_INSTRUCTION 256738 72.97% 88.78% 1.49us 1235.12us 11.48us ( +- 0.36% ) EXTERNAL_INTERRUPT 37002 10.52% 6.14% 1.00us 58.65us 5.51us ( +- 0.43% ) EPT_VIOLATION 25381 7.21% 1.27% 0.90us 18.20us 1.66us ( +- 0.24% ) EOI_INDUCED 18709 5.32% 1.08% 0.98us 34.66us 1.91us ( +- 0.18% ) APIC_WRITE 9831 2.79% 0.89% 1.16us 34.20us 2.99us ( +- 0.43% ) HLT 1105 0.31% 1.13% 1.83us 1762.92us 34.09us ( +- 8.27% ) EPT_MISCONFIG 1060 0.30% 0.23% 2.05us 57.01us 7.21us ( +- 3.34% ) MSR_READ 944 0.27% 0.24% 1.96us 15.26us 8.29us ( +- 1.56% ) MSR_WRITE 576 0.16% 0.21% 2.04us 55.25us 12.20us ( +- 4.56% ) PAUSE_INSTRUCTION 328 0.09% 0.03% 1.54us 7.53us 3.13us ( +- 1.66% ) CPUID 144 0.04% 0.01% 1.67us 11.22us 1.98us ( +- 3.38% ) EXCEPTION_NMI 8 0.00% 0.00% 6.40us 7.90us 7.31us ( +- 2.15% ) Total Samples:351826, Total events handled time:3319455.97us. $sudo perf kvm stat report --event=ioport Analyze events for all VMs, all VCPUs: IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x608:PIN 158788 61.85% 77.51% 1.04us 38.51us 4.39us ( +- 0.21% ) 0x70:POUT 48975 19.08% 9.69% 0.89us 34.49us 1.78us ( +- 0.23% ) 0x71:PIN 47791 18.61% 10.73% 1.07us 28.67us 2.02us ( +- 0.21% ) 0x71:POUT 1184 0.46% 2.07% 12.81us 586.25us 15.71us ( +- 3.09% ) Total Samples:256738, Total events handled time:899615.43us. |
可以看到存在非常大量的 IO_INSTRUCTION,结合之前 ubuntu 客户机的经验数据,我们可以知道大概率跟 acpi timer 有关,通过 mtree 可以证明这一点。
$ virsh qemu-monitor-command win10-2 --hmp info mtree | grep 608 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr |
2.3. 解决思路
根据上面的排查情况,我们确定 windows 客户机中大量的 vmexit 是跟 timer 有关,客户机的 timer 是与 hypervisor 这层息息相关,所以我们回到 libvirt ,在官方找到了针对windows的timer的优化。
<clock offset='localtime'> ... <timer name='hypervclock' present='yes'/> </clock> <features> ... <hyperv mode='custom'> <vpindex state='on'/> <synic state='on'/> <stimer state='on'/> </hyperv> </features> |
添加完之后,重新采集数据,可以看到 timer 导致的 IO_INSTRUCTION 已经没有了,events 的 handle time 也下降了几倍。
$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` $ sudo perf kvm stat report --event=vmexit Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EPT_VIOLATION 92840 61.14% 56.08% 1.12us 51.53us 3.98us ( +- 0.19% ) EXTERNAL_INTERRUPT 55604 36.62% 37.19% 1.35us 61.22us 4.41us ( +- 0.31% ) MSR_READ 960 0.63% 1.37% 2.49us 18.92us 9.44us ( +- 1.47% ) TPR_BELOW_THRESHOLD 699 0.46% 0.23% 1.63us 5.38us 2.13us ( +- 0.77% ) MSR_WRITE 640 0.42% 1.13% 2.90us 47.96us 11.67us ( +- 4.26% ) HLT 580 0.38% 3.18% 5.45us 511.55us 36.13us ( +- 3.29% ) EPT_MISCONFIG 268 0.18% 0.71% 2.97us 407.79us 17.55us ( +- 10.33% ) CPUID 160 0.11% 0.05% 1.73us 3.06us 1.89us ( +- 0.60% ) PAUSE_INSTRUCTION 75 0.05% 0.04% 1.88us 5.81us 3.65us ( +- 3.60% ) INTERRUPT_WINDOW 21 0.01% 0.01% 1.53us 3.05us 2.02us ( +- 4.81% ) EXCEPTION_NMI 7 0.00% 0.01% 5.37us 7.66us 6.60us ( +- 5.77% ) Total Samples:151854, Total events handled time:659416.83us. |
3. 结论
- 针对windows的客户机,需要开启hypervclock,以减少vmexit;