searchusermenu
  • 发布文章
  • 消息中心
点赞
收藏
评论
分享
原创

因Timer导致的Windows虚机大量IO_INSTRUCTION的问题

2023-10-20 04:16:10
158
0

1. 环境准备

1.1. 环境配置

宿主机 操作系统 5.19.0-45-generic #46~22.04.1-Ubuntu / win10 双系统
规格 18 核 36 线程 64G 内存,主频 2.3Ghz,睿频已关
拓扑

客户机1

操作系统

5.19.0-45-generic #46~22.04.1-Ubuntu
规格 8 核 8G 内存
客户机2 操作系统 win10 1909
规格 8 核 8G 内存

1.2. 测试&观测工具

宿主机 sysbench / perf
客户机1 sysbench
客户机2 cpuz

 

1.3. 客户机配置文件

1.3.1. ubuntu

<domain type='kvm'>
  <name>ubuntu</name>
  <uuid>8280445b-28ae-48ff-8f1c-ceb79b3f8858</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='19'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='20'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='22'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ubuntu_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>...</devices>
</domain>

 

1.3.2. win10

<domain type='kvm'>
  <name>win10-2</name>
  <uuid>a5884a23-73f0-44d2-8ca9-45872223f844</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='19'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='20'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='22'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10-2_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>...</devices>
</domain>

 

2. 性能分析

2.1. 问题描述

在宿主机通过 perf 抓取 windows 客户机的 kvm event 的时候,可以看到大量的 vmexit 事件,其中一部分怀疑是由于时钟中断导致的,需要解决。

2.2. 问题排查

2.2.1. ubuntu clocksource perf

由于 windows 相关的资料不多,首先尝试在 ubuntu 客户机中定位问题,基于 libvirt 的默认配置,可以看到 ubuntu 中支持的 clocksource 包含 kvm-clock tsc hpet acpi_pm四种类型,其默认值为 kvm-clock:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-clock tsc hpet acpi_pm


$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock

既然可用的 clocksource 包含4种类型,那为什么默认使用 kvm-clock?我们在客户机通过 sysbench 对单核进行压测,在宿主机通过 perf 采集下 vmexit 和 ioport 的 event 数据,对比一下差异。

2.2.1.1. hpet

客户机 events per secord 在280.83:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量


CPU speed:
    events per second:   280.83

General statistics:
    total time:                          10.0037s
    total number of events:              2810

Latency (ms):
         min:                                    3.47
         avg:                                    3.54
         max:                                    6.91
         95th percentile:                        3.55
         sum:                                 9944.81

Threads fairness:
    events (avg/stddev):           2810.0000/0.00
    execution time (avg/stddev):   9.9448/0.00

 

宿主机有大量的 EPT_MISCONFIG 的 vmexit,ioport 事件数 0:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

       EPT_MISCONFIG      22691    58.48%     0.22%      3.28us   1909.04us      8.15us ( +-   1.10% )
  EXTERNAL_INTERRUPT       7027    18.11%     0.02%      1.50us     44.85us      2.78us ( +-   0.87% )
           MSR_WRITE       4236    10.92%     0.01%      1.34us     10.93us      2.39us ( +-   0.72% )
    PREEMPTION_TIMER       2533     6.53%     0.00%      0.75us      3.39us      0.92us ( +-   0.54% )
   PAUSE_INSTRUCTION       1012     2.61%     0.00%      1.22us     11.43us      3.07us ( +-   1.20% )
                 HLT        953     2.46%    99.73%      2.68us 2015769.11us  87358.64us ( +-   7.90% )
       EPT_VIOLATION        202     0.52%     0.00%      1.31us     37.57us      5.36us ( +-   6.12% )
               CPUID         81     0.21%     0.00%      1.52us      5.60us      1.82us ( +-   4.20% )
            MSR_READ         67     0.17%     0.00%      2.48us     12.72us      4.21us ( +-   3.93% )
       EXCEPTION_NMI          1     0.00%     0.00%      8.61us      8.61us      8.61us ( +-   0.00% )
         EOI_INDUCED          1     0.00%     0.00%      3.21us      3.21us      3.21us ( +-   0.00% )

Total Samples:38804, Total events handled time:83474382.31us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time


Total Samples:0, Total events handled time:0.00us.

 

2.2.1.2. acpi_pm

客户机 events per secord 在 282.44:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量
CPU speed:
    events per second:   282.44

General statistics:
    total time:                          10.0021s
    total number of events:              2826

Latency (ms):
         min:                                    3.47
         avg:                                    3.52
         max:                                    6.83
         95th percentile:                        3.55
         sum:                                 9953.45

Threads fairness:
    events (avg/stddev):           2826.0000/0.00
    execution time (avg/stddev):   9.9535/0.00

 

宿主机有大量的 IO_INSTRUCTION 的 vmexit,从 ioport 中可以看到是 0x608 设备导致的 vmexit:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

      IO_INSTRUCTION      24618    65.05%     0.19%      3.91us     90.62us      5.96us ( +-   0.43% )
           MSR_WRITE       4519    11.94%     0.01%      1.34us      8.61us      2.39us ( +-   0.68% )
  EXTERNAL_INTERRUPT       3081     8.14%     0.02%      1.26us     32.63us      4.32us ( +-   0.90% )
    PREEMPTION_TIMER       2534     6.70%     0.00%      0.74us      3.02us      0.95us ( +-   0.55% )
       EPT_VIOLATION       1834     4.85%     0.01%      1.25us     39.92us      3.91us ( +-   1.98% )
                 HLT       1084     2.86%    99.77%     46.27us 1791869.86us  72652.91us ( +-   7.38% )
               CPUID         81     0.21%     0.00%      1.52us      3.71us      1.76us ( +-   2.78% )
            MSR_READ         71     0.19%     0.00%      1.94us     16.67us      3.93us ( +-   4.98% )
       EPT_MISCONFIG         17     0.04%     0.00%      4.62us      9.64us      5.94us ( +-   4.26% )
   PAUSE_INSTRUCTION          7     0.02%     0.00%      2.33us      5.13us      2.98us ( +-  13.56% )
       EXCEPTION_NMI          1     0.00%     0.00%      5.96us      5.96us      5.96us ( +-   0.00% )

Total Samples:37847, Total events handled time:78936674.73us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           0x608:PIN      24618   100.00%   100.00%      1.03us      6.61us      1.44us ( +-   0.28% )

Total Samples:24618, Total events handled time:35370.40us.

 

通过 mtree 查看导致中断的具体设备是 acpi-tmr:

$ virsh qemu-monitor-command ubuntu --hmp info mtree | grep 608


0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr

 

2.2.1.3. tsc

客户机 events per secord 在 286.59:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量
Threads started!

CPU speed:
    events per second:   286.59

General statistics:
    total time:                          10.0013s
    total number of events:              2867

Latency (ms):
         min:                                    3.46
         avg:                                    3.49
         max:                                    6.84
         95th percentile:                        3.49
         sum:                                 9998.94

Threads fairness:
    events (avg/stddev):           2867.0000/0.00
    execution time (avg/stddev):   9.9989/0.00

 

宿主机没有明显升高的 vmexit & ioport:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           MSR_WRITE       4584    35.64%     0.02%      1.25us     23.83us      2.50us ( +-   0.90% )
  EXTERNAL_INTERRUPT       3001    23.33%     0.02%      1.46us     35.68us      3.83us ( +-   1.03% )
    PREEMPTION_TIMER       2522    19.61%     0.00%      0.68us      5.41us      0.85us ( +-   0.67% )
       EPT_VIOLATION       1628    12.66%     0.01%      1.35us     38.79us      5.07us ( +-   2.09% )
                 HLT        964     7.49%    99.95%      2.10us 1948664.43us  75545.46us ( +-   7.88% )
               CPUID         81     0.63%     0.00%      1.53us      3.88us      1.78us ( +-   3.06% )
            MSR_READ         64     0.50%     0.00%      2.13us      7.19us      4.43us ( +-   2.86% )
   PAUSE_INSTRUCTION          8     0.06%     0.00%      2.22us      7.03us      3.40us ( +-  17.21% )
       EPT_MISCONFIG          8     0.06%     0.00%      5.11us      8.38us      7.32us ( +-   4.98% )
       EXCEPTION_NMI          1     0.01%     0.00%      7.84us      7.84us      7.84us ( +-   0.00% )
              VMCALL          1     0.01%     0.00%      9.00us      9.00us      9.00us ( +-   0.00% )
      IO_INSTRUCTION          1     0.01%     0.00%     50.60us     50.60us     50.60us ( +-   0.00% )

Total Samples:12863, Total events handled time:72859743.23us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time

         0x5040:POUT          1   100.00%   100.00%     48.51us     48.51us     48.51us ( +-   0.00% )

Total Samples:1, Total events handled time:48.51us.

 

2.2.1.4. kvm-clock

客户机 events per secord 在287.60:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量
CPU speed:
    events per second:   287.60

General statistics:
    total time:                          10.0006s
    total number of events:              2877

Latency (ms):
         min:                                    3.45
         avg:                                    3.48
         max:                                    3.65
         95th percentile:                        3.49
         sum:                                 9998.43

Threads fairness:
    events (avg/stddev):           2877.0000/0.00
    execution time (avg/stddev):   9.9984/0.00

 

宿主机没有明显升高的 vmexit & ioport:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           MSR_WRITE       3941    37.82%     0.01%      1.17us      8.88us      2.23us ( +-   0.93% )
  EXTERNAL_INTERRUPT       2641    25.34%     0.01%      1.48us     33.32us      3.80us ( +-   0.83% )
    PREEMPTION_TIMER       2510    24.09%     0.00%      0.64us      3.67us      0.74us ( +-   0.47% )
                 HLT        779     7.48%    99.97%      4.74us 1960580.40us  96299.80us ( +-   7.69% )
       EPT_VIOLATION        386     3.70%     0.00%      1.34us     31.19us      3.85us ( +-   5.50% )
               CPUID         81     0.78%     0.00%      0.80us      3.56us      1.24us ( +-   4.87% )
            MSR_READ         68     0.65%     0.00%      2.25us      6.72us      4.59us ( +-   2.33% )
       EPT_MISCONFIG          8     0.08%     0.00%      4.24us     69.00us     19.27us ( +-  40.67% )
   PAUSE_INSTRUCTION          5     0.05%     0.00%      1.28us      5.25us      2.71us ( +-  26.51% )
       EXCEPTION_NMI          1     0.01%     0.00%      7.67us      7.67us      7.67us ( +-   0.00% )
         EOI_INDUCED          1     0.01%     0.00%      3.32us      3.32us      3.32us ( +-   0.00% )

Total Samples:10421, Total events handled time:75040337.72us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time


Total Samples:0, Total events handled time:0.00us.

 

2.2.1.5. 结果对比

客户机中 sysbench 压测的 vcpu1,对应 pcpu19(参考前面libvirt的vcpupin配置),我们在宿主机上直接压测 pcpu19 单核看下性能数据:

$ taskset -c 19 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10
CPU speed:
    events per second:   288.49

General statistics:
    total time:                          10.0014s
    total number of events:              2886

Latency (ms):
         min:                                    3.45
         avg:                                    3.47
         max:                                    4.86
         95th percentile:                        3.49
         sum:                                10000.20

Threads fairness:
    events (avg/stddev):           2886.0000/0.00
    execution time (avg/stddev):   10.0002/0.00

 

可以把我们关心的数据汇总一下:

  clocksource sysbench(events per second) vmexit(Total Samples)
宿主机 tsc
288.49
 
客户机 hpet 280.83 38804
acpi_pm 282.44 37847
tsc 286.59 12863
kvm-clock 287.60 10421

 

这里 sysbench 的跑分仅做参考使用,其中 sysbench 的 cpu 的测试集在 non-root 模式下本身并不会导致 vmexit,所以上面表格里的 vmexit 并不是 sysbench 引起的,可以粗略的认为是时钟本身导致的,而 sysbench 跑分的差异,可以认为是不同的时钟实现导致的对客户机性能的影响。所以在 ubuntu 的客户机上,默认使用 kvm-clock 是合理的。

 

2.2.2. windows timer

先在宿主机捞取下 vmexit 的数据,抓取了大概 10s 左右的数据。

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64`
$ sudo perf kvm stat report --event=vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

      IO_INSTRUCTION     256738    72.97%    88.78%      1.49us   1235.12us     11.48us ( +-   0.36% )
  EXTERNAL_INTERRUPT      37002    10.52%     6.14%      1.00us     58.65us      5.51us ( +-   0.43% )
       EPT_VIOLATION      25381     7.21%     1.27%      0.90us     18.20us      1.66us ( +-   0.24% )
         EOI_INDUCED      18709     5.32%     1.08%      0.98us     34.66us      1.91us ( +-   0.18% )
          APIC_WRITE       9831     2.79%     0.89%      1.16us     34.20us      2.99us ( +-   0.43% )
                 HLT       1105     0.31%     1.13%      1.83us   1762.92us     34.09us ( +-   8.27% )
       EPT_MISCONFIG       1060     0.30%     0.23%      2.05us     57.01us      7.21us ( +-   3.34% )
            MSR_READ        944     0.27%     0.24%      1.96us     15.26us      8.29us ( +-   1.56% )
           MSR_WRITE        576     0.16%     0.21%      2.04us     55.25us     12.20us ( +-   4.56% )
   PAUSE_INSTRUCTION        328     0.09%     0.03%      1.54us      7.53us      3.13us ( +-   1.66% )
               CPUID        144     0.04%     0.01%      1.67us     11.22us      1.98us ( +-   3.38% )
       EXCEPTION_NMI          8     0.00%     0.00%      6.40us      7.90us      7.31us ( +-   2.15% )

Total Samples:351826, Total events handled time:3319455.97us.


$sudo perf kvm stat report --event=ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           0x608:PIN     158788    61.85%    77.51%      1.04us     38.51us      4.39us ( +-   0.21% )
           0x70:POUT      48975    19.08%     9.69%      0.89us     34.49us      1.78us ( +-   0.23% )
            0x71:PIN      47791    18.61%    10.73%      1.07us     28.67us      2.02us ( +-   0.21% )
           0x71:POUT       1184     0.46%     2.07%     12.81us    586.25us     15.71us ( +-   3.09% )

Total Samples:256738, Total events handled time:899615.43us.

 

可以看到存在非常大量的 IO_INSTRUCTION,结合之前 ubuntu 客户机的经验数据,我们可以知道大概率跟 acpi timer 有关,通过 mtree 可以证明这一点。

$ virsh qemu-monitor-command win10-2 --hmp info mtree | grep 608
0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr

2.3. 解决思路

根据上面的排查情况,我们确定 windows 客户机中大量的 vmexit 是跟 timer 有关,客户机的 timer 是与 hypervisor 这层息息相关,所以我们回到 libvirt ,在官方找到了针对windows的timer的优化。

<clock offset='localtime'>
  ...
  <timer name='hypervclock' present='yes'/>
</clock>


<features>
  ...
  <hyperv mode='custom'>
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
  </hyperv>
</features>

 

添加完之后,重新采集数据,可以看到 timer 导致的 IO_INSTRUCTION 已经没有了,events 的 handle time 也下降了几倍。

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64`
$ sudo perf kvm stat report --event=vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

       EPT_VIOLATION      92840    61.14%    56.08%      1.12us     51.53us      3.98us ( +-   0.19% )
  EXTERNAL_INTERRUPT      55604    36.62%    37.19%      1.35us     61.22us      4.41us ( +-   0.31% )
            MSR_READ        960     0.63%     1.37%      2.49us     18.92us      9.44us ( +-   1.47% )
 TPR_BELOW_THRESHOLD        699     0.46%     0.23%      1.63us      5.38us      2.13us ( +-   0.77% )
           MSR_WRITE        640     0.42%     1.13%      2.90us     47.96us     11.67us ( +-   4.26% )
                 HLT        580     0.38%     3.18%      5.45us    511.55us     36.13us ( +-   3.29% )
       EPT_MISCONFIG        268     0.18%     0.71%      2.97us    407.79us     17.55us ( +-  10.33% )
               CPUID        160     0.11%     0.05%      1.73us      3.06us      1.89us ( +-   0.60% )
   PAUSE_INSTRUCTION         75     0.05%     0.04%      1.88us      5.81us      3.65us ( +-   3.60% )
    INTERRUPT_WINDOW         21     0.01%     0.01%      1.53us      3.05us      2.02us ( +-   4.81% )
       EXCEPTION_NMI          7     0.00%     0.01%      5.37us      7.66us      6.60us ( +-   5.77% )

Total Samples:151854, Total events handled time:659416.83us.

 

3. 结论

  • 针对windows的客户机,需要开启hypervclock,以减少vmexit;

 

0条评论
0 / 1000
LLL
10文章数
0粉丝数
LLL
10 文章 | 0 粉丝
原创

因Timer导致的Windows虚机大量IO_INSTRUCTION的问题

2023-10-20 04:16:10
158
0

1. 环境准备

1.1. 环境配置

宿主机 操作系统 5.19.0-45-generic #46~22.04.1-Ubuntu / win10 双系统
规格 18 核 36 线程 64G 内存,主频 2.3Ghz,睿频已关
拓扑

客户机1

操作系统

5.19.0-45-generic #46~22.04.1-Ubuntu
规格 8 核 8G 内存
客户机2 操作系统 win10 1909
规格 8 核 8G 内存

1.2. 测试&观测工具

宿主机 sysbench / perf
客户机1 sysbench
客户机2 cpuz

 

1.3. 客户机配置文件

1.3.1. ubuntu

<domain type='kvm'>
  <name>ubuntu</name>
  <uuid>8280445b-28ae-48ff-8f1c-ceb79b3f8858</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='19'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='20'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='22'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ubuntu_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>...</devices>
</domain>

 

1.3.2. win10

<domain type='kvm'>
  <name>win10-2</name>
  <uuid>a5884a23-73f0-44d2-8ca9-45872223f844</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='19'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='20'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='22'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10-2_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>...</devices>
</domain>

 

2. 性能分析

2.1. 问题描述

在宿主机通过 perf 抓取 windows 客户机的 kvm event 的时候,可以看到大量的 vmexit 事件,其中一部分怀疑是由于时钟中断导致的,需要解决。

2.2. 问题排查

2.2.1. ubuntu clocksource perf

由于 windows 相关的资料不多,首先尝试在 ubuntu 客户机中定位问题,基于 libvirt 的默认配置,可以看到 ubuntu 中支持的 clocksource 包含 kvm-clock tsc hpet acpi_pm四种类型,其默认值为 kvm-clock:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-clock tsc hpet acpi_pm


$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock

既然可用的 clocksource 包含4种类型,那为什么默认使用 kvm-clock?我们在客户机通过 sysbench 对单核进行压测,在宿主机通过 perf 采集下 vmexit 和 ioport 的 event 数据,对比一下差异。

2.2.1.1. hpet

客户机 events per secord 在280.83:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量


CPU speed:
    events per second:   280.83

General statistics:
    total time:                          10.0037s
    total number of events:              2810

Latency (ms):
         min:                                    3.47
         avg:                                    3.54
         max:                                    6.91
         95th percentile:                        3.55
         sum:                                 9944.81

Threads fairness:
    events (avg/stddev):           2810.0000/0.00
    execution time (avg/stddev):   9.9448/0.00

 

宿主机有大量的 EPT_MISCONFIG 的 vmexit,ioport 事件数 0:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

       EPT_MISCONFIG      22691    58.48%     0.22%      3.28us   1909.04us      8.15us ( +-   1.10% )
  EXTERNAL_INTERRUPT       7027    18.11%     0.02%      1.50us     44.85us      2.78us ( +-   0.87% )
           MSR_WRITE       4236    10.92%     0.01%      1.34us     10.93us      2.39us ( +-   0.72% )
    PREEMPTION_TIMER       2533     6.53%     0.00%      0.75us      3.39us      0.92us ( +-   0.54% )
   PAUSE_INSTRUCTION       1012     2.61%     0.00%      1.22us     11.43us      3.07us ( +-   1.20% )
                 HLT        953     2.46%    99.73%      2.68us 2015769.11us  87358.64us ( +-   7.90% )
       EPT_VIOLATION        202     0.52%     0.00%      1.31us     37.57us      5.36us ( +-   6.12% )
               CPUID         81     0.21%     0.00%      1.52us      5.60us      1.82us ( +-   4.20% )
            MSR_READ         67     0.17%     0.00%      2.48us     12.72us      4.21us ( +-   3.93% )
       EXCEPTION_NMI          1     0.00%     0.00%      8.61us      8.61us      8.61us ( +-   0.00% )
         EOI_INDUCED          1     0.00%     0.00%      3.21us      3.21us      3.21us ( +-   0.00% )

Total Samples:38804, Total events handled time:83474382.31us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time


Total Samples:0, Total events handled time:0.00us.

 

2.2.1.2. acpi_pm

客户机 events per secord 在 282.44:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量
CPU speed:
    events per second:   282.44

General statistics:
    total time:                          10.0021s
    total number of events:              2826

Latency (ms):
         min:                                    3.47
         avg:                                    3.52
         max:                                    6.83
         95th percentile:                        3.55
         sum:                                 9953.45

Threads fairness:
    events (avg/stddev):           2826.0000/0.00
    execution time (avg/stddev):   9.9535/0.00

 

宿主机有大量的 IO_INSTRUCTION 的 vmexit,从 ioport 中可以看到是 0x608 设备导致的 vmexit:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

      IO_INSTRUCTION      24618    65.05%     0.19%      3.91us     90.62us      5.96us ( +-   0.43% )
           MSR_WRITE       4519    11.94%     0.01%      1.34us      8.61us      2.39us ( +-   0.68% )
  EXTERNAL_INTERRUPT       3081     8.14%     0.02%      1.26us     32.63us      4.32us ( +-   0.90% )
    PREEMPTION_TIMER       2534     6.70%     0.00%      0.74us      3.02us      0.95us ( +-   0.55% )
       EPT_VIOLATION       1834     4.85%     0.01%      1.25us     39.92us      3.91us ( +-   1.98% )
                 HLT       1084     2.86%    99.77%     46.27us 1791869.86us  72652.91us ( +-   7.38% )
               CPUID         81     0.21%     0.00%      1.52us      3.71us      1.76us ( +-   2.78% )
            MSR_READ         71     0.19%     0.00%      1.94us     16.67us      3.93us ( +-   4.98% )
       EPT_MISCONFIG         17     0.04%     0.00%      4.62us      9.64us      5.94us ( +-   4.26% )
   PAUSE_INSTRUCTION          7     0.02%     0.00%      2.33us      5.13us      2.98us ( +-  13.56% )
       EXCEPTION_NMI          1     0.00%     0.00%      5.96us      5.96us      5.96us ( +-   0.00% )

Total Samples:37847, Total events handled time:78936674.73us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           0x608:PIN      24618   100.00%   100.00%      1.03us      6.61us      1.44us ( +-   0.28% )

Total Samples:24618, Total events handled time:35370.40us.

 

通过 mtree 查看导致中断的具体设备是 acpi-tmr:

$ virsh qemu-monitor-command ubuntu --hmp info mtree | grep 608


0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr

 

2.2.1.3. tsc

客户机 events per secord 在 286.59:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量
Threads started!

CPU speed:
    events per second:   286.59

General statistics:
    total time:                          10.0013s
    total number of events:              2867

Latency (ms):
         min:                                    3.46
         avg:                                    3.49
         max:                                    6.84
         95th percentile:                        3.49
         sum:                                 9998.94

Threads fairness:
    events (avg/stddev):           2867.0000/0.00
    execution time (avg/stddev):   9.9989/0.00

 

宿主机没有明显升高的 vmexit & ioport:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           MSR_WRITE       4584    35.64%     0.02%      1.25us     23.83us      2.50us ( +-   0.90% )
  EXTERNAL_INTERRUPT       3001    23.33%     0.02%      1.46us     35.68us      3.83us ( +-   1.03% )
    PREEMPTION_TIMER       2522    19.61%     0.00%      0.68us      5.41us      0.85us ( +-   0.67% )
       EPT_VIOLATION       1628    12.66%     0.01%      1.35us     38.79us      5.07us ( +-   2.09% )
                 HLT        964     7.49%    99.95%      2.10us 1948664.43us  75545.46us ( +-   7.88% )
               CPUID         81     0.63%     0.00%      1.53us      3.88us      1.78us ( +-   3.06% )
            MSR_READ         64     0.50%     0.00%      2.13us      7.19us      4.43us ( +-   2.86% )
   PAUSE_INSTRUCTION          8     0.06%     0.00%      2.22us      7.03us      3.40us ( +-  17.21% )
       EPT_MISCONFIG          8     0.06%     0.00%      5.11us      8.38us      7.32us ( +-   4.98% )
       EXCEPTION_NMI          1     0.01%     0.00%      7.84us      7.84us      7.84us ( +-   0.00% )
              VMCALL          1     0.01%     0.00%      9.00us      9.00us      9.00us ( +-   0.00% )
      IO_INSTRUCTION          1     0.01%     0.00%     50.60us     50.60us     50.60us ( +-   0.00% )

Total Samples:12863, Total events handled time:72859743.23us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time

         0x5040:POUT          1   100.00%   100.00%     48.51us     48.51us     48.51us ( +-   0.00% )

Total Samples:1, Total events handled time:48.51us.

 

2.2.1.4. kvm-clock

客户机 events per secord 在287.60:

$ taskset -c 1 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10 # sysbench压测单核性能,并将进程亲和性设定在cpu1减少变量
CPU speed:
    events per second:   287.60

General statistics:
    total time:                          10.0006s
    total number of events:              2877

Latency (ms):
         min:                                    3.45
         avg:                                    3.48
         max:                                    3.65
         95th percentile:                        3.49
         sum:                                 9998.43

Threads fairness:
    events (avg/stddev):           2877.0000/0.00
    execution time (avg/stddev):   9.9984/0.00

 

宿主机没有明显升高的 vmexit & ioport:

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64` # 10s后ctrl+c结束
$ sudo perf kvm stat report --event=vmexit # vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           MSR_WRITE       3941    37.82%     0.01%      1.17us      8.88us      2.23us ( +-   0.93% )
  EXTERNAL_INTERRUPT       2641    25.34%     0.01%      1.48us     33.32us      3.80us ( +-   0.83% )
    PREEMPTION_TIMER       2510    24.09%     0.00%      0.64us      3.67us      0.74us ( +-   0.47% )
                 HLT        779     7.48%    99.97%      4.74us 1960580.40us  96299.80us ( +-   7.69% )
       EPT_VIOLATION        386     3.70%     0.00%      1.34us     31.19us      3.85us ( +-   5.50% )
               CPUID         81     0.78%     0.00%      0.80us      3.56us      1.24us ( +-   4.87% )
            MSR_READ         68     0.65%     0.00%      2.25us      6.72us      4.59us ( +-   2.33% )
       EPT_MISCONFIG          8     0.08%     0.00%      4.24us     69.00us     19.27us ( +-  40.67% )
   PAUSE_INSTRUCTION          5     0.05%     0.00%      1.28us      5.25us      2.71us ( +-  26.51% )
       EXCEPTION_NMI          1     0.01%     0.00%      7.67us      7.67us      7.67us ( +-   0.00% )
         EOI_INDUCED          1     0.01%     0.00%      3.32us      3.32us      3.32us ( +-   0.00% )

Total Samples:10421, Total events handled time:75040337.72us.

$ sudo perf kvm stat report --event=ioport #ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time


Total Samples:0, Total events handled time:0.00us.

 

2.2.1.5. 结果对比

客户机中 sysbench 压测的 vcpu1,对应 pcpu19(参考前面libvirt的vcpupin配置),我们在宿主机上直接压测 pcpu19 单核看下性能数据:

$ taskset -c 19 sysbench cpu run --cpu-max-prime=20000 --threads=1 --time=10
CPU speed:
    events per second:   288.49

General statistics:
    total time:                          10.0014s
    total number of events:              2886

Latency (ms):
         min:                                    3.45
         avg:                                    3.47
         max:                                    4.86
         95th percentile:                        3.49
         sum:                                10000.20

Threads fairness:
    events (avg/stddev):           2886.0000/0.00
    execution time (avg/stddev):   10.0002/0.00

 

可以把我们关心的数据汇总一下:

  clocksource sysbench(events per second) vmexit(Total Samples)
宿主机 tsc
288.49
 
客户机 hpet 280.83 38804
acpi_pm 282.44 37847
tsc 286.59 12863
kvm-clock 287.60 10421

 

这里 sysbench 的跑分仅做参考使用,其中 sysbench 的 cpu 的测试集在 non-root 模式下本身并不会导致 vmexit,所以上面表格里的 vmexit 并不是 sysbench 引起的,可以粗略的认为是时钟本身导致的,而 sysbench 跑分的差异,可以认为是不同的时钟实现导致的对客户机性能的影响。所以在 ubuntu 的客户机上,默认使用 kvm-clock 是合理的。

 

2.2.2. windows timer

先在宿主机捞取下 vmexit 的数据,抓取了大概 10s 左右的数据。

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64`
$ sudo perf kvm stat report --event=vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

      IO_INSTRUCTION     256738    72.97%    88.78%      1.49us   1235.12us     11.48us ( +-   0.36% )
  EXTERNAL_INTERRUPT      37002    10.52%     6.14%      1.00us     58.65us      5.51us ( +-   0.43% )
       EPT_VIOLATION      25381     7.21%     1.27%      0.90us     18.20us      1.66us ( +-   0.24% )
         EOI_INDUCED      18709     5.32%     1.08%      0.98us     34.66us      1.91us ( +-   0.18% )
          APIC_WRITE       9831     2.79%     0.89%      1.16us     34.20us      2.99us ( +-   0.43% )
                 HLT       1105     0.31%     1.13%      1.83us   1762.92us     34.09us ( +-   8.27% )
       EPT_MISCONFIG       1060     0.30%     0.23%      2.05us     57.01us      7.21us ( +-   3.34% )
            MSR_READ        944     0.27%     0.24%      1.96us     15.26us      8.29us ( +-   1.56% )
           MSR_WRITE        576     0.16%     0.21%      2.04us     55.25us     12.20us ( +-   4.56% )
   PAUSE_INSTRUCTION        328     0.09%     0.03%      1.54us      7.53us      3.13us ( +-   1.66% )
               CPUID        144     0.04%     0.01%      1.67us     11.22us      1.98us ( +-   3.38% )
       EXCEPTION_NMI          8     0.00%     0.00%      6.40us      7.90us      7.31us ( +-   2.15% )

Total Samples:351826, Total events handled time:3319455.97us.


$sudo perf kvm stat report --event=ioport
Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time

           0x608:PIN     158788    61.85%    77.51%      1.04us     38.51us      4.39us ( +-   0.21% )
           0x70:POUT      48975    19.08%     9.69%      0.89us     34.49us      1.78us ( +-   0.23% )
            0x71:PIN      47791    18.61%    10.73%      1.07us     28.67us      2.02us ( +-   0.21% )
           0x71:POUT       1184     0.46%     2.07%     12.81us    586.25us     15.71us ( +-   3.09% )

Total Samples:256738, Total events handled time:899615.43us.

 

可以看到存在非常大量的 IO_INSTRUCTION,结合之前 ubuntu 客户机的经验数据,我们可以知道大概率跟 acpi timer 有关,通过 mtree 可以证明这一点。

$ virsh qemu-monitor-command win10-2 --hmp info mtree | grep 608
0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr

2.3. 解决思路

根据上面的排查情况,我们确定 windows 客户机中大量的 vmexit 是跟 timer 有关,客户机的 timer 是与 hypervisor 这层息息相关,所以我们回到 libvirt ,在官方找到了针对windows的timer的优化。

<clock offset='localtime'>
  ...
  <timer name='hypervclock' present='yes'/>
</clock>


<features>
  ...
  <hyperv mode='custom'>
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
  </hyperv>
</features>

 

添加完之后,重新采集数据,可以看到 timer 导致的 IO_INSTRUCTION 已经没有了,events 的 handle time 也下降了几倍。

$ sudo perf kvm stat record -p `pidof qemu-system-x86_64`
$ sudo perf kvm stat report --event=vmexit
Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

       EPT_VIOLATION      92840    61.14%    56.08%      1.12us     51.53us      3.98us ( +-   0.19% )
  EXTERNAL_INTERRUPT      55604    36.62%    37.19%      1.35us     61.22us      4.41us ( +-   0.31% )
            MSR_READ        960     0.63%     1.37%      2.49us     18.92us      9.44us ( +-   1.47% )
 TPR_BELOW_THRESHOLD        699     0.46%     0.23%      1.63us      5.38us      2.13us ( +-   0.77% )
           MSR_WRITE        640     0.42%     1.13%      2.90us     47.96us     11.67us ( +-   4.26% )
                 HLT        580     0.38%     3.18%      5.45us    511.55us     36.13us ( +-   3.29% )
       EPT_MISCONFIG        268     0.18%     0.71%      2.97us    407.79us     17.55us ( +-  10.33% )
               CPUID        160     0.11%     0.05%      1.73us      3.06us      1.89us ( +-   0.60% )
   PAUSE_INSTRUCTION         75     0.05%     0.04%      1.88us      5.81us      3.65us ( +-   3.60% )
    INTERRUPT_WINDOW         21     0.01%     0.01%      1.53us      3.05us      2.02us ( +-   4.81% )
       EXCEPTION_NMI          7     0.00%     0.01%      5.37us      7.66us      6.60us ( +-   5.77% )

Total Samples:151854, Total events handled time:659416.83us.

 

3. 结论

  • 针对windows的客户机,需要开启hypervclock,以减少vmexit;

 

文章来自个人专栏
李浩
10 文章 | 1 订阅
0条评论
0 / 1000
请输入你的评论
0
0