IPI Virtualization, Part 3: How to Test

2023-06-02 08:45:15

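Testing IPI virtualization involves two parts:

  • an IPI benchmark program running inside the guest (it sends several kinds of IPIs)
  • monitoring the guest's VM exits from the host

(1)         IPI benchmark

The guest IPI benchmark program is described at the following link: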

 https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@caviumnetworks.com/

From: Yury Norov <ynorov@caviumnetworks.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Cc: Yury Norov <ynorov@caviumnetworks.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ashish Kalra <Ashish.Kalra@cavium.com>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Linu Cherian <Linu.Cherian@cavium.com>,
	Shih-Wei Li <shihwei@cs.columbia.edu>,
	Sunil Goutham <Sunil.Goutham@cavium.com>
Subject: [PATCH v2] IPI performance benchmark
Date: Tue, 19 Dec 2017 11:50:10 +0300
Message-ID: <20171219085010.4081-1-ynorov@caviumnetworks.com>

This benchmark sends many IPIs in different modes and measures
time for IPI delivery (first column), and total time, ie including
time to acknowledge the receive by sender (second column).

The scenarios are:
Dry-run:	do everything except actually sending IPI. Useful
		to estimate system overhead.
Self-IPI:	Send IPI to self CPU.
Normal IPI:	Send IPI to some other CPU.
Broadcast IPI:	Send broadcast IPI to all online CPUs.
Broadcast lock:	Send broadcast IPI to all online CPUs and force them
                acquire/release spinlock.

The raw output looks like this:
[  155.363374] Dry-run:                         0,            2999696 ns
[  155.429162] Self-IPI:                 30385328,           65589392 ns
[  156.060821] Normal IPI:              566914128,          631453008 ns
[  158.384427] Broadcast IPI:                   0,         2323368720 ns
[  160.831850] Broadcast lock:                  0,         2447000544 ns

For virtualized guests, sending and receiving IPIs causes guest exit.
I used this test to measure performance impact on KVM subsystem of
Christoffer Dall's series "Optimize KVM/ARM for VHE systems" [1].

Test machine is ThunderX2, 112 online CPUs. Below the results normalized
to host dry-run time, broadcast lock results omitted. Smaller - better.

Host, v4.14:
Dry-run:	  0	    1
Self-IPI:         9	   18
Normal IPI:      81	  110
Broadcast IPI:    0	 2106

Guest, v4.14:
Dry-run:          0	    1
Self-IPI:        10	   18
Normal IPI:     305	  525
Broadcast IPI:    0    	 9729

Guest, v4.14 + [1]:
Dry-run:          0	    1
Self-IPI:         9	   18
Normal IPI:     176	  343
Broadcast IPI:    0	 9885

[1] https://www.spinics.net/lists/kvm/msg156755.html

v2:
  added broadcast lock test;
  added example raw output in patch description;

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ashish Kalra <Ashish.Kalra@cavium.com>
CC: Christoffer Dall <christoffer.dall@linaro.org>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Linu Cherian <Linu.Cherian@cavium.com>
CC: Shih-Wei Li <shihwei@cs.columbia.edu>
CC: Sunil Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/Kconfig           |  10 ++++
 kernel/Makefile        |   1 +
 kernel/ipi_benchmark.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)
 create mode 100644 kernel/ipi_benchmark.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..1b216eb15642 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -82,6 +82,16 @@ config JUMP_LABEL
 	 ( On 32-bit x86, the necessary options added to the compiler
 	   flags may increase the size of the kernel slightly. )
 
+config IPI_BENCHMARK
+	tristate "Test IPI performance on SMP systems"
+	depends on SMP
+	help
+	  Test IPI performance on SMP systems. If system has only one online
+	  CPU, sending IPI to other CPU is obviously not possible, and ENOENT
+	  is returned for corresponding test.
+
+	  If unsure, say N.
+
 config STATIC_KEYS_SELFTEST
 	bool "Static key selftest"
 	depends on JUMP_LABEL
diff --git a/kernel/Makefile b/kernel/Makefile
index 172d151d429c..04e550e1990c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -101,6 +101,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_IPI_BENCHMARK) += ipi_benchmark.o
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/ipi_benchmark.c b/kernel/ipi_benchmark.c
new file mode 100644
index 000000000000..1dfa15e5ef70
--- /dev/null
+++ b/kernel/ipi_benchmark.c
@@ -0,0 +1,153 @@
+/*
+ * Performance test for IPI on SMP machines.
+ *
+ * Copyright (c) 2017 Cavium Networks.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/ktime.h>
+
+#define NTIMES 100000
+
+#define POKE_ANY	0
+#define DRY_RUN		1
+#define POKE_SELF	2
+#define POKE_ALL	3
+#define POKE_ALL_LOCK	4
+
+static void __init handle_ipi_spinlock(void *t)
+{
+	spinlock_t *lock = (spinlock_t *) t;
+
+	spin_lock(lock);
+	spin_unlock(lock);
+}
+
+static void __init handle_ipi(void *t)
+{
+	ktime_t *time = (ktime_t *) t;
+
+	if (time)
+		*time = ktime_get() - *time;
+}
+
+static ktime_t __init send_ipi(int flags)
+{
+	ktime_t time = 0;
+	DEFINE_SPINLOCK(lock);
+	unsigned int cpu = get_cpu();
+
+	switch (flags) {
+	case DRY_RUN:
+		/* Do everything except actually sending IPI. */
+		break;
+	case POKE_ALL:
+		/* If broadcasting, don't force all CPUs to update time. */
+		smp_call_function_many(cpu_online_mask, handle_ipi, NULL, 1);
+		break;
+	case POKE_ALL_LOCK:
+		smp_call_function_many(cpu_online_mask,
+				handle_ipi_spinlock, &lock, 1);
+		break;
+	case POKE_ANY:
+		cpu = cpumask_any_but(cpu_online_mask, cpu);
+		if (cpu >= nr_cpu_ids) {
+			time = -ENOENT;
+			break;
+		}
+		/* Fall thru */
+	case POKE_SELF:
+		time = ktime_get();
+		smp_call_function_single(cpu, handle_ipi, &time, 1);
+		break;
+	default:
+		time = -EINVAL;
+	}
+
+	put_cpu();
+	return time;
+}
+
+static int __init __bench_ipi(unsigned long i, ktime_t *time, int flags)
+{
+	ktime_t t;
+
+	*time = 0;
+	while (i--) {
+		t = send_ipi(flags);
+		if ((int) t < 0)
+			return (int) t;
+
+		*time += t;
+	}
+
+	return 0;
+}
+
+static int __init bench_ipi(unsigned long times, int flags,
+				ktime_t *ipi, ktime_t *total)
+{
+	int ret;
+
+	*total = ktime_get();
+	ret = __bench_ipi(times, ipi, flags);
+	if (unlikely(ret))
+		return ret;
+
+	*total = ktime_get() - *total;
+
+	return 0;
+}
+
+static int __init init_bench_ipi(void)
+{
+	ktime_t ipi, total;
+	int ret;
+
+	ret = bench_ipi(NTIMES, DRY_RUN, &ipi, &total);
+	if (ret)
+		pr_err("Dry-run FAILED: %d\n", ret);
+	else
+		pr_err("Dry-run:        %18llu, %18llu ns\n", ipi, total);
+
+	ret = bench_ipi(NTIMES, POKE_SELF, &ipi, &total);
+	if (ret)
+		pr_err("Self-IPI FAILED: %d\n", ret);
+	else
+		pr_err("Self-IPI:       %18llu, %18llu ns\n", ipi, total);
+
+	ret = bench_ipi(NTIMES, POKE_ANY, &ipi, &total);
+	if (ret)
+		pr_err("Normal IPI FAILED: %d\n", ret);
+	else
+		pr_err("Normal IPI:     %18llu, %18llu ns\n", ipi, total);
+
+	ret = bench_ipi(NTIMES, POKE_ALL, &ipi, &total);
+	if (ret)
+		pr_err("Broadcast IPI FAILED: %d\n", ret);
+	else
+		pr_err("Broadcast IPI:  %18llu, %18llu ns\n", ipi, total);
+
+	ret = bench_ipi(NTIMES, POKE_ALL_LOCK, &ipi, &total);
+	if (ret)
+		pr_err("Broadcast lock FAILED: %d\n", ret);
+	else
+		pr_err("Broadcast lock: %18llu, %18llu ns\n", ipi, total);
+
+	/* Return error to avoid annoying rmmod. */
+	return -EINVAL;
+}
+module_init(init_bench_ipi);
+
+MODULE_LICENSE("GPL");
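
The commit message above reports results "normalized to host dry-run time". The following is a minimal sketch of that normalization, assuming it means dividing both columns by the dry-run total time (which is consistent with the dry-run row reading 0 and 1). The input is the sample raw output quoted in the commit message; that sample is not necessarily the run behind the Host/Guest tables, so the ratios are not expected to match them. The names `raw_ns` and `normalize` are illustrative only.

```python
# Raw sample output from the commit message: (IPI delivery ns, total ns).
raw_ns = {
    "Dry-run": (0, 2999696),
    "Self-IPI": (30385328, 65589392),
    "Normal IPI": (566914128, 631453008),
    "Broadcast IPI": (0, 2323368720),
    "Broadcast lock": (0, 2447000544),
}


def normalize(results, baseline="Dry-run"):
    """Divide both columns by the baseline scenario's total time.

    Assumption: this is what "normalized to host dry-run time" means.
    """
    base_total = results[baseline][1]
    return {
        name: (ipi / base_total, total / base_total)
        for name, (ipi, total) in results.items()
    }


normalized = normalize(raw_ns)
for name, (ipi, total) in normalized.items():
    print(f"{name + ':':<16}{ipi:>8.0f}{total:>8.0f}")
```

Normalizing to the dry-run time removes the fixed loop overhead as a unit, so host and guest runs can be compared as multiples of the same baseline.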

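(2)         Test steps

Steps:

  • In the guest, load ipi_benchmark with the loader pinned to vCPU 3:

              taskset -c 3 insmod kernel/ipi_benchmark.ko

  • On the host, record VM-exit statistics:

              perf kvm stat record

  • Press Ctrl+C to stop the recording started in step 2, then view the analysis for vCPU 3:

           perf kvm stat report --vcpu 3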

(3)         Test data

host cmdline:

[root@inf-bqj708-compute10-10e4e8e17 secure]# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.10.60+ root=UUID=d0dcffb9-ec93-4981-86e8-4688b66345d9 ro crashkernel=512M iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=446 console=ttyS0,115200 isolcpus=1-16,81-96

guest xml:

  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <vcpu placement='static'>16</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='81'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='82'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='83'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='84'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='85'/>
    <vcpupin vcpu='10' cpuset='6'/>
    <vcpupin vcpu='11' cpuset='86'/>
    <vcpupin vcpu='12' cpuset='7'/>
    <vcpupin vcpu='13' cpuset='87'/>
    <vcpupin vcpu='14' cpuset='8'/>
    <vcpupin vcpu='15' cpuset='88'/>
    <emulatorpin cpuset='1-8,81-88'/>
  </cputune>

  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='8' threads='2'/>
  </cpu>

IPIv enabled:

[  138.404439] Normal IPI:             1893598289,        12222643710 ns
[root@inf-bqj708-compute10-10e4e8e17 kernel-5.10.0-126.0.0]# perf kvm stat report --vcpu 3
Analyze events for all VMs, VCPU 3:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
  EXTERNAL_INTERRUPT      12771    97.47%     0.76%      0.00us    492.60us      1.93us ( +-   3.12% )
      IO_INSTRUCTION        226     1.72%     0.02%      0.00us     16.51us      2.47us ( +-   3.02% )
               CPUID         32     0.24%     0.00%      0.00us      0.82us      0.38us ( +-   5.62% )
   PAUSE_INSTRUCTION         30     0.23%     0.00%      0.00us      1.55us      0.51us ( +-   9.25% )
                 HLT         29     0.22%    99.22%      0.00us 1999988.36us 111525.71us ( +-  67.64% )
          APIC_WRITE          6     0.05%     0.00%      0.00us     23.43us     16.39us ( +-  16.52% )
       EPT_VIOLATION          4     0.03%     0.00%      0.00us      9.70us      5.47us ( +-  31.36% )
            MSR_READ          3     0.02%     0.00%      0.00us      0.78us      0.67us ( +-  15.59% )
       EXCEPTION_NMI          2     0.02%     0.00%      0.00us      5.97us      5.29us ( +-  12.74% )
Total Samples:13103, Total events handled time:3259662.60us.

IPIv disabled:

[12784.705838] Normal IPI:             2039820089,        12542302277 ns
[root@inf-bqj708-compute10-10e4e8e17 kernel-5.10.0-126.0.0]# perf kvm stat report --vcpu 3
Analyze events for all VMs, VCPU 3:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
           MSR_WRITE    1000408    98.79%     4.34%      0.00us     21.04us      0.50us ( +-   0.02% )
  EXTERNAL_INTERRUPT      11847     1.17%     0.15%      0.00us      9.11us      1.42us ( +-   0.20% )
      IO_INSTRUCTION        226     0.02%     0.00%      0.00us      6.98us      2.41us ( +-   1.73% )
               CPUID         53     0.01%     0.00%      0.00us      1.27us      0.39us ( +-   5.53% )
                 HLT         49     0.00%    95.51%      0.00us 3999994.80us 225218.88us ( +-  44.63% )
   PAUSE_INSTRUCTION         19     0.00%     0.00%      0.00us      1.14us      0.56us ( +-   9.96% )
            MSR_READ          6     0.00%     0.00%      0.00us      1.01us      0.85us ( +-   7.61% )
       EXCEPTION_NMI          2     0.00%     0.00%      0.00us      6.60us      5.84us ( +-  13.09% )
       EPT_VIOLATION          2     0.00%     0.00%      0.00us      7.32us      7.14us ( +-   2.58% )
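
The two reports can be compared mechanically. The sketch below uses a few rows copied from the tables above and the two "Normal IPI" dmesg lines. It extracts the per-reason exit counts and computes the average delivery time per IPI, taking NTIMES = 100000 from the benchmark source. The helper `exit_samples` and the variable names are illustrative, and the parsing assumes the column layout shown above (exit reason first, sample count second). Attributing the MSR_WRITE exits to x2APIC ICR writes is an interpretation, not something the report itself states.

```python
NTIMES = 100000  # iterations per scenario, from the benchmark source

# "Normal IPI" dmesg lines: (summed IPI delivery ns, total ns).
normal_ipi_ns = {
    "IPIv enabled": (1893598289, 12222643710),
    "IPIv disabled": (2039820089, 12542302277),
}

# A few rows copied from the two perf reports above.
report_enabled = """
  EXTERNAL_INTERRUPT      12771    97.47%     0.76%      0.00us    492.60us      1.93us ( +-   3.12% )
          APIC_WRITE          6     0.05%     0.00%      0.00us     23.43us     16.39us ( +-  16.52% )
"""
report_disabled = """
           MSR_WRITE    1000408    98.79%     4.34%      0.00us     21.04us      0.50us ( +-   0.02% )
  EXTERNAL_INTERRUPT      11847     1.17%     0.15%      0.00us      9.11us      1.42us ( +-   0.20% )
"""


def exit_samples(report):
    """Map exit reason -> sample count for each data row of a report."""
    samples = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows have the exit reason first and an integer count second.
        if len(fields) >= 2 and fields[1].isdigit():
            samples[fields[0]] = int(fields[1])
    return samples


enabled = exit_samples(report_enabled)
disabled = exit_samples(report_disabled)
print("MSR_WRITE exits, IPIv enabled: ", enabled.get("MSR_WRITE", 0))
print("MSR_WRITE exits, IPIv disabled:", disabled.get("MSR_WRITE", 0))

for label, (ipi_ns, _total_ns) in normal_ipi_ns.items():
    print(f"{label}: {ipi_ns / NTIMES:.0f} ns per normal IPI")

on_ns = normal_ipi_ns["IPIv enabled"][0]
off_ns = normal_ipi_ns["IPIv disabled"][0]
saving = (off_ns - on_ns) / off_ns
print(f"Delivery-time reduction with IPIv: {saving:.1%}")
```

In this data the roughly one million MSR_WRITE exits on the sending vCPU are absent from the report once IPIv is enabled, and the summed Normal IPI delivery time drops by about 7%. These are single runs, so the latency difference should be read as indicative only.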