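Physical topology of CPU and memory
Taking the AMD EPYC 7542 32-Core Processor (128 hardware threads across two sockets) as an example:
There are two sockets, each with 4 NUMA nodes. Each NUMA node has 2 CCXs, each CCX has 4 physical cores, and each physical core has 2 logical cores (2 × 4 × 2 × 4 × 2 = 128 logical cores).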
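Terms:
CCX: a group of cores that share one L3 cache, for example 0-3,64-67, 8 logical cores in total
CCD: a group of cores that share local memory, in other words the cores of one NUMA node (two CCXs), for example 0-7,64-71, 16 logical cores in total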
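The counts and example cpusets above can be reproduced with a small model. This is only a sketch: it assumes the host numbers its CPUs linearly, so that physical core i exposes logical CPUs i and i + 64, which matches the examples in the text but should be checked against the real host. The function names are illustrative.

```python
# Layout taken from the text: 2 sockets x 4 NUMA nodes x 2 CCXs x 4 cores x 2 threads.
SOCKETS, NODES_PER_SOCKET, CCX_PER_NODE, CORES_PER_CCX, THREADS_PER_CORE = 2, 4, 2, 4, 2

PHYSICAL_CORES = SOCKETS * NODES_PER_SOCKET * CCX_PER_NODE * CORES_PER_CCX  # 64


def cpuset(first_core: int, core_count: int) -> str:
    """Return the logical CPUs of a run of physical cores as a cpuset string.

    Assumption: core i exposes logical CPUs i and i + PHYSICAL_CORES.
    """
    last_core = first_core + core_count - 1
    return (f"{first_core}-{last_core},"
            f"{first_core + PHYSICAL_CORES}-{last_core + PHYSICAL_CORES}")


def ccx_cpuset(ccx: int) -> str:
    """Logical CPUs that share the L3 cache of the given CCX."""
    return cpuset(ccx * CORES_PER_CCX, CORES_PER_CCX)


def node_cpuset(node: int) -> str:
    """Logical CPUs that belong to the given NUMA node (two CCXs)."""
    cores_per_node = CCX_PER_NODE * CORES_PER_CCX
    return cpuset(node * cores_per_node, cores_per_node)


print(PHYSICAL_CORES * THREADS_PER_CORE)  # 128
print(ccx_cpuset(0))                      # 0-3,64-67
print(node_cpuset(0))                     # 0-7,64-71
```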
As shown in the figure.
The distribution of cores within each CCX can also be inspected with the lstopo-no-graphics --no-io command:
# yum install -y hwloc
# lstopo-no-graphics --no-io
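If hwloc is not installed, the same L3 grouping can usually be read from Linux sysfs. The sketch below assumes the kernel exposes the standard cache directories under /sys/devices/system/cpu/cpuN/cache/indexM with `level` and `shared_cpu_list` files; the function name `l3_groups` is illustrative.

```python
from collections import defaultdict
from pathlib import Path


def l3_groups(cpu_root: str = "/sys/devices/system/cpu") -> dict[str, list[int]]:
    """Map each L3 shared_cpu_list string to the logical CPUs that report it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for cpu_dir in Path(cpu_root).glob("cpu[0-9]*"):
        cpu = int(cpu_dir.name[3:])
        for index_dir in cpu_dir.glob("cache/index*"):
            level = index_dir / "level"
            # Only level-3 caches define a CCX in the sense used above.
            if level.is_file() and level.read_text().strip() == "3":
                shared = (index_dir / "shared_cpu_list").read_text().strip()
                groups[shared].append(cpu)
    return {key: sorted(cpus) for key, cpus in groups.items()}


if __name__ == "__main__":
    for shared_list, cpus in sorted(l3_groups().items(), key=lambda kv: kv[1][0]):
        print(f"L3 shared by {shared_list}: {len(cpus)} logical CPUs")
```

On the host described here, each group would be expected to contain 8 logical CPUs, such as 0-3,64-67.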
NUMA node distances:
# numactl -H
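To compare intra-socket and cross-socket distances programmatically, the distance table printed by `numactl -H` can be parsed. This is a sketch that assumes the usual "node distances:" layout. The sample text is made up for illustration and is not a measurement from the EPYC host.

```python
def parse_node_distances(numactl_output: str) -> dict[int, dict[int, int]]:
    """Parse the 'node distances:' table printed by `numactl -H`."""
    lines = numactl_output.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith("node distances"))
    # Header row looks like "node   0   1"; drop the leading word "node".
    columns = [int(tok) for tok in lines[start + 1].split()[1:]]
    table: dict[int, dict[int, int]] = {}
    for line in lines[start + 2:]:
        row_label, _, values = line.partition(":")
        if not row_label.strip().isdigit():
            break  # end of the table
        table[int(row_label)] = dict(zip(columns, map(int, values.split())))
    return table


# Illustrative sample only: these distances are invented, not measured.
SAMPLE = """\
node distances:
node   0   1
  0:  10  20
  1:  20  10
"""

distances = parse_node_distances(SAMPLE)
print(distances[0][1])  # 20
```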
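Binding strategy
Principle: with stable performance as a precondition, keep allocation as flexible as possible and reduce memory fragmentation.
Memory:
Access distances among NUMA nodes 0-3 within the same socket differ only slightly, whereas cross-socket access degrades performance severely. Memory can therefore be allocated freely across nodes 0-3 in interleave mode, which reduces memory fragmentation.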
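In libvirt domain XML, interleaving across a set of host nodes is normally expressed with the `<memory>` element inside `<numatune>`. The helper below is a minimal sketch that renders such a fragment. The function name is hypothetical and the nodeset 0-3 is the one named in the text.

```python
def numatune_interleave(nodeset: str) -> str:
    """Render a <numatune> fragment that interleaves guest memory over host nodes."""
    return "\n".join([
        "<numatune>",
        f"  <memory mode='interleave' nodeset='{nodeset}'/>",
        "</numatune>",
    ])


print(numatune_interleave("0-3"))
```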
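vCPU:
All cores in a CCX share the L3 cache, so multi-threaded workloads whose threads cooperate (nginx, for example) perform better when kept inside one CCX. A logical core is slower than a full physical core, but allocating at logical-core granularity is fairer to all cloud instances on the same host (steadier performance).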
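Binding policy by instance size
Instances with 8 vCPUs or fewer:
Pin the vCPUs to one CCX and let them be scheduled freely within that CCX, for example: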
<cputune>
<vcpupin vcpu='0' cpuset='0-3,64-67'/>
<vcpupin vcpu='1' cpuset='0-3,64-67'/>
<vcpupin vcpu='2' cpuset='0-3,64-67'/>
<vcpupin vcpu='3' cpuset='0-3,64-67'/>
<vcpupin vcpu='4' cpuset='0-3,64-67'/>
<vcpupin vcpu='5' cpuset='0-3,64-67'/>
<vcpupin vcpu='6' cpuset='0-3,64-67'/>
<vcpupin vcpu='7' cpuset='0-3,64-67'/>
</cputune>
<numatune>
<memnode cellid='0' mode='interleave' nodeset='0-3'/>
</numatune>
<cpu mode='custom' match='exact' check='full'>
.....................................................................
<numa>
<cell id='0' cpus='0-7' memory='xxxxxx' unit='KiB'/>
</numa>
</cpu>
<topology sockets='1' cores='8' threads='1'/>; memory bound to nodes 0-3 in interleave mode; one guest NUMA node
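The floating layout for small instances can be generated rather than typed by hand. The sketch below gives every vCPU the same host cpuset and refuses to place more vCPUs than the cpuset has logical CPUs. Both function names are hypothetical.

```python
def expand_cpuset(cpuset: str) -> list[int]:
    """Expand a cpuset string such as '0-3,64-67' into a list of CPU numbers."""
    cpus: list[int] = []
    for part in cpuset.split(","):
        low, _, high = part.partition("-")
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def floating_cputune(vcpus: int, cpuset: str) -> str:
    """Pin every vCPU to the same host cpuset so it can float within one CCX."""
    if vcpus > len(expand_cpuset(cpuset)):
        raise ValueError("more vCPUs than logical CPUs in the cpuset")
    pins = [f"  <vcpupin vcpu='{v}' cpuset='{cpuset}'/>" for v in range(vcpus)]
    return "\n".join(["<cputune>", *pins, "</cputune>"])


print(floating_cputune(4, "0-3,64-67"))
```

A 4 vCPU instance gets four `<vcpupin>` lines that all reference 0-3,64-67, leaving the host scheduler free to move each vCPU inside the CCX.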
16 vCPUs:
Pin the vCPUs to 2 CCXs, one-to-one with host CPUs, for example:
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='64'/>
<vcpupin vcpu='2' cpuset='1'/>
<vcpupin vcpu='3' cpuset='65'/>
<vcpupin vcpu='4' cpuset='2'/>
<vcpupin vcpu='5' cpuset='66'/>
<vcpupin vcpu='6' cpuset='3'/>
<vcpupin vcpu='7' cpuset='67'/>
<vcpupin vcpu='8' cpuset='4'/>
<vcpupin vcpu='9' cpuset='68'/>
<vcpupin vcpu='10' cpuset='5'/>
<vcpupin vcpu='11' cpuset='69'/>
<vcpupin vcpu='12' cpuset='6'/>
<vcpupin vcpu='13' cpuset='70'/>
<vcpupin vcpu='14' cpuset='7'/>
<vcpupin vcpu='15' cpuset='71'/>
<topology sockets='1' cores='8' threads='2'/>; memory bound to nodes 0-3 in interleave mode; one guest NUMA node
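32 vCPUs:
Pin the vCPUs across 4 CCXs, one-to-one with host CPUs. For example (only the first 16 pins are listed; the remaining 16 vCPUs are pinned the same way on two more CCXs):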
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='64'/>
<vcpupin vcpu='2' cpuset='1'/>
<vcpupin vcpu='3' cpuset='65'/>
<vcpupin vcpu='4' cpuset='2'/>
<vcpupin vcpu='5' cpuset='66'/>
<vcpupin vcpu='6' cpuset='3'/>
<vcpupin vcpu='7' cpuset='67'/>
<vcpupin vcpu='8' cpuset='4'/>
<vcpupin vcpu='9' cpuset='68'/>
<vcpupin vcpu='10' cpuset='5'/>
<vcpupin vcpu='11' cpuset='69'/>
<vcpupin vcpu='12' cpuset='6'/>
<vcpupin vcpu='13' cpuset='70'/>
<vcpupin vcpu='14' cpuset='7'/>
<vcpupin vcpu='15' cpuset='71'/>
<topology sockets='1' cores='32' threads='1'/>; memory bound to nodes 0-3 in interleave mode; one guest NUMA node
64 vCPUs: follow the same pattern as the 16 and 32 vCPU cases above.
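The one-to-one pattern used for 16, 32 and 64 vCPUs (even vCPUs on a physical core's first thread, odd vCPUs on its sibling) can be sketched as follows. It assumes that core i has sibling threads i and i + 64 and that consecutive cores fill consecutive CCXs, as in the examples above. `sibling_pins` and `render_pins` are illustrative names.

```python
PHYSICAL_CORES = 64  # assumption: core i has sibling threads i and i + 64


def sibling_pins(vcpus: int, first_core: int = 0) -> list[tuple[int, int]]:
    """Return (vcpu, host_cpu) pairs; even vCPUs get the core, odd ones its sibling."""
    if vcpus % 2:
        raise ValueError("vcpus must be even to fill whole physical cores")
    pins = []
    for vcpu in range(vcpus):
        core = first_core + vcpu // 2
        pins.append((vcpu, core + (vcpu % 2) * PHYSICAL_CORES))
    return pins


def render_pins(vcpus: int, first_core: int = 0) -> str:
    """Render the <vcpupin> lines for a one-to-one pinned instance."""
    return "\n".join(f"<vcpupin vcpu='{v}' cpuset='{c}'/>"
                     for v, c in sibling_pins(vcpus, first_core))


print(render_pins(16))
```

With 16 vCPUs this reproduces the listing shown for the 16 vCPU case. With 32 it continues on host CPUs 8-15 and 72-79 under the numbering assumption above.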
96 vCPUs: split into two groups of 48 vCPUs and pin each group with the logic above; split the memory binding into 2 memory nodes as well; 2 guest NUMA nodes.
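One way to express the split is a pair of guest NUMA cells, each with its own `<memnode>` entry. The sketch below rests on several assumptions that the text does not state: host nodes 0-3 belong to the first socket and 4-7 to the second, each half uses interleave mode, and the memory size passed in is a placeholder. The function name is hypothetical.

```python
def split_guest_numa(vcpus: int, memory_kib: int, host_nodesets: list[str]) -> str:
    """Render <numatune> and <numa> fragments with one guest cell per host nodeset."""
    cells = len(host_nodesets)
    if vcpus % cells or memory_kib % cells:
        raise ValueError("vCPUs and memory must divide evenly across the cells")
    per_cell = vcpus // cells
    memnodes, numa_cells = [], []
    for cell, nodeset in enumerate(host_nodesets):
        first = cell * per_cell
        memnodes.append(
            f"  <memnode cellid='{cell}' mode='interleave' nodeset='{nodeset}'/>")
        numa_cells.append(
            f"  <cell id='{cell}' cpus='{first}-{first + per_cell - 1}' "
            f"memory='{memory_kib // cells}' unit='KiB'/>")
    return "\n".join(["<numatune>", *memnodes, "</numatune>",
                      "<numa>", *numa_cells, "</numa>"])


# Placeholder memory size (2048 KiB) and assumed host nodesets per socket.
print(split_guest_numa(96, 2048, ["0-3", "4-7"]))
```

Each 48 vCPU half would then be pinned one-to-one on its own socket, following the pattern used for the smaller sizes.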