Comparison of NVIDIA's Export-Restricted GPUs and Standard Models - Tianyi Cloud (天翼云) Developer Community


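On October 7, 2022, the United States issued new rules restricting semiconductor exports to China, including restrictions on exporting high-performance computing chips to mainland China. The rules use the performance of NVIDIA's A100 as the benchmark. A chip is a controlled high-performance computing chip if it meets both of the following conditions: (1) its I/O bandwidth (transfer rate) is greater than or equal to 600 GB/s; (2) the bit length of each operation of its "digital processing units and primitive computational units", multiplied by the computing power in TOPS and summed, is greater than or equal to 4800. As a result, NVIDIA's A100/H100 series and AMD's MI200/300 series AI chips cannot be exported to China.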
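The two-part test above can be sketched in a few lines of Python. This is a minimal illustration, not the legal definition in the regulation: the function names are hypothetical, and taking the largest "bit length x TOPS" product across precisions is a simplifying assumption. The A100 figures used (INT8 at 624 TOPS, FP16 at 312 TFLOPS, 600GB/s NVLink) are the ones quoted in this article.

```python
# Thresholds as described in the article (October 2022 rule).
IO_THRESHOLD_GB_S = 600   # I/O transfer rate, in GB/s
SCORE_THRESHOLD = 4800    # bit length per operation x TOPS


def performance_score(tops_by_bit_length):
    """Return the largest (bit length x TOPS) product over the given precisions.

    Assumption: using the maximum over precisions is a simplification made
    for illustration; the regulation defines the aggregation precisely.
    """
    return max(bits * tops for bits, tops in tops_by_bit_length.items())


def is_controlled(io_gb_s, tops_by_bit_length):
    """A chip is controlled only if BOTH conditions hold."""
    return (io_gb_s >= IO_THRESHOLD_GB_S
            and performance_score(tops_by_bit_length) >= SCORE_THRESHOLD)


# A100 peak throughput from the article's table, keyed by bit length.
a100 = {8: 624, 16: 312}

print(performance_score(a100))   # 4992
print(is_controlled(600, a100))  # True
print(is_controlled(400, a100))  # False
```

Because both conditions must hold, lowering either the interconnect rate or the compute score below its threshold is enough to take a chip out of scope.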

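To serve Chinese customers while complying with the U.S. rules, NVIDIA announced on November 8, 2022 that it would launch the A800, a substitute for the A100 that complies with the new rules. According to the officially published specifications, the main change in the A800 is that the NVLink transfer rate is reduced from the A100's 600GB/s to 400GB/s; the other specifications are essentially the same as the A100's.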

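Specification / Model	A100 40GB PCIe	A100 80GB PCIe	A100 40GB SXM	A100 80GB SXM	A800 40GB PCIe	A800 80GB PCIe	A800 80GB SXM
FP64	9.7 TFLOPS	9.7 TFLOPS	9.7 TFLOPS	9.7 TFLOPS	9.7 TFLOPS	9.7 TFLOPS	9.7 TFLOPS
FP64 Tensor Core	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS
FP32	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS	19.5 TFLOPS
TF32 Tensor Core	156 TFLOPS (312 with sparsity)	156 TFLOPS (312 with sparsity)	156 TFLOPS (312 with sparsity)	156 TFLOPS (312 with sparsity)	156 TFLOPS (312 with sparsity)	156 TFLOPS (312 with sparsity)	156 TFLOPS (312 with sparsity)
BFLOAT16 Tensor Core	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)
FP16 Tensor Core	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)	312 TFLOPS (624 with sparsity)
INT8 Tensor Core	624 TOPS (1248 with sparsity)	624 TOPS (1248 with sparsity)	624 TOPS (1248 with sparsity)	624 TOPS (1248 with sparsity)	624 TOPS (1248 with sparsity)	624 TOPS (1248 with sparsity)	624 TOPS (1248 with sparsity)
GPU memory	40GB HBM2	80GB HBM2e	40GB HBM2	80GB HBM2e	40GB HBM2	80GB HBM2e	80GB HBM2e
GPU memory bandwidth	1555GB/s	1935GB/s	1555GB/s	2039GB/s	1555GB/s	1935GB/s	2039GB/s
Max TDP	250W	300W	400W	400W	250W	300W	400W
Multi-instance GPUs	Up to 7 MIGs @ 5GB	Up to 7 MIGs @ 10GB	Up to 7 MIGs @ 5GB	Up to 7 MIGs @ 10GB	Up to 7 MIGs @ 5GB	Up to 7 MIGs @ 10GB	Up to 7 MIGs @ 10GB
Form factor	PCIe (dual-slot air-cooled or single-slot liquid-cooled)	PCIe (dual-slot air-cooled or single-slot liquid-cooled)	SXM	SXM	PCIe (dual-slot air-cooled or single-slot liquid-cooled)	PCIe (dual-slot air-cooled or single-slot liquid-cooled)	SXM
Interconnect	NVIDIA NVLink Bridge for 2 GPUs: 600GB/s; PCIe 4.0: 64GB/s	NVIDIA NVLink Bridge for 2 GPUs: 600GB/s; PCIe 4.0: 64GB/s	NVLink: 600GB/s; PCIe 4.0: 64GB/s	NVLink: 600GB/s; PCIe 4.0: 64GB/s	NVIDIA NVLink Bridge for 2 GPUs: 400GB/s; PCIe 4.0: 64GB/s	NVIDIA NVLink Bridge for 2 GPUs: 400GB/s; PCIe 4.0: 64GB/s	NVLink: 400GB/s; PCIe 4.0: 64GB/s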
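To get a rough feel for what the interconnect difference in the last row means, the sketch below compares idealized GPU-to-GPU transfer times at the two NVLink rates. It assumes transfers run at peak bandwidth with no latency or protocol overhead, and the 10GB payload is a hypothetical value chosen only for illustration.

```python
def transfer_time_ms(payload_gb, bandwidth_gb_s):
    """Idealized transfer time in milliseconds at peak bandwidth.

    Ignores latency, protocol overhead, and topology effects.
    """
    return payload_gb / bandwidth_gb_s * 1000.0


payload_gb = 10.0  # hypothetical payload size, not from the article

a100_ms = transfer_time_ms(payload_gb, 600)  # A100 NVLink: 600GB/s
a800_ms = transfer_time_ms(payload_gb, 400)  # A800 NVLink: 400GB/s

print(round(a100_ms, 2))            # 16.67
print(round(a800_ms, 2))            # 25.0
print(round(a800_ms / a100_ms, 2))  # 1.5
```

Under these assumptions the same transfer takes 1.5 times as long on the A800; real workloads that overlap communication with computation would see a smaller end-to-end effect.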

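In March 2022, NVIDIA announced its new-generation H100 GPU, built on a 4nm process with 80 billion transistors, CoWoS 2.5D packaging, and a die area of 814 square millimeters. The full chip has 18,432 CUDA cores, 576 Tensor Cores, and 60MB of L2 cache, and can be paired with six HBM3/HBM2e stacks over a 6144-bit interface, with a total capacity of 80GB. It supports PCIe 5.0 and fourth-generation NVLink. It comes in two form factors. The SXM version has 16,896 CUDA cores and 528 Tensor Cores, 3.35TB/s of memory bandwidth, 900GB/s of NVLink bandwidth, 128GB/s of PCIe 5.0 bandwidth, and a TDP of up to 700W. The PCIe version has 14,592 CUDA cores and 456 Tensor Cores, 2TB/s of memory bandwidth, 600GB/s of NVLink bandwidth, 128GB/s of PCIe 5.0 bandwidth, and a TDP of 300-350W.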
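A quick consistency check on the core counts can be written as follows. It is only a sketch: it assumes the ratio of CUDA cores to Tensor Cores is the same on every H100 variant as on the full chip, and the helper name is hypothetical. The inputs are the full-chip figures and the per-variant Tensor Core counts quoted in this article.

```python
# Full-chip figures quoted in the article.
FULL_CUDA_CORES = 18432
FULL_TENSOR_CORES = 576

# CUDA cores per Tensor Core on the full chip (assumed constant across variants).
CUDA_PER_TENSOR = FULL_CUDA_CORES // FULL_TENSOR_CORES


def implied_cuda_cores(tensor_cores):
    """CUDA core count implied by a Tensor Core count at the full-chip ratio."""
    return tensor_cores * CUDA_PER_TENSOR


print(CUDA_PER_TENSOR)          # 32
print(implied_cuda_cores(528))  # 16896 (H100 SXM)
print(implied_cuda_cores(456))  # 14592 (H100 PCIe)
```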

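It is not yet clear which form factor the China-specific H800 uses. It is most likely PCIe, in which case NVLink interconnect bandwidth would be only 300GB/s, half of the H100 PCIe's 600GB/s, while PCIe 5.0 bandwidth should remain unchanged.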

Specification / Model	H100 SXM	H100 PCIe	H100 NVL
FP64	34 teraFLOPS	26 teraFLOPS	68 teraFLOPS
FP64 Tensor Core	67 teraFLOPS	51 teraFLOPS	134 teraFLOPS
FP32	67 teraFLOPS	51 teraFLOPS	134 teraFLOPS
TF32 Tensor Core*	989 teraFLOPS	756 teraFLOPS	1,979 teraFLOPS
BFLOAT16 Tensor Core*	1,979 teraFLOPS	1,513 teraFLOPS	3,958 teraFLOPS
FP16 Tensor Core*	1,979 teraFLOPS	1,513 teraFLOPS	3,958 teraFLOPS
FP8 Tensor Core*	3,958 teraFLOPS	3,026 teraFLOPS	7,916 teraFLOPS
INT8 Tensor Core*	3,958 TOPS	3,026 TOPS	7,916 TOPS
GPU memory	80GB	80GB	188GB
GPU memory bandwidth	3.35TB/s	2TB/s	7.8TB/s
Decoders	7 NVDEC, 7 JPEG	7 NVDEC, 7 JPEG	14 NVDEC, 14 JPEG
Max TDP	Up to 700W (configurable)	300-350W (configurable)	2 x 350-400W (configurable)
Multi-instance GPUs	Up to 7 MIGs @ 10GB each	Up to 7 MIGs @ 10GB each	Up to 14 MIGs @ 12GB each
Form factor	SXM	PCIe, dual-slot, air-cooled	2x PCIe, dual-slot, air-cooled
Interconnect	NVLink: 900GB/s; PCIe Gen5: 128GB/s	NVLink: 600GB/s; PCIe Gen5: 128GB/s	NVLink: 600GB/s; PCIe Gen5: 128GB/s
* With sparsity.

