CUDA Tutorial for GPU Secure Containers - 天翼云 Developer Community

## Terminology

### CUDA

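**Overview**

CUDA is NVIDIA's parallel computing platform and programming model. It makes general-purpose computing on GPUs simple and elegant, and it works only on NVIDIA GPU devices. In practice, CUDA is delivered as a toolkit (the CUDA Toolkit), and mainstream deep learning frameworks rely on it for GPU acceleration almost without exception.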

### cuDNN

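**Overview**

NVIDIA cuDNN is a GPU-accelerated library for deep neural networks. It is an SDK that emphasizes performance, ease of use, and low memory overhead, and it provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN can be integrated into higher-level machine learning frameworks such as Caffe, Chainer, Keras, MATLAB, MXNet, PyTorch, and TensorFlow.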

### TensorFlow

**Overview**

TensorFlow is an end-to-end open source platform for machine learning.

## Architecture

![Architecture overview](../image/gpu_architecture.png)

## Installation

### Manual installation

The manual installation uses a CentOS 8 image as the container image.

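***Step 1: Create a GPU secure container instance***

* Confirm that a GPU is attached and that it supports CUDA (all mainstream NVIDIA GPUs do).

* Confirm that the driver is installed (`nvidia-smi` displays the GPU device information).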
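The two checks above can also be scripted. The following is a minimal sketch, assuming `nvidia-smi` is on `PATH` inside the container and supports the `--query-gpu` and `--format` options; the helper names `parse_gpu_rows` and `query_gpus` are made up for illustration.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'name, driver_version' rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the GPU name is preserved.
        parts = [p.strip() for p in line.rsplit(",", 1)]
        if len(parts) == 2 and all(parts):
            rows.append({"name": parts[0], "driver_version": parts[1]})
    return rows


def query_gpus():
    """Return the detected GPUs, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU or driver detected; check the GPU mount and the driver.")
    else:
        for gpu in gpus:
            print(f"{gpu['name']} (driver {gpu['driver_version']})")
```

Returning `None` instead of raising keeps the script usable as a pre-flight check on hosts where the driver may not be present yet.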

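***Step 2: Install CUDA***

* [Get the installer]

* Install the dependencies (`yum -y install gcc kernel-devel make pciutils elfutils-libelf-devel which gcc-c++`).

* Run the installer (`sh cuda_10.2.89_440.33.01_linux.run`). The cuDNN packages in Step 3 are built for CUDA 11.0, so use the runfile for the CUDA version that matches your cuDNN packages.

* Add CUDA to `PATH` (append `export PATH=$PATH:/usr/local/cuda/bin` to `~/.bashrc`).

* Verify the installation (`nvcc -V` displays the CUDA version information).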
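The `nvcc -V` check can be automated so that a script can compare the installed release against the one the cuDNN packages expect. This is a small sketch, assuming the `nvcc` banner contains a `release X.Y` field as current toolkits print it; `parse_nvcc_release` and `installed_cuda_release` are illustrative names.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the 'X.Y' release number from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the CUDA release reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; check that /usr/local/cuda/bin is on PATH.")
    else:
        print("CUDA release:", release)
```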

***Step 3: Install cuDNN***

* [Get the installation packages]

Download the following packages:

`libcudnn8-8.0.5.39-1.cuda11.0.*.rpm`

`libcudnn8-devel-8.0.5.39-1.cuda11.0.*.rpm`

`libcudnn8-samples-8.0.5.39-1.cuda11.0.*.rpm`

* Install the packages (`rpm -ivh libcudnn8-*.rpm`).

* Verify the installation:

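```shell
# Copy the cuDNN samples to the home directory and build the MNIST sample.
cp -r /usr/src/cudnn_samples_/ $HOME
cd $HOME/cudnn_samples_/mnistCUDNN
make clean && make
./mnistCUDNN
# If the installation succeeded, the output ends with "......Test passed!"
```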
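As a lighter-weight check than building the sample, the cuDNN shared library can be loaded directly and asked for its version. The sketch below assumes the library is installed as `libcudnn.so.8` and that the version integer follows the cuDNN 8.x encoding (`major * 1000 + minor * 100 + patch`); both helper names are hypothetical.

```python
import ctypes


def decode_cudnn_version(raw):
    """Split a cuDNN 8.x version integer into (major, minor, patch)."""
    major, rest = divmod(raw, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def loaded_cudnn_version(lib_name="libcudnn.so.8"):
    """Return (major, minor, patch), or None if the library cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # cudnnGetVersion() takes no arguments and returns a size_t.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    version = loaded_cudnn_version()
    if version is None:
        print("libcudnn.so.8 could not be loaded.")
    else:
        print("cuDNN version: %d.%d.%d" % version)
```

This only confirms that the library is present and loadable; the `mnistCUDNN` sample remains the more thorough test because it actually runs kernels on the GPU.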

***Step 4: Install TensorFlow***

* [Installation guide]

We recommend downloading the TensorFlow `.whl` file directly and installing it with `pip install`.

***Step 5: Test with machine learning code***

* Test case 1

```shell
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```

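* Test case 2

```shell
# TensorFlow 2.x has no top-level tf.Session, so this uses the tf.compat.v1 API in graph mode.
python -c "import tensorflow as tf; tf.compat.v1.disable_eager_execution(); hello = tf.constant('hello tensorflow!'); sess = tf.compat.v1.Session(); print(sess.run(hello))"
```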
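The two test cases confirm that TensorFlow runs, but not explicitly that it sees the GPU. A possible follow-up check is sketched below, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available); the function name `tensorflow_gpu_summary` is made up for this example.

```python
def tensorflow_gpu_summary():
    """Summarize TensorFlow's view of CUDA and GPUs; None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_names": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    summary = tensorflow_gpu_summary()
    if summary is None:
        print("TensorFlow is not installed.")
    elif not summary["gpu_names"]:
        print("TensorFlow", summary["version"], "found no GPU.")
    else:
        print("TensorFlow", summary["version"], "sees:", summary["gpu_names"])
```

An empty `gpu_names` list with `built_with_cuda` set to `True` usually points at a missing or mismatched CUDA or cuDNN library rather than at TensorFlow itself.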

### Image-based installation

NVIDIA publishes ready-made images on Docker Hub with CUDA and cuDNN preinstalled.

***Step 1: Get the image***

* Pull the image (`ctr -n k8s.io i pull -k docker.io/nvidia/cuda:11.0.3-cudnn8-devel-centos8`).

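***Step 2: Create a GPU secure container instance***

* Confirm that a GPU is attached and that it supports CUDA (all mainstream NVIDIA GPUs do).

* Confirm that the driver is installed (`nvidia-smi` displays the GPU device information).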

***Step 3: Install TensorFlow***

* [Installation guide]

We recommend downloading the TensorFlow `.whl` file directly and installing it with `pip install`.

***Step 4: Test with machine learning code***

* Test case 1

```shell
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```

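* Test case 2

```shell
# TensorFlow 2.x has no top-level tf.Session, so this uses the tf.compat.v1 API in graph mode.
python -c "import tensorflow as tf; tf.compat.v1.disable_eager_execution(); hello = tf.constant('hello tensorflow!'); sess = tf.compat.v1.Session(); print(sess.run(hello))"
```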

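## Notes

1. During the manual CUDA installation, deselect the NVIDIA driver component. The driver is already installed in the environment, and leaving the component selected causes the installation to fail.

2. The installation guide (CUDA_Installation_Guide_Linux.pdf) is available in the installation directory once the installation completes.

3. Downloading the cuDNN packages requires registering and signing in with an NVIDIA account.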
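For note 1, the driver component can also be skipped without the interactive menu. The sketch below builds such a command, assuming the runfile installer accepts the `--silent` and `--toolkit` options described in NVIDIA's Linux installation guide; `toolkit_only_command` is a hypothetical helper, and the command is printed for review rather than executed.

```python
import shlex


def toolkit_only_command(installer_path):
    """Build a runfile command that installs only the CUDA toolkit, not the driver."""
    # --silent skips the interactive menu; --toolkit selects only the toolkit component.
    return ["sh", installer_path, "--silent", "--toolkit"]


if __name__ == "__main__":
    cmd = toolkit_only_command("cuda_10.2.89_440.33.01_linux.run")
    # Print the command instead of running it so it can be checked first.
    print(shlex.join(cmd))
```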

## GPU machine learning test results

### Test code

```python
import timeit

import tensorflow as tf

# Create the input matrices on the CPU.
with tf.device('/cpu:0'):
    cpu_a = tf.random.normal([10000, 1000])
    cpu_b = tf.random.normal([1000, 2000])
    print(cpu_a.device, cpu_b.device)

# Create matrices of the same shapes on the GPU.
with tf.device('/gpu:0'):
    gpu_a = tf.random.normal([10000, 1000])
    gpu_b = tf.random.normal([1000, 2000])
    print(gpu_a.device, gpu_b.device)


def cpu_run():
    """Multiply the two CPU matrices on the CPU."""
    with tf.device('/cpu:0'):
        c = tf.matmul(cpu_a, cpu_b)
    return c


def gpu_run():
    """Multiply the two GPU matrices on the GPU."""
    with tf.device('/gpu:0'):
        c = tf.matmul(gpu_a, gpu_b)
    return c


# Warm up: the first calls include one-time costs such as loading cuBLAS.
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print('warmup:', cpu_time, gpu_time)

# Measured run: 10 matrix multiplications on each device.
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print('run time:', cpu_time, gpu_time)
```
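One caveat about the benchmark above: in eager mode TensorFlow launches GPU kernels asynchronously, so `tf.matmul` can return before the multiplication has finished. A possible variant that waits for each result is sketched below. The `timed` helper and its `sync` parameter are assumptions made for this example, and calling `.numpy()` also adds the cost of copying the result back to host memory, so the number it reports is an upper bound on kernel time.

```python
import time


def timed(fn, number=10, sync=None):
    """Time `number` calls of fn; `sync` is applied to each result before the clock stops."""
    start = time.perf_counter()
    for _ in range(number):
        result = fn()
        if sync is not None:
            sync(result)
    return time.perf_counter() - start


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        tf = None
    if tf is not None and tf.config.list_physical_devices("GPU"):
        a = tf.random.normal([10000, 1000])
        b = tf.random.normal([1000, 2000])
        # .numpy() copies the result to host memory, which waits for the GPU kernel.
        elapsed = timed(lambda: tf.matmul(a, b), number=10, sync=lambda r: r.numpy())
        print("synchronized GPU time:", elapsed)
```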

### Test results

```

2021-02-04 11:42:27.932828: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

2021-02-04 11:42:32.024173: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set

2021-02-04 11:42:32.032048: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1

2021-02-04 11:42:33.037884: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:33.040166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:

pciBusID: 0000:01:01.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0

coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s

2021-02-04 11:42:33.040652: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

2021-02-04 11:42:33.057129: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11

2021-02-04 11:42:33.057419: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11

2021-02-04 11:42:33.065013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10

2021-02-04 11:42:33.068659: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10

2021-02-04 11:42:33.080241: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10

2021-02-04 11:42:33.084766: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11

2021-02-04 11:42:33.087111: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8

2021-02-04 11:42:33.088240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:33.090174: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:33.091650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0

2021-02-04 11:42:33.093331: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2021-02-04 11:42:33.094124: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set

2021-02-04 11:42:33.094850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:33.096063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:

pciBusID: 0000:01:01.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0

coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s

2021-02-04 11:42:33.096379: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

2021-02-04 11:42:33.096598: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11

2021-02-04 11:42:33.096626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11

2021-02-04 11:42:33.096938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10

2021-02-04 11:42:33.097046: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10

2021-02-04 11:42:33.097130: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10

2021-02-04 11:42:33.097428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11

2021-02-04 11:42:33.097537: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8

2021-02-04 11:42:33.098104: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:33.099745: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:33.100991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0

2021-02-04 11:42:33.101553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

2021-02-04 11:42:34.608321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:

2021-02-04 11:42:34.610224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0

2021-02-04 11:42:34.610387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N

2021-02-04 11:42:34.612121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:34.613865: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:34.615322: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-02-04 11:42:34.616580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14760 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:01:01.0, compute capability: 7.0)

/job:localhost/replica:0/task:0/device:CPU:0 /job:localhost/replica:0/task:0/device:CPU:0

/job:localhost/replica:0/task:0/device:GPU:0 /job:localhost/replica:0/task:0/device:GPU:0

2021-02-04 11:42:35.481701: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.

2021-02-04 11:42:36.350021: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.

2021-02-04 11:42:37.147785: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.

2021-02-04 11:42:37.953994: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.

2021-02-04 11:42:38.779708: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.

2021-02-04 11:42:43.717553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11

2021-02-04 11:42:44.733065: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11

warmup: 8.235363349000181 1.0356680849999975

run time: 8.707680622000225 0.0031152729998211726

```
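To put the final `run time` line in perspective, the short calculation below derives the speedup and the throughput each timing would imply. It assumes the usual `2 * M * K * N` operation count for a dense matrix multiplication; the timings are copied from the log above.

```python
# Timings from the "run time" line of the log (seconds for 10 matmuls).
cpu_seconds = 8.707680622000225
gpu_seconds = 0.0031152729998211726
runs = 10

# A (10000 x 1000) @ (1000 x 2000) matmul needs about 2 * M * K * N operations.
flops_per_matmul = 2 * 10000 * 1000 * 2000

speedup = cpu_seconds / gpu_seconds
cpu_tflops = flops_per_matmul * runs / cpu_seconds / 1e12
gpu_tflops = flops_per_matmul * runs / gpu_seconds / 1e12

print(f"speedup: {speedup:.0f}x")
print(f"implied CPU throughput: {cpu_tflops:.3f} TFLOP/s")
print(f"implied GPU throughput: {gpu_tflops:.1f} TFLOP/s")
```

The speedup comes out at roughly 2,800x, but the implied GPU throughput is over 100 TFLOP/s, well above the FP32 rating of a Tesla V100 (on the order of 15 TFLOP/s). The GPU timing therefore most likely reflects asynchronous kernel launches rather than completed multiplications, and the real speedup is probably smaller than the raw ratio suggests.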