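## Terminology
### CUDA
**Overview**
CUDA is a parallel computing platform and programming model from NVIDIA that makes general-purpose computing on GPUs straightforward. It runs only on NVIDIA GPUs. In practice CUDA is delivered as a toolkit, and nearly all mainstream deep learning frameworks rely on it for GPU acceleration.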
### cuDNN
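**Overview**
NVIDIA cuDNN is a GPU-accelerated library for deep neural networks. It is an SDK that emphasizes performance, ease of use, and low memory overhead, and it provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN can be integrated into higher-level machine learning frameworks such as Caffe, Chainer, Keras, MATLAB, MXNet, PyTorch, and TensorFlow.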
### TensorFlow
**Overview**
TensorFlow is an end-to-end open-source platform for machine learning.
## Architecture
![Architecture](../image/gpu_architecture.png)
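## Installation
### Manual installation
The manual installation below uses a CentOS 8 image as the container image.
***step1----Create a GPU secure container instance***
* Confirm that a GPU is attached and that it supports CUDA (mainstream NVIDIA GPUs all do)
* Confirm that the driver is installed (`nvidia-smi` prints the GPU device information)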
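
The driver check in step1 can also be scripted. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` whenever the driver is installed; the helper name `list_gpus` is hypothetical and not part of any NVIDIA tool.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, driver_version' string per GPU, or [] if none is visible."""
    if shutil.which("nvidia-smi") is None:
        # The driver tools are not installed or not on PATH
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # nvidia-smi exists but cannot talk to the driver or the device
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No GPU visible: check the device mount and the driver")
```

An empty list means either the GPU is not attached to the container or the driver is not usable, which are the two conditions step1 asks you to rule out.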
***step2----Install CUDA***
* [Get the installer]
* Install the dependencies (`yum -y install gcc kernel-devel make pciutils elfutils-libelf-devel which gcc-c++`)
* Run the installer (`sh cuda_10.2.89_440.33.01_linux.run`)
* Add CUDA to `PATH` (append `export PATH=$PATH:/usr/local/cuda/bin` to `~/.bashrc`)
* Verify the installation (`nvcc -V` prints the CUDA version information)
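
The `nvcc -V` check can be turned into a version lookup that later steps can reuse. This is a sketch under the assumption that `nvcc` prints a `release X.Y` field, as it does for the toolkit versions named here; the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` if it is on PATH and return its release, else None."""
    if shutil.which("nvcc") is None:
        # Either CUDA is not installed or /usr/local/cuda/bin is missing from PATH
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Sample line in the format nvcc prints for the 10.2 installer used above
    sample = "Cuda compilation tools, release 10.2, V10.2.89"
    print(parse_nvcc_release(sample))
    print(installed_cuda_release())
```

A `None` result from `installed_cuda_release()` usually points at the `PATH` entry from the previous bullet not being loaded in the current shell.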
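***step3----Install cuDNN***
* [Get the installer]
  Download the following packages. The `cuda11.0` tag in the file names indicates a build for CUDA 11.0, so make sure the CUDA toolkit installed in step2 is a matching version:
  * `libcudnn8-8.0.5.39-1.cuda11.0.*.rpm`
  * `libcudnn8-devel-8.0.5.39-1.cuda11.0.*.rpm`
  * `libcudnn8-samples-8.0.5.39-1.cuda11.0.*.rpm`
* Install the packages (`rpm -ivh libcudnn8-*.rpm`)
* Verify the installation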
```shell
cp -r /usr/src/cudnn_samples_/ $HOME
cd $HOME/cudnn_samples_/mnistCUDNN
make clean && make
./mnistCUDNN
# If the installation succeeded, the output ends with "Test passed!"
```
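
Because each cuDNN package is built against a specific CUDA release, it can help to compare the package names with the installed toolkit before running `rpm`. The sketch below assumes the file-name layout shown in the package list above (`libcudnn8[-devel|-samples]-<cudnn version>-<build>.cuda<cuda version>.…`); the regular expression and function names are illustrative, not an NVIDIA convention guaranteed for other releases.

```python
import re

# Matches names such as libcudnn8-devel-8.0.5.39-1.cuda11.0.*.rpm
CUDNN_RPM = re.compile(
    r"^libcudnn\d+(?:-[a-z]+)?-(?P<cudnn>\d+(?:\.\d+)+)-\d+\.cuda(?P<cuda>\d+\.\d+)\."
)


def parse_cudnn_rpm(filename):
    """Return (cudnn_version, cuda_version) parsed from a cuDNN RPM file name."""
    match = CUDNN_RPM.match(filename)
    if match is None:
        return None
    return match.group("cudnn"), match.group("cuda")


def matches_toolkit(filename, toolkit_release):
    """True if the package was built for the given CUDA release, e.g. '11.0'."""
    parsed = parse_cudnn_rpm(filename)
    return parsed is not None and parsed[1] == toolkit_release


if __name__ == "__main__":
    name = "libcudnn8-8.0.5.39-1.cuda11.0.*.rpm"
    print(parse_cudnn_rpm(name))
    print(matches_toolkit(name, "11.0"), matches_toolkit(name, "10.2"))
```

A mismatch here is worth resolving before the `mnistCUDNN` sample is built, since the sample links against the CUDA libraries of the installed toolkit.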
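***step4----Install TensorFlow***
* [Installation guide]
  It is recommended to download the TensorFlow `.whl` file directly and install it with `pip install`.
***step5----Test with machine learning code***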
* testcase1
```shell
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```
* testcase2
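```shell
# tf.Session is not available in TensorFlow 2.x; use the compat.v1 API in graph mode instead
python -c "import tensorflow as tf; tf.compat.v1.disable_eager_execution(); hello = tf.constant('hello tensorflow!'); sess = tf.compat.v1.Session(); print(sess.run(hello))"
```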
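
Both test cases also succeed on a CPU-only TensorFlow build, so they do not by themselves show that the GPU is being used. A small additional check, sketched here with the hypothetical helper `visible_gpus`, asks TensorFlow which GPUs it can see. It assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this interpreter
        return None
    # Each entry is a PhysicalDevice with a name such as '/physical_device:GPU:0'
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

An empty list with a working `nvidia-smi` typically means TensorFlow could not load one of the CUDA or cuDNN libraries, which the TensorFlow startup log reports.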
### Image-based installation
NVIDIA publishes ready-made images on Docker Hub with CUDA and cuDNN preinstalled.
***step1----Get the image***
* Pull the image (`ctr -n k8s.io i pull -k docker.io/nvidia/cuda:11.0.3-cudnn8-devel-centos8`)
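
The image tag encodes which CUDA and cuDNN versions are preinstalled, which matters when choosing a TensorFlow wheel later. The sketch below splits a tag of the form used above, `<cuda version>-[cudnnN-]<flavor>-<os>`; this layout is inferred from the tag in the pull command, and the function name is hypothetical.

```python
def parse_cuda_image_tag(image_ref):
    """Split an nvidia/cuda image reference into CUDA version, cuDNN, flavor, and OS."""
    # Everything after the last ':' is the tag
    tag = image_ref.rpartition(":")[2]
    parts = tag.split("-")
    # The cuDNN component is optional in the tag
    cudnn = next((part for part in parts if part.startswith("cudnn")), None)
    rest = [part for part in parts[1:] if part != cudnn]
    return {
        "cuda": parts[0],
        "cudnn": cudnn,
        "flavor": rest[0] if rest else None,
        "os": rest[1] if len(rest) > 1 else None,
    }


if __name__ == "__main__":
    print(parse_cuda_image_tag("docker.io/nvidia/cuda:11.0.3-cudnn8-devel-centos8"))
```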
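***step2----Create a GPU secure container instance***
* Confirm that a GPU is attached and that it supports CUDA (mainstream NVIDIA GPUs all do)
* Confirm that the driver is installed (`nvidia-smi` prints the GPU device information)
***step3----Install TensorFlow***
* [Installation guide]
  It is recommended to download the TensorFlow `.whl` file directly and install it with `pip install`.
***step4----Test with machine learning code***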
* testcase1
```shell
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```
* testcase2
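```shell
# tf.Session is not available in TensorFlow 2.x; use the compat.v1 API in graph mode instead
python -c "import tensorflow as tf; tf.compat.v1.disable_eager_execution(); hello = tf.constant('hello tensorflow!'); sess = tf.compat.v1.Session(); print(sess.run(hello))"
```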
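## Notes
1. During the manual CUDA installation, deselect the NVIDIA driver component. The driver is already installed in the environment, and leaving the component selected causes the installation to fail.
2. The installation guide (CUDA_Installation_Guide_Linux.pdf) is available in the installation directory once the installation completes.
3. Downloading the cuDNN packages requires signing in with a registered NVIDIA account.
## GPU machine learning test results
### Test code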
```python
import timeit

import tensorflow as tf

# Create the input matrices on the CPU
with tf.device('/cpu:0'):
    cpu_a = tf.random.normal([10000, 1000])
    cpu_b = tf.random.normal([1000, 2000])
    print(cpu_a.device, cpu_b.device)

# Create input matrices of the same shapes on the GPU
with tf.device('/gpu:0'):
    gpu_a = tf.random.normal([10000, 1000])
    gpu_b = tf.random.normal([1000, 2000])
    print(gpu_a.device, gpu_b.device)


def cpu_run():
    """Multiply the two CPU matrices on the CPU."""
    with tf.device('/cpu:0'):
        c = tf.matmul(cpu_a, cpu_b)
    return c


def gpu_run():
    """Multiply the two GPU matrices on the GPU."""
    with tf.device('/gpu:0'):
        c = tf.matmul(gpu_a, gpu_b)
    return c


# Warm-up: the first calls include one-time costs such as library loading
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print('warmup:', cpu_time, gpu_time)

# Timed run: total seconds for 10 calls on each device
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print('run time:', cpu_time, gpu_time)
```
### Test results
```
2021-02-04 11:42:27.932828: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-04 11:42:32.024173: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-04 11:42:32.032048: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-04 11:42:33.037884: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:33.040166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:01.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-02-04 11:42:33.040652: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-04 11:42:33.057129: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-04 11:42:33.057419: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-04 11:42:33.065013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-04 11:42:33.068659: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-04 11:42:33.080241: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-04 11:42:33.084766: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-04 11:42:33.087111: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-04 11:42:33.088240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:33.090174: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:33.091650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-04 11:42:33.093331: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-04 11:42:33.094124: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-04 11:42:33.094850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:33.096063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:01.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-02-04 11:42:33.096379: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-04 11:42:33.096598: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-04 11:42:33.096626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-04 11:42:33.096938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-04 11:42:33.097046: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-04 11:42:33.097130: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-04 11:42:33.097428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-04 11:42:33.097537: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-04 11:42:33.098104: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:33.099745: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:33.100991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-04 11:42:33.101553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-04 11:42:34.608321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-04 11:42:34.610224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-02-04 11:42:34.610387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-02-04 11:42:34.612121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:34.613865: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:34.615322: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-04 11:42:34.616580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14760 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:01:01.0, compute capability: 7.0)
/job:localhost/replica:0/task:0/device:CPU:0 /job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:GPU:0 /job:localhost/replica:0/task:0/device:GPU:0
2021-02-04 11:42:35.481701: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.
2021-02-04 11:42:36.350021: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.
2021-02-04 11:42:37.147785: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.
2021-02-04 11:42:37.953994: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.
2021-02-04 11:42:38.779708: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 80000000 exceeds 10% of free system memory.
2021-02-04 11:42:43.717553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-04 11:42:44.733065: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
warmup: 8.235363349000181 1.0356680849999975
run time: 8.707680622000225 0.0031152729998211726
```
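
The last two lines of the log can be turned into speedup and throughput figures. The following sketch copies the four timings from the log and assumes that one `[10000, 1000] x [1000, 2000]` matrix multiplication costs about `2 * m * k * n` floating-point operations; the helper names are hypothetical.

```python
# Timings copied from the "warmup" and "run time" lines of the log (seconds for 10 calls)
WARMUP_CPU, WARMUP_GPU = 8.235363349000181, 1.0356680849999975
RUN_CPU, RUN_GPU = 8.707680622000225, 0.0031152729998211726
CALLS = 10

# Approximate cost of one matmul: 2 * m * k * n floating-point operations
FLOPS_PER_CALL = 2 * 10000 * 1000 * 2000


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU timing is than the CPU timing."""
    return cpu_seconds / gpu_seconds


def implied_tflops(seconds, calls=CALLS):
    """Throughput in TFLOPS implied by a total time for `calls` matmuls."""
    return FLOPS_PER_CALL * calls / seconds / 1e12


if __name__ == "__main__":
    print(f"warm-up speedup: {speedup(WARMUP_CPU, WARMUP_GPU):.1f}x")
    print(f"timed speedup:   {speedup(RUN_CPU, RUN_GPU):.1f}x")
    print(f"implied CPU throughput: {implied_tflops(RUN_CPU):.3f} TFLOPS")
    print(f"implied GPU throughput: {implied_tflops(RUN_GPU):.1f} TFLOPS")
```

Under these assumptions the timed GPU figure implies well over 100 TFLOPS, which is far above the roughly 15.7 TFLOPS FP32 peak that NVIDIA quotes for the V100 SXM2 (a number from NVIDIA's public specification, not from this test). This suggests that the timed loop mostly measured asynchronous kernel launches rather than completed computation. Forcing completion inside `gpu_run`, for example by calling `.numpy()` on the result, would likely give a more conservative GPU time.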