Contents
- Installing Torch with CUDA 10.0
- Preface
- Date of writing
- Environment
- GPU driver
- CUDA
- cuDNN
- Installing tensorflow-gpu under conda
- Testing
- References
Installing Torch with CUDA 10.0
Python 3.7
# CUDA 10.0
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
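Once the conda command finishes, it is worth checking that PyTorch can actually reach the GPU. The following is a minimal sketch, assuming `python` on PATH is the interpreter of the conda environment where the packages were installed; the function name `check_torch_cuda` is made up for illustration.

```shell
# Print the PyTorch version, the CUDA version it was built against,
# and whether a CUDA device is usable from this interpreter.
check_torch_cuda() {
    if python -c "import torch" >/dev/null 2>&1; then
        python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
    else
        # Either python is missing or torch is not installed in this environment.
        echo "torch is not importable from this python"
    fi
}

check_torch_cuda
```

On a working setup the last value printed should be True. If it is False, the GPU driver is the first thing to check.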
Preface
This article targets Ubuntu. For Windows, see: Installing tensorflow-gpu 1.11 + CUDA 9 + cuDNN 7 on Windows 10
Date of writing
2020-6-28
Environment
Y7000 + Ubuntu 16 + GTX 1050 Ti
GPU driver
CUDA
sudo sh cuda*
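Note
The first component the installer offers is the GPU driver. Do not install it!!!
The installation result is as follows:
Setting the CUDA environment variables
Add the following to ~/.zshrc: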
#####CUDA10.0#######2020-6-28
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64
export PATH=$PATH:/usr/local/cuda-10.0/bin
export CUDA_HOME=/usr/local/cuda-10.0
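After running `source ~/.zshrc`, you can confirm that the new entries really ended up in the search paths. This is only a sketch: `path_contains` is a hypothetical helper, and `CUDA_DIR` is assumed to be the install prefix used in this article.

```shell
# Succeed if the colon-separated list in $1 contains the exact entry $2.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

CUDA_DIR=/usr/local/cuda-10.0   # install prefix used in this article

if path_contains "$PATH" "$CUDA_DIR/bin"; then
    echo "PATH: ok"
else
    echo "PATH: $CUDA_DIR/bin is missing"
fi

if path_contains "${LD_LIBRARY_PATH:-}" "$CUDA_DIR/lib64"; then
    echo "LD_LIBRARY_PATH: ok"
else
    echo "LD_LIBRARY_PATH: $CUDA_DIR/lib64 is missing"
fi
```

The helper wraps both the list and the entry in colons, so `/usr/local/cuda-10.0/bin` does not match a longer entry that merely starts with the same text.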
Run nvcc -V. If it prints output like the following, CUDA has been installed successfully:
➜ ubuntucuda10.0 cat /usr/local/cuda-10.0/version.txt
CUDA Version 10.0.130
➜ ubuntucuda10.0 sudo vim ~/.zshrc
➜ ubuntucuda10.0 source ~/.zshrc
➜ ubuntucuda10.0 nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
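If you prefer a scripted check over reading the banner by eye, the release number can be pulled out of the `nvcc -V` output. A minimal sketch, assuming the "release X.Y," wording shown in the transcript above; `nvcc_release` and `EXPECTED` are names introduced here for illustration.

```shell
# Read `nvcc -V` output on stdin and print only the release number, e.g. 10.0.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

EXPECTED=10.0   # the CUDA release this article installs

if command -v nvcc >/dev/null 2>&1; then
    found=$(nvcc -V | nvcc_release)
    if [ "$found" = "$EXPECTED" ]; then
        echo "nvcc reports CUDA $found"
    else
        echo "nvcc reports CUDA '$found', expected $EXPECTED"
    fi
else
    echo "nvcc not found on PATH"
fi
```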
cuDNN
Install these 3 packages:
➜ ubuntucuda10.0 sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 369138 files and directories currently installed.)
Preparing to unpack libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb ...
Unpacking libcudnn7 (7.6.5.32-1+cuda10.0) ...
Setting up libcudnn7 (7.6.5.32-1+cuda10.0) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
➜ ubuntucuda10.0 sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 369144 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb ...
Unpacking libcudnn7-dev (7.6.5.32-1+cuda10.0) ...
Setting up libcudnn7-dev (7.6.5.32-1+cuda10.0) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
➜ ubuntucuda10.0 sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 369150 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb ...
Unpacking libcudnn7-doc (7.6.5.32-1+cuda10.0) ...
Setting up libcudnn7-doc (7.6.5.32-1+cuda10.0) ...
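Before building the samples, a quick way to see which cuDNN version the dev package exposed is to read it from the header that update-alternatives registered as /usr/include/cudnn.h. This is a sketch, assuming the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` on `#define` lines, as the cuDNN 7 header does; the function name `cudnn_header_version` is hypothetical.

```shell
# Read a cuDNN header on stdin and print the version as MAJOR.MINOR.PATCH.
cudnn_header_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    '
}

HEADER=/usr/include/cudnn.h   # path reported by update-alternatives above

if [ -r "$HEADER" ]; then
    echo "cuDNN header version: $(cudnn_header_version < "$HEADER")"
else
    echo "$HEADER not found"
fi
```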
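Verifying that cuDNN is installed correctly
Commands:
cd /usr/src/cudnn_samples_v7/mnistCUDNN
sudo make clean
sudo make
# If make fails (here it complained that g++ was not installed), install whatever it reports as missing. The exact problem may differ from machine to machine.
# Remove g++:
sudo apt-get remove g++
# Reinstall it:
sudo apt-get install g++
./mnistCUDNN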
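The "install whatever is missing" advice can also be applied up front, by checking for the build tools before running make. A minimal sketch; `missing_tools` is a hypothetical helper, and the tool list (make, g++) is an assumption based on the error described above.

```shell
# Print the name of every given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools make g++)
if [ -n "$missing" ]; then
    echo "Missing build tools:"
    echo "$missing"
else
    echo "make and g++ are both on PATH"
fi
```

Anything the helper lists can then be installed with `sudo apt-get install`, as was done for g++ here.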
The result is as follows:
➜ ubuntucuda10.0 cd /usr/src/cudnn_samples_v7/mnistCUDNN
➜ mnistCUDNN sudo make clean
rm -rf *o
rm -rf mnistCUDNN
➜ mnistCUDNN sudo make
Linking agains cublasLt = false
CUDA VERSION: 10000
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 30 35 50 53 60 61 62 70 72 75
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
➜ mnistCUDNN ./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 6 Capabilities 6.1, SmClock 1620.0 Mhz, MemSize (Mb) 4040, MemClock 3504.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.016384 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.026176 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.031232 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.091136 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.167936 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.014336 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.026272 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.034688 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.081920 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.179200 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
➜ mnistCUDNN ls
data fp16_dev.h fp16_emu.h gemv.h mnistCUDNN.cpp
error_util.h fp16_dev.o fp16_emu.o Makefile mnistCUDNN.o
fp16_dev.cu fp16_emu.cpp FreeImage mnistCUDNN readme.txt
Seeing "Test passed!" confirms that the installation succeeded.
Installing tensorflow-gpu under conda
Create a new conda environment
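The step above does not name a command, so here is one way it might look. This is a sketch under assumptions: the environment name `tf-gpu-1.13` is made up, and Python 3.7 is borrowed from the PyTorch section rather than stated for TensorFlow in this article.

```shell
ENV_NAME=tf-gpu-1.13   # hypothetical environment name
PY_VERSION=3.7         # assumption: same Python version as the PyTorch section

if command -v conda >/dev/null 2>&1; then
    # -y answers the confirmation prompt, -n sets the environment name.
    conda create -y -n "$ENV_NAME" "python=$PY_VERSION"
    echo "Now run: conda activate $ENV_NAME"
else
    echo "conda not found on PATH"
fi
```

Run `conda activate` in your interactive shell rather than inside a script, so that the following pip command installs into the new environment.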
Install tensorflow-gpu 1.13
pip install tensorflow-gpu==1.13.1
The installation result is as follows: