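My colleagues and I have run into this problem several times. The root cause is a version mismatch between fabricmanager and the NVIDIA driver. The walkthrough below is short, but it is a very practical fix for the CUDA initialization error.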
Running python -c "import torch; torch.cuda.is_available()" fails with:
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized ...
Root cause: the installed fabricmanager version does not match the NVIDIA driver version.
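To confirm the mismatch before changing anything, the two versions can be compared directly. The following is a minimal sketch, assuming an RPM-based system where the package is installed under the name nvidia-fabric-manager; the helper names driver_version, fabricmanager_version and versions_match are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the package name
# nvidia-fabric-manager is assumed from the install step in this post.

# Driver version as reported by nvidia-smi (first GPU only).
driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Fabric Manager package version as recorded in the rpm database.
fabricmanager_version() {
    rpm -q --queryformat '%{VERSION}' nvidia-fabric-manager
}

# Succeeds only when both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

After sourcing these helpers, something like `versions_match "$(driver_version)" "$(fabricmanager_version)" && echo match || echo mismatch` reports the result. If the package is not installed at all, rpm prints a "not installed" message instead of a version, which also shows up as a mismatch.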
Solution:
1. Run nvidia-smi to check the driver version.
2. Then find the nvidia-fabric-manager package with the matching version on the official site.
Look here: https://developer.download.nvidia.cn/compute/cuda/repos/rhel8/x86_64/
Install: yum install nvidia-fabric-manager-(replace with your own version number)-1.x86_64.rpm
Start the service:
systemctl enable nvidia-fabricmanager
systemctl restart nvidia-fabricmanager
systemctl status nvidia-fabricmanager
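
The steps above could be wrapped in two small helpers, shown here as a rough sketch rather than a finished script. The function names fm_rpm_name and verify_cuda are hypothetical; the file name pattern is taken from the install command above, and the check assumes python and torch are on the PATH as in the original failing command.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Build the RPM file name used in the install step from a driver version
# string, following the pattern nvidia-fabric-manager-<version>-1.x86_64.rpm.
fm_rpm_name() {
    printf 'nvidia-fabric-manager-%s-1.x86_64.rpm\n' "$1"
}

# After restarting the service, check that it is active and then rerun
# the CUDA check that originally failed, printing its result.
verify_cuda() {
    systemctl is-active --quiet nvidia-fabricmanager &&
        python -c "import torch; print(torch.cuda.is_available())"
}
```

Passing the driver version shown by nvidia-smi to fm_rpm_name gives the file name to look for in the repository, and verify_cuda should print True once the service is running with the matching version.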