Install CUDA 10.1 and Ubuntu 18.04 from scratch
Imagine this: you get a super powerful piece of hardware (in my case, an NVIDIA GeForce RTX 2080 Ti) and plan to use it for machine learning. But wait, there are so many things with different versions to install. (I personally ran into version mismatch problems and reinstalled everything 3 times…) To save you from that headache, in this post we will go through these steps to ensure your GPU runs smoothly:
- Install Ubuntu 18.04 with a Bootable USB
- Install CUDA 10.1 and cudnn7
- Install Tensorflow2.2
Install Ubuntu 18.04 with a Bootable USB
As my machine is brand new, I first need to install an operating system. If you also want to install Ubuntu, prepare an empty USB stick with more than 2.1 GB of storage. Then follow this nice tutorial to create a bootable USB:
But instead of downloading the latest Ubuntu version, I chose 18.04 LTS as I'm more familiar with it. You can find this version here.
After having a bootable USB stick, we can plug it into our new machine. As soon as we boot the computer, KEEP pressing F2 until we enter the BIOS setting:
Press F8 to see the Boot Menu and choose our bootable USB (please choose the entry without UEFI). Afterwards, follow this setup guide to complete the installation:
Now we can log in to our new machine.
Solution to loop login
Don't panic if you encounter a login loop here. Installing the Nvidia driver will fix this problem. You simply have to:
1. Switch to terminal mode with Ctrl + Alt + F3. Now you can log in.
2. Type these commands:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
You should see a list like:
3. Install the driver labeled "third-party free recommended". In my case, I need to run:
sudo apt install nvidia-driver-460
sudo reboot
This time, we should be able to log in through the GUI.
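If you would rather not read the recommended driver off the list by eye, a small shell helper can extract it. This is only a sketch: recommended_driver is a hypothetical function name, and it assumes that ubuntu-drivers devices prints lines shaped like "driver : nvidia-driver-460 - third-party free recommended".

```shell
# Hypothetical helper: print the package name on the first line that
# ubuntu-drivers marks as "recommended". Assumes lines shaped like
#   driver   : nvidia-driver-460 - third-party free recommended
recommended_driver() {
    awk '/recommended/ { print $3; exit }'
}

# Only query the hardware when the tool is actually installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices | recommended_driver
fi
```

The printed package name is what you would pass to sudo apt install.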
Install CUDA 10.1 and cudnn7
To use the GPU, we need to install CUDA, a parallel computing platform and API model created by Nvidia. But please DO NOT just grab the installer from Nvidia's official website! It will encourage you to install the latest CUDA version, which TensorFlow may not support. TensorFlow 2.2, for example, is built against CUDA 10.1 and cuDNN 7.6. (Here is a table of suggested versions.)
I followed the steps from this awesome Medium post but tailored the commands for our use case.
# setup correct CUDA PPA
sudo apt update
sudo add-apt-repository ppa:graphics-drivers
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
# install cuda10.1 and cudnn7
sudo apt update
sudo apt install cuda-10-1
sudo apt install libcudnn7
Run nvcc -V and nvidia-smi to check whether the installation succeeded:
nvcc -V
nvidia-smi
The CUDA version in the top-right corner of the nvidia-smi output does not matter. (It shows the highest CUDA version the installed driver supports rather than the toolkit you installed, so the number may be misleading.) As long as you see the nvidia-smi table, you are ready to move on.
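To check the toolkit version itself, you can pull the release number out of the nvcc -V output. The helper below is a sketch under one assumption: that nvcc prints a line like "Cuda compilation tools, release 10.1, V10.1.243". The name cuda_release is made up for illustration.

```shell
# Hypothetical helper: extract the "major.minor" release number from
# nvcc -V output read on stdin. Assumes a line containing
#   release 10.1, V10.1.243
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Only call nvcc when it is on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -V | cuda_release
fi
```

For the setup in this post, the printed value should match the cuda-10-1 package installed earlier.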
The next step is important. We have to export the library path in our zsh settings so that the system can find the CUDA libraries and, through them, our physical GPU. Open the settings file with vim .zshrc and add the library path.
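The exact lines are not reproduced in the text. A minimal sketch, assuming the default install locations under /usr/local (check with ls /usr/local on your machine), might look like this:

```shell
# Assumed ~/.zshrc additions; both paths are guesses based on the default
# CUDA install prefix and may differ on your system.

# Make nvcc and the other CUDA tools available on the command line.
export PATH="/usr/local/cuda-10.1/bin${PATH:+:${PATH}}"

# Let the dynamic linker find the CUDA libraries, including libcublas.
export LD_LIBRARY_PATH="/usr/local/cuda-10.2/lib64:/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

The `${VAR:+:${VAR}}` form appends the old value only when it is non-empty, which avoids a stray trailing colon.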
Wait, why cuda-10.2 here? Well, although we installed CUDA 10.1, the library libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 according to this GitHub issue.
After modifying the file, don't forget to save it and run source .zshrc.
Install Tensorflow2.2
As we want to use Python and TensorFlow to do machine learning, the versions of these tools are also crucial:
sudo apt install python3.6
sudo apt install python3-pip
pip3 install tensorflow==2.2
The last line installs the stable TensorFlow 2.2 release, which covers both CPU and GPU. If it fails with a message saying failed building wheel for grpcio, the installed pip version is too old to install this package. To fix it, upgrade pip and then rerun the install command:
pip3 install --upgrade pip
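Once the install goes through, a quick way to confirm that TensorFlow can see the GPU is to ask it for its physical devices. The sketch below wraps that in a hypothetical check_tf_gpu function. It relies on tf.config.list_physical_devices from the TensorFlow 2.x API, and an empty list would mean no GPU was detected.

```shell
# Hypothetical helper: print the GPUs TensorFlow can see, or a hint
# when TensorFlow cannot be imported from python3 at all.
check_tf_gpu() {
    if python3 -c "import tensorflow" >/dev/null 2>&1; then
        python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    else
        echo "tensorflow is not importable from python3"
    fi
}

check_tf_gpu
```

If the list comes back empty even though nvidia-smi works, the library path exported in .zshrc is the first thing worth rechecking.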
Okey dokey. That's all!
The next time you train a model, you should see high GPU memory usage, meaning the GPU is running as expected.
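One way to watch that memory usage from the terminal is nvidia-smi's query mode. The snippet below is a sketch rather than a required step: gpu_mem_percent is a made-up helper that turns "used, total" pairs in MiB into a percentage per GPU.

```shell
# Hypothetical helper: read lines of "used, total" (MiB) on stdin and
# print the usage of each GPU as a whole-number percentage.
gpu_mem_percent() {
    awk -F', *' '$2 > 0 { printf "%d%%\n", ($1 * 100) / $2 }'
}

# Only query the GPU when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
fi
```

Run it in a second terminal while training. TensorFlow reserves most of the GPU memory by default, so a value close to 100% is normal.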