Install CUDA 10.1 and Ubuntu 18.04 from scratch

Wendee 💜
4 min read · Feb 6, 2021

Imagine this: you get a super powerful piece of hardware (in my case, it's an NVIDIA GeForce RTX 2080 Ti 💪) and plan to use it for machine learning 🔅 But wait, there are so many things with different versions to install. (I personally ran into version mismatch problems and reinstalled everything 3️⃣ times…) To save you the time and the headache, in this post we will go through these steps to make sure your GPU runs smoothly:

  1. Install Ubuntu 18.04 with a Bootable USB
  2. Install CUDA 10.1 and cuDNN 7
  3. Install TensorFlow 2.2

Install Ubuntu 18.04 with a Bootable USB

As my machine is brand new, I first need to install an operating system. If you also want to install Ubuntu, prepare an empty USB stick with more than 2.1 GB of storage. Then follow this nice tutorial to create a bootable USB:

But instead of downloading the latest Ubuntu version, I chose 18.04 LTS as I'm more familiar with it. You can find this version 👉 here.

Once we have a bootable USB stick, we can plug it into our new machine. As soon as we boot the computer, KEEP pressing F2 until we enter the BIOS settings:

Press F8 to see the Boot Menu and choose our bootable USB (please choose the one ⚠️ without UEFI ⚠️). Afterwards, follow this setup guide to complete the installation:

Now we can log in to our new machine ☕️

Solution to loop login

Don't panic if you run into a login loop here 🤦🏻‍♀️🤦🏻‍♀️ Installing the Nvidia driver will fix this problem. You simply have to:

  1. Switch to terminal mode with Ctrl + Alt + F3. Now you can log in.
  2. Type these commands:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices

You should see a list like:

  3. Install the driver labeled third-party free recommended. In my case, that means running:

sudo apt install nvidia-driver-460
sudo reboot

This time, we should be able to log in through the GUI 🙆‍♀️
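If you would rather not read the driver list by eye, step 3 can be sketched as a small filter. This is only a sketch: `recommended_driver` is a hypothetical helper name, and it assumes the list contains lines shaped like `driver   : nvidia-driver-460 - third-party free recommended`, which may differ on your system.

```shell
# Hypothetical helper: print the package name from the first line that
# `ubuntu-drivers devices` marks as "recommended".
# Assumed line shape:
#   driver   : nvidia-driver-460 - third-party free recommended
recommended_driver() {
    awk '/recommended/ { print $3; exit }'
}

# Only query the system when the tool is actually installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    pkg="$(ubuntu-drivers devices 2>/dev/null | recommended_driver)"
    echo "Recommended driver: ${pkg:-none found}"
    # Review the name first, then install it by hand:
    #   sudo apt install "$pkg" && sudo reboot
fi
```

The install itself is left as a comment on purpose, so you can double check the package name before running it.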

Install CUDA 10.1 and cuDNN 7

To use the GPU, we need to install CUDA, a parallel computing platform and API model created by Nvidia. But please ⛔️ DO NOT ⛔️ go to Nvidia's official website! It will encourage you to install the latest CUDA version, which TensorFlow may not support 😱 (Here is a table of suggested versions)
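To make the version pairing concrete, here is a tiny lookup sketch. `cuda_for_tensorflow` is a hypothetical helper name, and the entries are the pairings I know from TensorFlow's tested build configurations. Treat that table as the source of truth and check it before relying on this.

```shell
# Hypothetical lookup: map a TensorFlow release to the CUDA / cuDNN pair it was
# built and tested against (taken from TensorFlow's tested build configurations).
cuda_for_tensorflow() {
    case "$1" in
        2.1|2.2|2.3) echo "CUDA 10.1, cuDNN 7.6" ;;
        2.4)         echo "CUDA 11.0, cuDNN 8.0" ;;
        *)           echo "unknown"; return 1 ;;
    esac
}

# The release used in this post:
cuda_for_tensorflow 2.2
```

This is why the rest of the post sticks to CUDA 10.1 and cuDNN 7 for TensorFlow 2.2.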

I followed the steps from this awesome Medium post but tailored the commands to our use case.

# setup correct CUDA PPA
sudo apt update
sudo add-apt-repository ppa:graphics-drivers
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
# install cuda10.1 and cudnn7
sudo apt update
sudo apt install cuda-10-1
sudo apt install libcudnn7
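If apt cannot find cuda-10-1 or libcudnn7, the Nvidia package repositories are probably not registered on your machine yet. Below is a minimal sketch of the source lines, assuming the repositories sit next to the key URL used above. `nvidia_repo_lines` and the file name in the comment are hypothetical choices, not something the commands above require.

```shell
# Hypothetical helper: print apt source lines for Nvidia's Ubuntu 18.04
# repositories (CUDA packages, plus the machine-learning packages such as cuDNN).
nvidia_repo_lines() {
    base="http://developer.download.nvidia.com/compute"
    echo "deb ${base}/cuda/repos/ubuntu1804/x86_64 /"
    echo "deb ${base}/machine-learning/repos/ubuntu1804/x86_64 /"
}

# Preview the lines; nothing is written to the system here.
nvidia_repo_lines

# To register them (the file name is an arbitrary choice), then refresh apt:
#   nvidia_repo_lines | sudo tee /etc/apt/sources.list.d/nvidia-cuda.list
#   sudo apt update
```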

Run nvcc -V and nvidia-smi to check whether the installation succeeded.

nvcc -V
nvidia-smi

The CUDA version in the top right corner of nvidia-smi does not matter. It reports the highest CUDA version the installed driver supports, not the toolkit version we just installed, so the number can look misleading 🤪 As long as you see the table above, you are ready to move on 🙃
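The number that does matter is the one nvcc reports. Here is a small sketch for checking it, assuming the last line of `nvcc -V` looks like `Cuda compilation tools, release 10.1, V10.1.243`. `nvcc_release` is a hypothetical helper name.

```shell
# Hypothetical helper: pull the release number out of `nvcc -V` output.
# Assumed line shape: "Cuda compilation tools, release 10.1, V10.1.243"
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc -V | nvcc_release)"
    if [ "$found" = "10.1" ]; then
        echo "nvcc reports CUDA 10.1"
    else
        echo "nvcc reports CUDA ${found:-unknown}, expected 10.1"
    fi
else
    echo "nvcc not found on PATH"
fi
```

If nvcc is not found even though the install went through, the CUDA bin directory (by default /usr/local/cuda-10.1/bin) may simply not be on your PATH yet.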

⛑ The next step is important. We have to export the library path in our zsh settings (use .bashrc instead if your shell is bash) so that TensorFlow can find the CUDA libraries and use our physical GPU: vim ~/.zshrc
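Here is a minimal sketch of the lines to add, assuming the default install prefixes /usr/local/cuda-10.1 and /usr/local/cuda-10.2. Adjust the paths if yours differ; the PATH line is my assumption so that nvcc is found as well.

```shell
# Lines to append to ~/.zshrc (assumed default install prefixes; adjust if needed).
# Put the CUDA 10.1 compiler and tools on PATH.
export PATH="/usr/local/cuda-10.1/bin${PATH:+:${PATH}}"
# Let the dynamic linker find the CUDA libraries, including libcublas.so.10.
export LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```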

Wait, why cuda-10.2 here? Well, although we installed CUDA 10.1, the library libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 according to this GitHub issue 🤷🏻‍♀️

After modifying the file, don't forget to save it and run source .zshrc 🧚‍♀️

Install TensorFlow 2.2

As we want to use Python and TensorFlow for machine learning, the versions of these tools are also crucial:

sudo apt install python3.6
sudo apt install python3-pip
pip3 install tensorflow==2.2

The last line installs the stable TensorFlow release, whose package covers both CPU and GPU. If it fails with a message saying failed building wheel for grpcio, the installed pip version is too old to install this package. To fix it, upgrade pip and then run the install command again:

pip3 install --upgrade pip
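Once TensorFlow is installed, it is worth checking that it can actually see the GPU. This is a minimal sketch: `tf_gpu_check` is a hypothetical helper name, while `tf.config.list_physical_devices` is TensorFlow's own call (available from 2.1).

```shell
# Hypothetical helper: ask TensorFlow which GPUs it can see.
tf_gpu_check() {
    python3 -c 'import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
print("TensorFlow", tf.__version__, "sees", len(gpus), "GPU(s)")'
}

# Run the check only if TensorFlow is importable in this environment.
if python3 -c 'import tensorflow' >/dev/null 2>&1; then
    tf_gpu_check
else
    echo "TensorFlow is not installed for python3 yet"
fi
```

If it reports 0 GPUs, the library path exported in the previous section is the first thing I would look at.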

Okey dokey. That's all 🎊 🎊 🎊

Next time you train a model, you will see high GPU memory usage in nvidia-smi, meaning the GPU is running as expected.
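To watch that number without reading the whole table, here is a small sketch built on nvidia-smi's query mode. `gpu_mem_percent` is a hypothetical helper name, and it assumes one "used, total" pair in MiB per GPU line.

```shell
# Hypothetical helper: turn "used, total" (MiB) into a whole-number percentage.
gpu_mem_percent() {
    awk -F',' '($2 + 0) > 0 { printf "%d%%\n", $1 * 100 / $2 }'
}

# Query the GPU only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
fi
```

Run it in a second terminal while training to see the percentage climb.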
