How to Install TensorFlow on the GPU with Docker
This tutorial walks through setting up TensorFlow on GPUs with Docker.
Introduction
Deep learning has advanced at a spectacular pace, leading to significant innovations and several new research and training methods.
One example is TensorFlow, a popular deep learning library for building models that solve a wide range of tasks. It is widely regarded as one of the most capable libraries for deep learning and neural networks.
Though TensorFlow performs well on a CPU with simpler models and smaller datasets, its real power comes from using the Graphics Processing Unit (GPU).
Pairing a GPU with TensorFlow can greatly improve performance. But installing TensorFlow in a GPU environment is often difficult because of the CUDA errors that may arise.
In this blog, we’ll explore how to install TensorFlow with GPU support using Docker.
Why Docker
Modern state-of-the-art models are renowned for being extremely large and over-parameterized; in fact, they often have many more parameters than there are data points in the dataset. Because they demand enormous amounts of compute to train, these models depend on multiprocessing and distribution modules like torch.distributed or tf.distribute.
Now let’s say you somehow manage to write parallel code successfully; you still need to ensure that all of your accelerators are “visible” and that your CUDA version matches what your primary library supports (dependency hell ☠️).
What is the solution to this?
By offering prebuilt images with a working CUDA setup for each version, Docker makes this process much easier. To simplify things further, you can build upon these pre-existing images and add your own libraries and frameworks.
Installing Docker for GPU
We want to run the TensorFlow container image and take advantage of the GPUs in our system. To do this, we need a Docker setup that can work with the GPU. Docker containers are platform- and hardware-agnostic, which becomes a problem with specialized hardware such as NVIDIA GPUs, since they require kernel modules and user-level libraries.
Because of this, Docker does not natively support NVIDIA GPUs within containers.
We will use nvidia-docker to keep our Docker image portable while still leveraging the NVIDIA GPUs in our system.
Nvidia-docker is essentially a wrapper around the docker command that transparently provisions a container with the necessary components to execute code on the GPU.
First, we will install Docker itself by running the following commands:
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
Next, to add the NVIDIA package repository that provides the GPU-compatible Docker packages, we will execute the following command:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
Now, we will update Ubuntu’s package index using the following command so that the newly added repository becomes available.
# Updating ubuntu’s repositories
sudo apt-get update
Now, we can install the NVIDIA GPU-compatible Docker package using the following command:
sudo apt-get install -y nvidia-docker2
We’ll need to restart Docker to ensure the installation changes take effect.
sudo systemctl restart docker
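Before moving on, it can help to confirm that containers can actually see the GPU. The sketch below is one possible way to do that from Python on the host: it builds a `docker run --rm --gpus all <image> nvidia-smi` command and runs it only if the `docker` binary is on the PATH. The helper names `build_gpu_smoke_test` and `run_gpu_smoke_test` are made up for illustration, and the default image tag is simply the CUDA image used later in this tutorial.

```python
import shutil
import subprocess

# Assumption: we reuse the CUDA base image from later in this tutorial.
DEFAULT_IMAGE = "nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04"


def build_gpu_smoke_test(image=DEFAULT_IMAGE):
    """Return the docker command that runs nvidia-smi in a throwaway container."""
    # --rm removes the container afterwards; --gpus all exposes every GPU.
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def run_gpu_smoke_test(image=DEFAULT_IMAGE):
    """Run the smoke test and return (ok, output).

    ok is False when docker is missing or the command exits with an error.
    """
    if shutil.which("docker") is None:
        return False, "docker executable not found on PATH"
    result = subprocess.run(
        build_gpu_smoke_test(image),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

If the NVIDIA runtime is wired up correctly, `run_gpu_smoke_test()` should return `True` together with the usual `nvidia-smi` table. The first call may take a while because the image has to be pulled.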
Setting Up TensorFlow With GPU Support
TensorFlow provides several images depending on your use case, such as latest, nightly, devel, and devel-gpu.
But most of the time, when working on a project, you will need additional libraries or packages that are not included in the standard TensorFlow image.
Because of this, building a custom TensorFlow image is useful, since you can extend it with the other libraries you work with.
Through the following steps, we can build a custom TensorFlow image with Docker:
Step 1: Creating a Dockerfile
To begin, we first need to create a Dockerfile, which defines how our custom image will be built.
Choosing the base image
Version mismatches cause most TensorFlow GPU issues, so the TF, CUDA, and cuDNN versions must be compatible. See this site for the appropriate TF, cuDNN, and CUDA versions.
Most people start from an official TF Docker image. However, according to that site, official TensorFlow only supports CUDA 11.2 (or 11.0 or 10.1), which makes it impossible to start from CUDA 11.3.
To solve that, you can choose a base image that already has cuda=11.3 and cudnn=8 installed and then install TensorFlow on top of it. We will use nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 as our base image.
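To make the version-matching problem concrete, here is a small sketch that compares a CUDA version against the tested configuration for a given TensorFlow release. The CUDA values mirror the ones mentioned above (11.2, 11.0, 10.1); the release-to-version mapping and the cuDNN values are recalled from TensorFlow's tested-configurations table and should be treated as assumptions to confirm there. The names `TESTED_CONFIGS` and `check_cuda` are hypothetical.

```python
# Tested (CUDA, cuDNN) pairs for a few TensorFlow GPU releases on Linux.
# Assumption: recalled from TensorFlow's tested-configurations table;
# confirm the values there before relying on this dict.
TESTED_CONFIGS = {
    "2.9": ("11.2", "8.1"),
    "2.4": ("11.0", "8.0"),
    "2.3": ("10.1", "7.6"),
}


def major(version):
    """Return the major component of a dotted version string as an int."""
    return int(version.split(".")[0])


def check_cuda(tf_version, cuda_version):
    """Classify a CUDA version as 'exact', 'same-major', 'mismatch', or 'unknown'."""
    key = ".".join(tf_version.split(".")[:2])  # "2.9.1" -> "2.9"
    if key not in TESTED_CONFIGS:
        return "unknown"
    tested_cuda, _tested_cudnn = TESTED_CONFIGS[key]
    # Compare only major.minor, so "11.2.2" still counts as an exact match.
    if cuda_version.split(".")[:2] == tested_cuda.split("."):
        return "exact"
    if major(cuda_version) == major(tested_cuda):
        return "same-major"
    return "mismatch"


print(check_cuda("2.9.1", "11.3.1"))  # same-major
print(check_cuda("2.9.1", "11.2"))    # exact
print(check_cuda("2.3.0", "11.3"))    # mismatch
```

The "same-major" label only flags that the combination differs from the tested one, as is the case for the CUDA 11.3 base image chosen here. It is a hint to verify the setup, not a guarantee either way.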
Using your preferred text editor, create a new file named Dockerfile in a new directory and then add the following content.
FROM nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
# Install additional packages
RUN apt-get -y update && \
apt-get -y upgrade && \
apt-get install -y python3-pip python3-dev
RUN apt-get install -y git
# Install any python packages you need
COPY requirements.txt requirements.txt
RUN python3 -m pip install --upgrade pip && \
python3 -m pip install -r requirements.txt
COPY . .
# alias
RUN echo 'alias python="python3" ' >> ~/.bashrc
RUN echo 'alias pip="pip3" ' >> ~/.bashrc
CMD tail -f /dev/null
Step 2: Building and running the Docker image
While in the same directory as our Dockerfile, we will run the following command to build the image from the Dockerfile.
# Create the image “tensorflow_image” from the file “Dockerfile”
docker build -t tensorflow_image . -f Dockerfile
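After building the image, we will use the following command to create a container from that image and run it in the background. The image's default command (tail -f /dev/null) keeps the container alive so that we can enter it afterwards.
# Create and run a detached container from the above image
docker run -d --name tensorflow_container --gpus all -w="/working" tensorflow_image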
Then, execute the following command to enter the container:
# Enter the “tensorflow_container”
docker exec -it tensorflow_container bash
While within the container, we can check the following:
# Check if the NVIDIA Driver is recognized
nvidia-smi
# Check the version of CUDA
nvcc --version
# Check the version of cuDNN
cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
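If you would rather get the cuDNN version as a single string than read it off the `grep` output, a short Python helper can parse the header. This is only a sketch: `parse_cudnn_version` is a hypothetical name, and the sample text below is an illustrative excerpt with made-up numbers rather than the contents of a real header.

```python
import re


def parse_cudnn_version(header_text):
    """Extract 'major.minor.patch' from the text of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines of the form "#define CUDNN_MAJOR 8".
        match = re.search(rf"^#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found in header")
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative header excerpt; the numbers are made up for the example.
sample = """\
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample))  # 8.2.0
```

Inside the container, the same function can be applied to the real file with `parse_cudnn_version(open("/usr/include/cudnn_version.h").read())`.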
Now, we can go ahead and install TensorFlow, as highlighted in step 5 of this official tutorial.
TensorFlow requires a recent version of pip, so we will first upgrade our pip installation and install TensorFlow using pip.
pip install --upgrade pip
pip install tensorflow==2.9.1
And YES!
We have installed TensorFlow.
Now let us check if it works.
Testing our installation
To check that TensorFlow with GPU support has been installed properly, we will run the checks from step 6 of the official tutorial.
# Verify the CPU setup
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
# A tensor should be returned, something like
# tf.Tensor(-686.383, shape=(), dtype=float32)
# Verify the GPU setup
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# A list of GPU devices should be returned, something like
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
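The two one-liners above can also be folded into a small script that reports the result in plain words. The sketch below assumes nothing beyond `tf.config.list_physical_devices`, which the check above already uses; the function name `gpu_report` is made up for illustration, and the import is wrapped so the script still runs where TensorFlow is not installed.

```python
def gpu_report():
    """Return the TensorFlow version and the names of the visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None, "gpus": []}
    gpus = tf.config.list_physical_devices("GPU")
    return {"tensorflow": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


report = gpu_report()
if report["tensorflow"] is None:
    print("TensorFlow is not installed")
elif not report["gpus"]:
    print(f"TensorFlow {report['tensorflow']} found, but no GPU is visible")
else:
    print(f"TensorFlow {report['tensorflow']} sees: {', '.join(report['gpus'])}")
```

If the second message appears inside the container, the usual suspects are a missing `--gpus all` flag on `docker run` or a CUDA/cuDNN combination that TensorFlow cannot load.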
Conclusion
In this article, we have seen how to set up TensorFlow with Docker so that you can train deep learning models on all of your GPUs and make distributed training easier.