GPU Support

  • We have recently added some grid nodes with Nvidia GA100 GPUs at UKI-LT2-IC-HEP.
  • Apart from QMUL, we are not sure whether GPUs are available at other grid sites, or how much they have been tested or used there.
  • These are very much "experimental" at the moment and their use has not been well tested. You should have a very good understanding of how your GPU code works before trying to run it on the grid.
  • The worker nodes only have a minimal software stack installed. Your job environment will need to provide CUDA support itself, for example via Anaconda or a container image.
  • If you require support please email lcg-site-admin at imperial.ac.uk.

Anaconda Example Using DIRAC

The following example is based on the Anaconda (https://www.anaconda.com/) Python distribution, and some familiarity with it is probably desirable. Through Anaconda we can obtain "cudatoolkit", which provides GPU support, and "numba", a Python library that can make use of the GPU. The JDL for our job is:

[
JobName = "gpu_test";
Executable = "gpu_test.sh";
Arguments = "";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"gpu_test.sh","gpu_test.py","LFN:/gridpp/user/d/dan.whitehouse/Anaconda3-2022.05-Linux-x86_64.sh"};
OutputSandbox = {"StdOut","StdErr"};
Site = "LCG.UKI-LT2-IC-HEP.uk";
Tags = {"GPU"}
]

In our InputSandbox we have 3 scripts:

  • A bash wrapper
  • Our Python script, which represents the GPU job
  • A freshly downloaded x86_64 installer from the Anaconda website - we upload this to a storage element (SE) rather than shipping it with the job, since it contains binary data and is rather large (see the upload sketch below)
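
As a rough sketch, the installer could be uploaded to an SE with the DIRAC data management tools before submitting the job. The LFN below matches the one referenced in the JDL above, while the SE name is only a placeholder; use a path and SE you actually have write access to:

# Download the installer from the Anaconda archive (version as used in this example)
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
# Register it on a storage element; the SE name here is a placeholder
dirac-dms-add-file /gridpp/user/d/dan.whitehouse/Anaconda3-2022.05-Linux-x86_64.sh \
    Anaconda3-2022.05-Linux-x86_64.sh UKI-LT2-IC-HEP-disk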

The bash script simply installs Anaconda into our job's scratch area, sources "conda.sh", lists the available environments and then activates the "base" one. It then installs "cudatoolkit", to provide GPU support for Anaconda Python packages, and the "numba" package, which can make use of the GPU. Finally it executes the Python script:

#!/bin/sh
# Install Anaconda into the job's scratch area (-b runs the installer in batch mode)
./Anaconda3-2022.05-Linux-x86_64.sh -p ${PWD}/gputest -b
# Make the "conda" command available, list the environments and activate "base"
source ${PWD}/gputest/etc/profile.d/conda.sh
conda info -e
conda activate base
# Install CUDA support and the numba package into the active environment
conda install cudatoolkit numba
# Run the GPU test
./gpu_test.py

The Python script, gpu_test.py, simply contains:

#!/usr/bin/env python
# List the CUDA-capable GPUs that numba can see
from numba import cuda
print(cuda.gpus)
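
Submitting the job and retrieving its output can then be done with the usual DIRAC commands. This is only a sketch: it assumes the JDL above has been saved as gpu_test.jdl, and the job ID returned will of course differ:

# Submit the JDL shown above; DIRAC prints the assigned job ID
dirac-wms-job-submit gpu_test.jdl
# Check on the job while it runs (12345 is a made-up job ID)
dirac-wms-job-status 12345
# Once the job is Done, fetch StdOut and StdErr
dirac-wms-job-get-output 12345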

If we submit the job and look at the last bit of our output:

<Managed Device 0>

We can see that we can indeed access our GPU using Python.
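
If you want a little more detail than the device list, numba also provides cuda.detect(), which prints a summary of the devices it finds. It could be added at the end of the bash wrapper, for example:

# Run inside the activated conda environment; prints the CUDA devices numba can see
python -c "from numba import cuda; cuda.detect()"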


Container Example Using DIRAC

  • DIRAC provides Singularity (now developed as Apptainer, https://apptainer.org/), which can be used for running containers.
  • You can use Singularity within DIRAC jobs, but you can also use it "interactively" if you have activated your DIRAC user interface.
  • This is particularly helpful for building and testing your container before deploying jobs (see the sandbox sketch below).
  • You cannot use images which require unsquashfs within the DIRAC environment; use sandbox (directory) images instead.
  • https://cvmfs.readthedocs.io/en/latest/cpt-containers.html has some useful documentation on hosting sandbox containers on cvmfs.

As an example we'll be making use of the NVIDIA CUDA image:

https://hub.docker.com/layers/cuda/nvidia/cuda/11.0.3-devel-centos7/images/sha256-435a693db20b6a64350310147858ebbaf75734a753ed36d9ef0fa07671184f7f?context=explore
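
If you want to experiment with the image before it is available on cvmfs, one option (with a DIRAC UI activated and enough local scratch space) is to build it as a sandbox directory and run it interactively. This is only a sketch: the directory name is arbitrary, and nvidia-smi will only find a device on a machine that actually has a GPU:

# Build a sandbox (plain directory) version of the image rather than a SIF file
singularity build --sandbox cuda-11.0.3/ docker://nvidia/cuda:11.0.3-devel-centos7
# Quick check inside the container; --nv exposes the host's NVIDIA driver and devices
singularity exec --nv cuda-11.0.3/ nvidia-smi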

We need to make this image available on cvmfs, so we'll make use of unpacked.cern.ch:

  • start by forking https://gitlab.cern.ch/unpacked/sync
  • add the URI of the image to the "recipe.yaml" file
  • make a pull request against the master branch of the repo

If the pull request is accepted then it will be unpacked to /cvmfs/unpacked.cern.ch. This can take some time but it should be there by the next working day. Additionally, if updates are pushed to the registry where the original image is hosted, then these should also show up on cvmfs.
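
Once the image has appeared, you can check it from any machine with the cvmfs client configured; the path below is the one used by the job script later on:

ls /cvmfs/unpacked.cern.ch/registry.hub.docker.com/nvidia/cuda:11.0.3-devel-centos7/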

Here is our JDL for a very simple container job:

[
JobName = "gpu2_test";
Executable = "gpu2_test.sh";
Arguments = "";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"gpu2_test.sh","cat_me.txt"};
OutputSandbox = {"StdOut","StdErr"};
Site = "LCG.UKI-LT2-IC-HEP.uk";
Tags = {"GPU"}
]

And here we have our payload, the gpu2_test.sh script:

#!/bin/sh
echo "Our job is running on $HOSTNAME"
singularity --version
echo "Our working directory is: $(pwd)"
echo "Let's see if we have a GPU resource:"
# --nv exposes the host's NVIDIA driver and devices inside the container
singularity run --nv /cvmfs/unpacked.cern.ch/registry.hub.docker.com/nvidia/cuda\:11.0.3-devel-centos7/ nvidia-smi
echo "Here we will bind our working directory to /home inside the container:"
# -H mounts the job's working directory as the container's home directory
singularity run --nv -H "$(pwd):/home" /cvmfs/unpacked.cern.ch/registry.hub.docker.com/nvidia/cuda\:11.0.3-devel-centos7/ cat cat_me.txt

"cat_me.txt" is a text file containing the string "Hello World!"; it provides an example of how you can mount your working directory at the container's "/home".