Feature #12630

[Crunch2] GPU support

Added by Peter Amstutz about 4 years ago. Updated 10 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
08/01/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Use proposal from

https://dev.arvados.org/issues/17240

Implement:

  1. requiring GPUs in container record
  2. crunch-run interpreting the requirement and launching the docker and singularity containers with CUDA support

Subtasks

Task #18466: Review 12630-crunch-gpu (Resolved, Peter Amstutz)

Task #18569: Testing strategy? (Resolved, Tom Clegg)

Task #18622: Review 12630-nvidia-devices (Resolved, Peter Amstutz)


Related issues

Has duplicate Arvados - Story #12189: Support generic resource requests for run time constraints (Closed, 08/28/2017)

Has duplicate Arvados - Feature #18322: Enable GPU support when launching container with Docker or singularity (Resolved)

Blocks Arvados Epics - Story #15957: GPU support (In Progress, 10/01/2021 to 01/31/2022)

History

#1 Updated by Peter Amstutz about 4 years ago

  • Description updated (diff)

#2 Updated by Peter Amstutz about 4 years ago

  • Description updated (diff)

#3 Updated by Peter Amstutz about 4 years ago

  • Has duplicate Story #12189: Support generic resource requests for run time constraints added

#4 Updated by Peter Amstutz about 4 years ago

  • Tracker changed from Bug to Feature

#5 Updated by Peter Amstutz about 4 years ago

  • Description updated (diff)

#7 Updated by Tom Morris about 4 years ago

  • Target version changed from Arvados Future Sprints to To Be Groomed
  • Parent task set to #12518

#8 Updated by Tom Morris about 4 years ago

  • Parent task deleted (#12518)

#9 Updated by Tom Morris over 3 years ago

  • Status changed from New to Closed
  • Start date set to 08/01/2018
  • Remaining (hours) set to 0.0

#10 Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2017-12-20 Sprint to To Be Groomed
  • Status changed from Closed to New
  • Tracker changed from Task to Feature

#11 Updated by Peter Amstutz about 2 years ago

#12 Updated by Peter Amstutz 7 months ago

  • Target version deleted (To Be Groomed)

#13 Updated by Peter Amstutz 4 months ago

  • Target version set to 2021-10-27 sprint
  • Description updated (diff)

#14 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint

#15 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-11-10 sprint to 2021-11-24 sprint

#16 Updated by Peter Amstutz 3 months ago

  • Release set to 46

#17 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-11-24 sprint to 2021-12-08 sprint

#18 Updated by Peter Amstutz 2 months ago

  • Target version changed from 2021-12-08 sprint to 2021-11-24 sprint

#19 Updated by Peter Amstutz 2 months ago

  • Assigned To set to Peter Amstutz

#20 Updated by Peter Amstutz 2 months ago

  • Status changed from New to In Progress

#21 Updated by Peter Amstutz 2 months ago

12630-crunch-gpu @ 458dae934c14435c0c86b90321ac8000498cab23

  • Adds CUDA fields to RuntimeConstraints
  • If CUDADeviceCount is non-zero, the underlying container driver will request GPU support

I don't know how to write a test for this, because on a system that doesn't have nvidia hardware (such as my laptop), launching the container fails. Possibly we could make a test that runs conditionally on the presence of /dev/nvidia0, but that would require special support in Jenkins to launch a GPU node just for this specific test suite.

#22 Updated by Lucas Di Pentima 2 months ago

  • There's a typo at sdk/go/arvados/container.go:105: CUDAPTXHardwardCapability ...
  • Should we add some documentation about the runtime constraints at this point?
  • Re: writing a test for this, do you think it would be possible to do something like a unit test confirming that docker & singularity are called with the correct parameters when the runtime constraints are present?

#23 Updated by Peter Amstutz 2 months ago

  • Target version changed from 2021-11-24 sprint to 2021-12-08 sprint

#24 Updated by Peter Amstutz 2 months ago

  • Target version deleted (2021-12-08 sprint)

12630-crunch-gpu @ 24ccf7a58d10a2a3b77a1ef6f808c1ad422e8b35

https://ci.arvados.org/view/Developer/job/developer-run-tests/2815/

I went and re-read the CUDA compatibility page, which has changed a bit since I first looked at it. In particular it doesn't distinguish between cubins (precompiled code) vs PTX (intermediate language, basically GPU assembly) any more, which leads me to think that it isn't a useful distinction for us to make either. So I simplified it to just "CUDAHardwareCapability".

Lucas Di Pentima wrote:

  • There's a typo at sdk/go/arvados/container.go:105: CUDAPTXHardwardCapability ...

Fixed, thanks

  • Should we add some documentation about the runtime constraints at this point?

Added to documentation.

  • Re: writing a test for this, do you think it would be possible to do something like a unit test confirming that docker & singularity are called with the correct parameters when the runtime constraints are present?

For Docker, we don't call the command line, we use the API. It's awkward because the crunchrun tests use a mocked container runner that doesn't interact with Docker, and the Docker tests run against a real Docker daemon. Testing would require mocking the Docker service in order to check that certain fields were set.

Singularity is called on the command line, but also would have to be mocked out to be able to intercept the call and check that the right command line parameter was passed.

So it's kind of a large refactoring lift to make this amenable to unit testing.

On the other hand, I could look at writing a test that runs some simple check for a working GPU inside the actual Docker/singularity container. It just requires that the test be run on a node with a GPU. (Slightly tricky because my development laptop doesn't have an nvidia GPU.) I'll look into it.

#25 Updated by Lucas Di Pentima 2 months ago

The updates LGTM, but I still think mocking docker & singularity for unit testing is the more appropriate way to go, unless it's prohibitively tricky. Not being able to run the entire test suite locally isn't ideal IMO.

#26 Updated by Peter Amstutz about 2 months ago

  • Related to Feature #18322: Enable GPU support when launching container with Docker or singularity added

#27 Updated by Peter Amstutz about 2 months ago

  • Target version set to 2022-01-05 sprint

#28 Updated by Tom Clegg about 2 months ago

12630-crunch-gpu @ 58ea9370fa7b38382dfa9eea4c42a616e0a699f3 -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2842/
  • split "build singularity command line" part of Start() (which is most of it) into a separate func so it's testable, and add a test
  • split "build docker container config" part of Create() (which is most of it) into a separate func so it's testable, and add a test
  • add checks for CUDA fields to the stub-executor tests (i.e., check translation from runtime constraints to containerSpec)

#29 Updated by Lucas Di Pentima about 2 months ago

I think this is great, thank you! Looks good to merge.

#30 Updated by Ward Vandewege about 2 months ago

For what the compute image needs, see:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=11&target_type=deb_network

Verify that it works with the `nvidia-smi` command. Hmm, not even sure that's needed. Do we need the CUDA stuff outside the container?

This page (https://phoenixnap.com/kb/nvidia-drivers-debian) says to add non-free to the packages list, then `apt install nvidia-detect`, and run `nvidia-detect` to see which package you need (ugh). The example they give is `nvidia-driver` (maybe that works with all the cloud GPUs?).

#31 Updated by Peter Amstutz about 1 month ago

root@ip-10-254-0-57:~# singularity exec --nv docker://nvidia/cuda:11.0-base nvidia-smi
INFO:    Using cached SIF image
Wed Dec 15 22:34:55 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   27C    P8     8W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

#32 Updated by Peter Amstutz about 1 month ago

root@ip-10-254-0-57:~# docker run --rm --gpus 1 nvidia/cuda:11.0-base nvidia-smi
Wed Dec 15 22:36:40 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   27C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

#33 Updated by Peter Amstutz about 1 month ago

  • Related to deleted (Feature #18322: Enable GPU support when launching container with Docker or singularity)

#34 Updated by Peter Amstutz about 1 month ago

  • Has duplicate Feature #18322: Enable GPU support when launching container with Docker or singularity added

#35 Updated by Peter Amstutz about 1 month ago

  • Target version changed from 2022-01-05 sprint to 2022-01-19 sprint

#36 Updated by Peter Amstutz 21 days ago

To make sure /dev/nvidia-uvm is created:

nvidia-modprobe -c 0 -u

#37 Updated by Peter Amstutz 21 days ago

When the compute node starts and we try to run Singularity, /dev/nvidia-uvm doesn't exist, so the CUDA library won't initialize or run. The other nvidia device files and libraries are present; it seems to be just this one missing.

Empirically, nvidia-persistenced creates the other devices, just not /dev/nvidia-uvm, for whatever reason. Here's what the nvidia-persistenced man page says about it:

The daemon indirectly utilizes nvidia-modprobe via the nvidia-cfg library to load the NVIDIA kernel module and create the NVIDIA character device files after the daemon has dropped its root privileges, if it had any to begin with.

(Note: the stated purpose of nvidia-persistenced is to hold open certain kernel resources so they are not automatically released when no longer in use, avoiding the overhead of a teardown/setup cycle when invoking GPU programs sequentially. But if someone did want those resources to be released, they wouldn't use nvidia-persistenced.)

It seems like Docker uses nvidia-container-cli, which itself ensures /dev/nvidia-uvm is available, but Singularity does not (or at least not the same way Docker does). However, I don't think we can invoke nvidia-container-cli ourselves, because it only does this as part of a complex command that also does other, unrelated work.

We can ensure that all the modules are loaded and all the devices are created by invoking the suid program nvidia-modprobe (at the bottom, this is what all the other libraries/applications end up doing). These seem to be idempotent operations.

It seems like we have a few options:

  1. crunch-run assumes everything is correctly configured. we fix the compute node boot scripts to run nvidia-modprobe -c0 -u sometime after nvidia-persistenced has created the other /dev/nvidia* devices and/or use the script https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications
  2. crunch-run assumes nvidia-persistenced set up everything except /dev/nvidia-uvm and crunch-run runs nvidia-modprobe -c0 -u itself
  3. crunch-run assumes nothing is set up (nvidia-persistenced may not be running), calls nvidia-smi to get a list of devices, and then calls nvidia-modprobe to ensure that all the modules and devices are created.
  4. run some other CUDA program that does the right thing, or try to dlopen libnvidia-cfg (which seems to be one of the proprietary libraries that isn't well documented) and see if there's a function we can call to do the right thing (this is a higher-level version of option 3)

My feeling is that if we do option 1, it'll work for us but we'll end up fielding support requests from customers who are not using our exact compute node startup scripts.

Option 2 fixes the immediate problem but adds an embedded assumption that nvidia-persistenced or something else has done the other setup for us.

Option 3 adds more nvidia-specific complexity to crunch-run, but embeds the fewest assumptions about the host environment (== fewer support requests).

Option 4 adds different nvidia-specific complexity to crunch-run. Trying to load and interact with a proprietary C .so seems not ideal.
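The best-effort device setup in options 2 and 3 can be sketched as a loop over external commands that logs failures without aborting, since the devices may already exist. The -c0 -u flags come from comment #36 (creating /dev/nvidia-uvm); the helper name and the warning format are illustrative.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runBestEffort runs each command in sequence. A non-zero exit is
// recorded as a warning rather than treated as fatal: if the devices
// were already set up by nvidia-persistenced or the boot scripts, the
// container can still work even when modprobe fails.
func runBestEffort(cmds [][]string) []string {
	var warnings []string
	for _, args := range cmds {
		cmd := exec.Command(args[0], args[1:]...)
		if err := cmd.Run(); err != nil {
			// cmd.Args includes the command name, so the warning
			// identifies which invocation failed.
			warnings = append(warnings, fmt.Sprintf("warning: %v: %v", cmd.Args, err))
		}
	}
	return warnings
}

func main() {
	// The setup sequence discussed in this ticket: invoke the suid
	// nvidia-modprobe helper, including -c0 -u for /dev/nvidia-uvm.
	for _, w := range runBestEffort([][]string{
		{"nvidia-modprobe", "-s"},
		{"nvidia-modprobe", "-c0", "-u"},
	}) {
		fmt.Println(w)
	}
}
```

Since the operations are idempotent, running this unconditionally before container start is safe even on nodes where everything is already configured.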

#38 Updated by Peter Amstutz 21 days ago

Another detail I just found out: unified memory was introduced as a feature of CUDA 6. Not setting up the unified-memory module by default might be a backwards-compatibility behavior.

#39 Updated by Peter Amstutz 21 days ago

Another tidbit, nvlink is an interconnect between GPUs (up to 4 fully connected GPUs).

nvswitch is a switching fabric that sets up nvlinks to up to (6 or 12) other GPUs attached to the switch.

It allows for high-bandwidth (300 or 600 GB/s) data movement between GPUs.

https://www.nvidia.com/en-us/data-center/nvlink/

https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf

In order to support nvlink, do we need to run "nvidia-fabricmanager"?

Unified memory means being able to read from system RAM or from the VRAM of other GPUs. (There seems to be a paging system to automatically copy pages between main RAM and VRAM).

It's unclear if all this is necessary when multiple GPUs are running independently. I don't know whether the multi-GPU workloads we will need to support are mostly independent, or will have multiple GPUs working together.

#40 Updated by Peter Amstutz 21 days ago

12630-nvidia-devices @ c595d3cd2d9f117bc09cf66762d3698c95aebf86

  • call nvidia-modprobe to make sure the drivers & devices files are all set up
  • propagate CUDA_VISIBLE_DEVICES to container
  • add "utility" to Docker DeviceRequest capability so that it injects the command line utilities

https://ci.arvados.org/view/Developer/job/developer-run-tests/2872/
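The "utility" capability change can be illustrated with a sketch of the GPU device request. The real type is container.DeviceRequest from the Docker Engine API (github.com/docker/docker/api/types/container); a local mirror struct is used here so the example is self-contained, and the helper name is hypothetical.

```go
package main

import "fmt"

// deviceRequest mirrors the shape of Docker's container.DeviceRequest;
// the real crunch-run code uses the Docker API type directly.
type deviceRequest struct {
	Driver       string
	Count        int
	DeviceIDs    []string
	Capabilities [][]string
}

// gpuDeviceRequest builds the request attached to the container's host
// config. "gpu" asks for GPU access; "utility" asks the nvidia runtime
// to also inject command-line utilities such as nvidia-smi.
func gpuDeviceRequest(count int, deviceIDs []string) deviceRequest {
	dr := deviceRequest{
		Driver:       "nvidia",
		Capabilities: [][]string{{"gpu", "nvidia", "utility"}},
	}
	if len(deviceIDs) > 0 {
		// Explicit device selection (e.g. from CUDA_VISIBLE_DEVICES)
		// takes precedence over a plain count.
		dr.DeviceIDs = deviceIDs
	} else {
		dr.Count = count
	}
	return dr
}

func main() {
	fmt.Printf("%+v\n", gpuDeviceRequest(1, nil))
}
```

The DeviceIDs branch is how a resource manager's CUDA_VISIBLE_DEVICES selection gets propagated to the container, per the second bullet above.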

#41 Updated by Tom Clegg 18 days ago

In cuda.go, we're not checking (or reporting) exit codes (which might be fine if it's best-effort), or any stdout/stderr from the modprobe commands (which could be really annoying when the best effort doesn't work). Could be done with a "for _, cmd := range []*exec.Cmd{ ... } { ... }" to avoid repetition, since nothing else is going on here.

In cuda.go, I don't think this is an appropriate way to ask for support money. Maybe this limitation should be mentioned in the docs? Or is there an easy way to log something when this is potentially an issue?

       // [...] If someone
       // runs Arvados on a system with multiple nvswitches
       // (i.e. more than 16 GPUs) they can either ensure that the
       // additional /dev/nvidia-nvswitch* devices exist before
       // crunch-run starts or pay for support (because they clearly
       // have the budget for it).

In docker.go, it would be more readable to use os.Getenv("CUDA_VISIBLE_DEVICES") here... also, we need to avoid a panic if the env var does not contain "="

               for _, s := range os.Environ() {
                       // If a resource manager such as slurm or LSF told
                       // us to select specific devices we need to propagate that.
                       if strings.HasPrefix(s, "CUDA_VISIBLE_DEVICES=") {
                               deviceIds = strings.Split(strings.SplitN(s, "=", 2)[1], ",")

Also use os.Getenv in singularity.go

       for _, s := range os.Environ() {
               if strings.HasPrefix(s, "CUDA_VISIBLE_DEVICES=") {

#42 Updated by Peter Amstutz 18 days ago

Tom Clegg wrote:

In cuda.go, we're not checking (or reporting) exit codes (which might be fine if it's best-effort), or any stdout/stderr from the modprobe commands (which could be really annoying when the best-effort doesn't work).

Right, if it fails there's nothing we can really do about it. If everything was already set up, then it could work anyway. Exit errors will be reported as a warning.

In cuda.go, I don't think this is an appropriate way to ask for support money. Maybe this limitation should be mentioned in the docs? Or is there an easy way to log something when this is potentially an issue?

Snarky comment removed. I'll make sure to document it. I still need to write a "GPU support in crunch" install page (will be a follow-on story).

[...]

In docker.go, more readable to use os.Getenv("CUDA_VISIBLE_DEVICES") here... also, need to avoid panic if the env var does not contain "="

[...]

Also use os.Getenv in singularity.go

[...]

Why do it the easy way when the hard way is twice as complicated? Yeah, I overlooked os.Getenv(). Fixed.

12630-nvidia-devices @ 5e06ca0b451f36be33396f8e83bdaa4f9d6f74bb

https://ci.arvados.org/view/Developer/job/developer-run-tests/2875/

#43 Updated by Tom Clegg 18 days ago

Might be even better to log the actual command that failed to avoid digging through source code and/or guessing which of the modprobes failed (Args[0] is the command name so Printf("warning: %s: %s", nvmodprobe.Args, err) would look like "warning: [nvidia-modprobe -s]: exit status 1" which isn't too bad)

LGTM, thanks

#44 Updated by Peter Amstutz 10 days ago

  • Status changed from In Progress to Resolved
