Bug #22612 (closed)

CUDA install doesn't really work because headers aren't available

Added by Brett Smith about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Target version:
Story points: -
Release relationship: Auto

Description

The Ansible playbook to install CUDA "succeeds" but doesn't really work because this happens:

Setting up nvidia-kernel-open-dkms (560.35.05-1) ...
Loading new nvidia-current-560.35.05 DKMS files...
Building for 5.10.0-33-cloud-amd64
Module build for kernel 5.10.0-33-cloud-amd64 was skipped since the
kernel headers for this kernel does not seem to be installed.

We need to install the headers for the right kernel version. The ROCm playbook already has a recipe for this, but I'm realizing that recipe can be buggy if the dist-upgrade early in the playbook upgrades the kernel, so this is going to become a whole thing.
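
A rough sketch of the shape that recipe takes in Ansible, for illustration only (the modules are real builtins, but the task names, ordering, and variable names are assumed here rather than copied from the ROCm playbook):

# Illustrative sketch: linux-headers-{{ ansible_kernel }} matches the
# *running* kernel, so if the earlier dist-upgrade installed a newer
# kernel we reboot into it (and re-gather facts) first; otherwise we'd
# install headers for a kernel the image won't be running anymore.
- name: Dist-upgrade early in the run
  ansible.builtin.apt:
    update_cache: true
    upgrade: dist
  register: apt_upgrade

- name: Reboot if the upgrade may have installed a new kernel
  ansible.builtin.reboot:
  when: apt_upgrade is changed

- name: Refresh facts so ansible_kernel reflects the booted kernel
  ansible.builtin.setup:

- name: Install headers matching the running kernel
  ansible.builtin.apt:
    name: "linux-headers-{{ ansible_kernel }}"
    state: present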


Subtasks 1 (0 open, 1 closed)

Task #22619: Review 22612-driver-bugfixes (Resolved, Brett Smith, 02/27/2025)

Related issues 1 (0 open, 1 closed)

Related to Arvados - Support #22562: Test running CUDA tordo with updated pins (Resolved, Brett Smith)
#1

Updated by Peter Amstutz about 1 month ago

  • Release set to 75
#2

Updated by Peter Amstutz about 1 month ago

  • Related to Support #22562: Test running CUDA tordo with updated pins added
#3

Updated by Brett Smith about 1 month ago

22612-driver-bugfixes @ 1ab01b755c9c5dfed2576e52f36cb594c4930daf

Image build packer-build-compute-image: #316

Workflow run tordo-xvhdp-kzwf8sdjmevkeht

This is definitely progress: it now passes the boot probe, and I've confirmed by hand that the nvidia modules are properly loaded. However, crunch-run doesn't really get off the ground; a-d-c logs failure just a second or two after logging the start. I'm waiting for it to finish retrying so I can see the actual crunch-run logs.
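
For reference, a check along those lines could also be expressed as a playbook-level assertion; the task below is only an illustrative sketch, not something in the 22612-driver-bugfixes branch:

# Illustrative only: nvidia-smi -L exits nonzero (so the task fails)
# when the nvidia kernel modules aren't loaded or no GPU is visible.
- name: Verify the NVIDIA driver is loaded and sees a GPU
  ansible.builtin.command: nvidia-smi -L
  changed_when: false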

#4

Updated by Brett Smith about 1 month ago

Second workflow attempt: tordo-xvhdp-7pa6mozk8o2pri2

I didn't change anything for this run. I expected it to go into the same loop the first one did, and I was going to capture the journal on the compute node per Tom's suggestion. But this one seems to be working, or at least it got far enough to actually start the process. Not sure what to make of that.

#5

Updated by Brett Smith about 1 month ago

It looks like GPU acceleration is not working, but I suspect that's a problem with the workflow Docker image, not anything on the Crunch side. Early on Crunch logs the detected GPU:

2025-02-26T15:37:42.741549843Z Loading Docker image from keep
2025-02-26T15:43:15.571089491Z loaded image: response {"stream":"Loaded image ID: sha256:c9b02615cbebb3f80fe4b2725ba4b20d2b15b1de7c2807c8f2ff814e59a9b560\n"}
2025-02-26T15:43:22.893066350Z GPU 0: Tesla T4 (UUID: GPU-b8421f19-ab37-6f52-6004-e50f470a7613)
2025-02-26T15:43:23.277154589Z Creating Docker container

Once the container starts, the process logs that it can't find a compiler for either ROCm or CUDA, so it has to fall back on using the CPU:

2025-02-26T15:43:40.115070507Z import_cuda_impl: initializing gpu module...
2025-02-26T15:43:40.115174868Z extracting /zip/llama.cpp/ggml.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml.h
2025-02-26T15:43:40.115768609Z extracting /zip/llamafile/compcap.cu to /var/spool/cwl/.llamafile/v/0.8.17/compcap.cu
2025-02-26T15:43:40.115853003Z extracting /zip/llamafile/llamafile.h to /var/spool/cwl/.llamafile/v/0.8.17/llamafile.h
2025-02-26T15:43:40.115907676Z extracting /zip/llamafile/tinyblas.h to /var/spool/cwl/.llamafile/v/0.8.17/tinyblas.h
2025-02-26T15:43:40.116017807Z extracting /zip/llamafile/tinyblas.cu to /var/spool/cwl/.llamafile/v/0.8.17/tinyblas.cu
2025-02-26T15:43:40.116329141Z extracting /zip/llama.cpp/ggml-impl.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml-impl.h
2025-02-26T15:43:40.116512311Z extracting /zip/llama.cpp/ggml-cuda.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml-cuda.h
2025-02-26T15:43:40.116580900Z extracting /zip/llama.cpp/ggml-alloc.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml-alloc.h
2025-02-26T15:43:40.116668292Z extracting /zip/llama.cpp/ggml-common.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml-common.h
2025-02-26T15:43:40.117507294Z extracting /zip/llama.cpp/ggml-backend.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml-backend.h
2025-02-26T15:43:40.117643463Z extracting /zip/llama.cpp/ggml-backend-impl.h to /var/spool/cwl/.llamafile/v/0.8.17/ggml-backend-impl.h
2025-02-26T15:43:40.117760675Z extracting /zip/llama.cpp/ggml-cuda.cu to /var/spool/cwl/.llamafile/v/0.8.17/ggml-cuda.cu
2025-02-26T15:43:40.121579849Z extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.so not found
2025-02-26T15:43:40.121687494Z get_rocm_bin_path: note: hipInfo not found on $PATH
2025-02-26T15:43:40.121691865Z get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
2025-02-26T15:43:40.121693435Z get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
2025-02-26T15:43:40.121694740Z llamafile_log_command: /opt/rocm/bin/rocminfo
2025-02-26T15:43:40.165754079Z get_amd_offload_arch_flag: error: hipInfo returned non-zero exit status
2025-02-26T15:43:40.165760710Z llamafile_log_command: hipcc -O3 -fPIC -shared --offload-arch=native -march=native -mtune=native -DGGML_USE_HIPBLAS -Wno-return-type -Wno-unused-result -Wno-unused-function -Wno-expansion-to-defined -DIGNORE0 -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_MINIMIZE_CODE_SIZE -o /var/spool/cwl/.llamafile/v/0.8.17/ggml-rocm.so.gw86vi /var/spool/cwl/.llamafile/v/0.8.17/ggml-cuda.cu -lhipblas -lrocblas
2025-02-26T15:43:40.980468967Z clang++: error: cannot determine amdgcn architecture: /opt/rocm-6.2.4/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
2025-02-26T15:43:41.003079251Z failed to execute:/opt/rocm-6.2.4/lib/llvm/bin/clang++ --offload-arch=native --driver-mode=g++ --hip-link -O3 -fPIC -shared -march=native -mtune=native -DGGML_USE_HIPBLAS -Wno-return-type -Wno-unused-result -Wno-unused-function -Wno-expansion-to-defined -DIGNORE0 -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_MINIMIZE_CODE_SIZE -o "/var/spool/cwl/.llamafile/v/0.8.17/ggml-rocm.so.gw86vi" -x hip /var/spool/cwl/.llamafile/v/0.8.17/ggml-cuda.cu -lhipblas -lrocblas
2025-02-26T15:43:41.003977449Z Compile: warning: hipcc returned nonzero exit status
2025-02-26T15:43:41.003983386Z extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.so not found
2025-02-26T15:43:41.004040913Z get_nvcc_path: note: nvcc not found on $PATH
2025-02-26T15:43:41.004047085Z get_nvcc_path: note: $CUDA_PATH/bin/nvcc does not exist
2025-02-26T15:43:41.004048855Z get_nvcc_path: note: /opt/cuda/bin/nvcc does not exist
2025-02-26T15:43:41.004139644Z get_nvcc_path: note: /usr/local/cuda/bin/nvcc does not exist
2025-02-26T15:43:41.004142037Z extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
2025-02-26T15:43:41.004143266Z extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
2025-02-26T15:43:41.004165412Z warning: --n-gpu-layers 999 was passed but no GPUs were found; falling back to CPU inference

For what it's worth, the compute node does have cuda-nvcc-12-6 installed, and that provides nvcc. Is Crunch supposed to make this available inside the container somehow?

admin@ip-10-253-254-30:~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
#6

Updated by Brett Smith about 1 month ago

With Peter out, I built my own Docker image and used that for testing. It definitely ran an LLM, and it did it very fast, which I think shows the GPU is working, since the CPU-only version ran for >24 hours with no result. The logs also show CUDA support being loaded in:

2025-02-27T18:01:48.120914017Z ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
2025-02-27T18:01:49.398009705Z link_cuda_dso: GPU support loaded
2025-02-27T18:01:49.398019710Z Log start

The "falling back to CPU inference" warning is gone too.

Workflow: tordo-xvhdp-rhe2qf7m2uxqz5v
Dockerfile I used: tordo-4zz18-cp1u4ld979zd9qy

#7

Updated by Brett Smith about 1 month ago

  • Subtask #22619 added
#8

Updated by Lucas Di Pentima about 1 month ago

The ansible.builtin.reboot function is very cool!

Do you think we might need to give it an explicit (configurable?) timeout value for those days when the cloud weather is stormy and instances take some time to reboot? The docs say that if it's not defined, it takes the underlying connection timeout value, so I'm not sure we'd get any benefit from setting a different one for this, but I wanted to mention it in case it's a good idea.

Apart from that, LGTM.

#9

Updated by Brett Smith about 1 month ago

Lucas Di Pentima wrote in #note-8:

Do you think we might need to give it an explicit (configurable?) timeout value for those days when the cloud weather is stormy and instances take some time to reboot? The docs say that if it's not defined, it takes the underlying connection timeout value, so I'm not sure we'd get any benefit from setting a different one for this, but I wanted to mention it in case it's a good idea.

The connection timeout is itself configurable. If there's bad cloud weather, we probably want to set that so we get the benefit for both the initial connection and the post-reboot one. We could do that in the Jenkins image, or the Jenkins script, or our Packer template.
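
For illustration, the task-level knobs look roughly like this; connect_timeout and reboot_timeout are real ansible.builtin.reboot options, but the values below are placeholders, not recommendations:

# Placeholder values: connect_timeout bounds each connection attempt,
# reboot_timeout bounds the total wait for the host to come back.
- name: Reboot and wait, with explicit timeouts for slow cloud days
  ansible.builtin.reboot:
    connect_timeout: 30
    reboot_timeout: 1200

The global alternative is the connection timeout itself (the timeout setting in ansible.cfg, or the ANSIBLE_TIMEOUT environment variable), which is the sort of thing that could live in the Jenkins image or Packer template mentioned above.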

I would want to see the problem actually arise before trying to figure out which option is best for dealing with it. But at any rate, I don't think it belongs in our playbook in any case, so I'm going to go ahead and merge this branch as-is. Thanks.

#10

Updated by Brett Smith about 1 month ago

  • Status changed from In Progress to Resolved