Bug #17118
Updated by Peter Amstutz over 4 years ago
Reported by user that arv-put would upload a directory of files, and then sometimes hang before writing the collection. However, the checkpoint file was written, so canceling the process and re-running arv-put would create the collection without waiting for a re-upload.

Inspect the code and see if there are any places that seem vulnerable to a deadlock.

Here's the follow-up:

I would like to report a possible bug/improvement for the arv-put command. We ran into some issues when using arv-put where it would die silently without giving any output whatsoever. We have now traced it to the fact that the arv-put cmd essentially runs out of memory (or uses a huge amount of memory).

The setup:
1. A folder containing a number of files (< 1500) with a total folder size of 145GB. This entire folder is to be uploaded into Arvados.
2. We run it via Gitlab as a Runner on a Virtual Machine with 16GB of RAM.
3. The arv-put cmd we use:

    arv-put --no-follow-links --no-resume --exclude 'Thumbnail_Images/*' --exclude done.txt --project-uuid arkau-j7d0g-6a3em925c3yvx9q --name Overnight1 /isilon/nrd_hca/Overnight1/

Output:
1. The script silently dies, no error message, no other output.

We have done extensive testing and checking, and initially the arv-put cmd just died silently without giving any error message whatsoever. After some digging, it turns out that the arv-put cmd essentially eats up all the memory on the machine and is then killed. We tried to change it so that arv-put can only use 1 thread, but the outcome is the same. See the attached images for the output from 'top' when trying to upload the 145GB folder.

We have plans in the future to upload folders with around 750GB of data, and if arv-put cannot handle this or needs a huge amount of memory to do this, we will need to reconsider our workflows.

We have a couple of questions:
1. What is the relationship between the size of the folder to be uploaded and the amount of memory arv-put will use?
2. Is there a way to estimate how much memory would be needed for a certain folder/size of data?
3. Is there a way to make arv-put fail gracefully in cases like this?
4. If known, what is the reason that arv-put uses so much memory?
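
Regarding question 3: until arv-put itself reports the condition, a CI job can at least make the silent death visible from the outside. The following is a minimal sketch, not part of arv-put or the Arvados SDK. It assumes a POSIX runner and that the process is being terminated by a signal (for example SIGKILL from the kernel OOM killer, which the report above suggests but does not confirm). The helper names `describe_exit` and `run_and_report` are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable message."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known to the signal module on this platform.
            name = f"signal {-returncode}"
        msg = f"killed by {name}"
        if -returncode == signal.SIGKILL:
            # Assumption: SIGKILL with no other output often means the
            # kernel OOM killer; the kernel log (dmesg) would confirm it.
            msg += " (possibly the kernel OOM killer; check dmesg)"
        return msg
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd, print how it ended to stderr, and return the return code."""
    proc = subprocess.run(cmd)
    print(f"{cmd[0]}: {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

In a runner script this could be called as `run_and_report(["arv-put", ...])` with the same arguments as the command above, so that the job log records which signal ended the process instead of showing nothing at all.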