
Revision 7 (Tom Clegg, 06/17/2016 03:21 PM) → Revision 8/18 (Tom Clegg, 06/17/2016 03:26 PM)

h1. Crunch2 installation 

 (DRAFT -- when ready, this will move to doc.arvados.org→install) 

 {{toc}} 

 h2. Set up a crunch-dispatch service 

 Currently, dispatching containers via SLURM is supported. 

 Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be the SLURM controller node, a worker node, or any other node that has the appropriate SLURM/munge configuration. 

 <pre><code class="shell"> 
 sudo apt-get install crunch-dispatch-slurm 
 </code></pre> 

 Create a privileged token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token. 

 <pre><code class="shell"> 
 apiserver:~$ cd /var/www/arvados-api/current 
 apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb 
 zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 
 </code></pre> 

 Save the token on the dispatch node, in <code>/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN</code>. 
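 A sketch of writing that file. runit's @chpst -e ./env@ exports one environment variable per file in the env directory, named after the file, so the file must contain only the token itself. The @ENVDIR@ default below is just so the commands can be tried outside the dispatch node; on the real node, point it at @/etc/sv/crunch-dispatch-slurm/env@. 

 <pre><code class="shell"> 
 # TOKEN: paste the value printed by create_superuser_token.rb here. 
 TOKEN=zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 
 # Demo default; on the dispatch node use /etc/sv/crunch-dispatch-slurm/env. 
 ENVDIR=${ENVDIR:-$(mktemp -d)} 
 mkdir -p "$ENVDIR" 
 # The file must contain only the token, with no trailing newline needed. 
 printf '%s' "$TOKEN" > "$ENVDIR/ARVADOS_API_TOKEN" 
 # The privileged token should not be world-readable. 
 chmod 600 "$ENVDIR/ARVADOS_API_TOKEN" 
 </code></pre> 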

 Example runit script (@/etc/sv/crunch-dispatch-slurm/run@): 

 <pre><code class="shell"> 
 #!/bin/sh 
 set -e 
 exec 2>&1 

 export ARVADOS_API_HOST=uuid_prefix.your.domain 

 exec chpst -e ./env -u crunch crunch-dispatch-slurm 
 </code></pre> 

 Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@): 

 <pre><code class="shell"> 
 #!/bin/sh 
 set -e 
 [ -d main ] || mkdir main 
 exec svlogd -tt ./main 
 </code></pre> 
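 The service also has to be enabled under runit. A sketch, assuming runit scans @/etc/service@ (some systems use @/service@ or @/etc/runit/runsvdir@ instead -- check your setup): 

 <pre><code class="shell"> 
 # Make the run scripts executable and symlink the service directory 
 # into runit's scan directory so runsvdir starts supervising it. 
 enable_runit_service() { 
   svdir=$1; scandir=$2 
   chmod +x "$svdir/run" "$svdir/log/run" 
   ln -sfn "$svdir" "$scandir/$(basename "$svdir")" 
 } 
 # On the dispatch node: 
 # enable_runit_service /etc/sv/crunch-dispatch-slurm /etc/service 
 </code></pre> 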

 Ensure the @crunch@ user exists -- and has the same UID, GID, and home directory -- on the dispatch node and all SLURM compute nodes. Ensure the @crunch@ user can run docker containers on SLURM compute nodes. 
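 One way to check, as a sketch -- run this on the dispatch node and on each compute node, and compare the output (the three fields must be identical everywhere): 

 <pre><code class="shell"> 
 check_user() { 
   # Look up the account; prints nothing if the user doesn't exist yet. 
   entry=$(getent passwd "$1") || return 0 
   echo "$entry" | awk -F: '{print "uid=" $3, "gid=" $4, "home=" $6}' 
 } 
 check_user crunch 
 # To confirm docker access, "id crunch" should list the docker group 
 # (or whatever docker authorization mechanism your nodes use). 
 </code></pre> 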

 h2. Install crunch-run on all compute nodes 

 <pre><code class="shell"> 
 sudo apt-get install crunch-run 
 </code></pre> 

 h2. Enable cgroup accounting on all compute nodes 

 (This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?) 

 Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution. 

 For example, on Ubuntu: 
 # Update @/etc/default/grub@ to include: <pre> 
 GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1" 
 </pre> 
 # @sudo update-grub@ 
 # Reboot 
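 After rebooting, you can verify that the memory controller is enabled; @/proc/cgroups@ lists each controller with an "enabled" flag in the fourth column: 

 <pre><code class="shell"> 
 # Prints "memory cgroup enabled: 1" when memory accounting is on. 
 awk '$1 == "memory" {print "memory cgroup enabled:", $4}' /proc/cgroups 
 </code></pre> 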

 h2. Configure docker 

 Unchanged from current docs. 

 

 h2. Test the dispatcher 

 On the dispatch node, monitor the crunch-dispatch logs. 

 <pre><code class="shell"> 
 dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current 
 </code></pre> 

 On a shell VM, install a docker image for testing. 

 <pre><code class="shell"> 
 user@shellvm:~$ arv-keepdocker busybox 
 </code></pre> 

 On a shell VM, run a trivial container. 

 <pre><code class="shell"> 
 user@shellvm:~$ arv container_request create --container-request '{ 
   "name":              "test", 
   "state":             "Committed", 
   "priority":          1, 
   "container_image": "busybox", 
   "command":           ["echo", "OK"], 
   "output_path":       "/out", 
   "mounts": { 
     "/out": { 
       "kind":          "tmp", 
       "capacity":      1000 
     } 
   } 
 }' 
 </code></pre> 

 Measures of success: 
 * You should see dispatcher log entries indicating it has submitted a SLURM job. 
 * Provided the container doesn't finish before you get a chance to look, SLURM's @squeue@ command should show the new job in the list of queued/running jobs. 
 * After the SLURM job finishes, @arv container list --limit 1@ should indicate the outcome: <pre> 
 { 
  ... 
  "exit_code":0, 
  ... 
  "state":"Complete", 
  ... 
 } 
 </pre>