
h1. Crunch2 installation 

 (DRAFT -- when ready, this will move to doc.arvados.org→install) 

 {{toc}} 

 

 h2. Set up a crunch-dispatch service 

 Currently, dispatching containers via SLURM is supported. 

 Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node appropriately configured to connect to the SLURM controller node. 

 <pre><code class="shell"> 
 sudo apt-get install crunch-dispatch-slurm 
 </code></pre> 

 Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token. 

 <pre><code class="shell"> 
 apiserver:~$ cd /var/www/arvados-api/current 
 apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb 
 zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 
 </code></pre> 

 Save the token on the dispatch node, in <code>/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN</code>.
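
 For example (a sketch -- substitute the token generated above, and create the @env@ directory if it doesn't already exist):

 <pre><code class="shell">
 dispatch-node$ sudo mkdir -p /etc/sv/crunch-dispatch-slurm/env
 dispatch-node$ echo zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz | sudo tee /etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN
 dispatch-node$ sudo chmod 600 /etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN
 </code></pre>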

 Example runit script (@/etc/sv/crunch-dispatch-slurm/run@): 

 <pre><code class="shell">
 #!/bin/sh
 set -e
 # Send stderr to stdout so svlogd captures both streams
 exec 2>&1

 export ARVADOS_API_HOST=uuid_prefix.your.domain

 # chpst -e ./env loads ARVADOS_API_TOKEN from the env directory,
 # then runs the dispatcher as the "crunch" user
 exec chpst -e ./env -u crunch crunch-dispatch-slurm
 </code></pre>

 Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@): 

 <pre><code class="shell">
 #!/bin/sh
 set -e
 # Create the log directory on first run, then let svlogd
 # write timestamped logs to it
 [ -d main ] || mkdir main
 exec svlogd -tt ./main
 </code></pre>
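
 Make both scripts executable, and enable the service. The exact procedure depends on your runit setup; on Debian-based systems, where @runsvdir@ watches @/etc/service@, it might look like this:

 <pre><code class="shell">
 dispatch-node$ sudo chmod +x /etc/sv/crunch-dispatch-slurm/run /etc/sv/crunch-dispatch-slurm/log/run
 dispatch-node$ sudo ln -s /etc/sv/crunch-dispatch-slurm /etc/service/
 </code></pre>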

 Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist -- and have the same UID, GID, and home directory -- on the dispatch node and all SLURM compute nodes. 

 For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and docker): 

 <pre> 
 crunch@dispatch:~$ srun -N1 docker run busybox echo OK 
 </pre> 
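
 It can also be worth confirming that the @crunch@ user's identity matches across nodes -- these two commands should print the same uid/gid line:

 <pre>
 crunch@dispatch:~$ id crunch
 crunch@dispatch:~$ srun -N1 id crunch
 </pre>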

 h2. Install crunch-run on all compute nodes 

 <pre><code class="shell"> 
 sudo apt-get install crunch-run 
 </code></pre> 

 h2. Enable cgroup accounting on all compute nodes 

 (This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?) 

 Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution. 

 For example, on Ubuntu: 
 # Update @/etc/default/grub@ to include: <pre> 
 GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1" 
 </pre> 
 # @sudo update-grub@ 
 # Reboot 
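
 After rebooting, you can confirm that the memory cgroup controller is enabled by checking @/proc/cgroups@ -- the last column (@enabled@) should be 1 for the @memory@ row:

 <pre><code class="shell">
 compute-node$ grep memory /proc/cgroups
 </code></pre>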

 h2. Configure Docker 

 Unchanged from current docs. 

 h2. Configure SLURM cgroups 

 In setups where SLURM uses cgroups to impose resource limits, Crunch can be configured to run its Docker containers inside the cgroup assigned by SLURM. (With the default configuration, Docker runs all containers in the "docker" cgroup, which means the SLURM resource limits apply only to the crunch-run and arv-mount processes, while the container itself has a separate set of limits imposed by Docker.)

 To configure SLURM to use cgroups for resource limits, add to @/etc/slurm-llnl/slurm.conf@: 

 <pre> 
 TaskPlugin=task/cgroup 
 </pre> 

 Add to @/etc/slurm-llnl/cgroup.conf@: 

 <pre> 
 CgroupMountpoint=/sys/fs/cgroup 
 ConstrainCores=yes 
 ConstrainDevices=yes 
 ConstrainRAMSpace=yes 
 ConstrainSwapSpace=yes 
 </pre> 

 (See slurm.conf(5) and cgroup.conf(5) for more information.) 
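
 These settings take effect when the SLURM daemons are restarted. The exact commands depend on how SLURM is installed; with systemd-managed packages it might look like this:

 <pre><code class="shell">
 sudo systemctl restart slurmctld    # on the SLURM controller node
 sudo systemctl restart slurmd       # on each compute node
 </code></pre>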

 Add the @-cgroup-parent-subsystem=memory@ option to @/etc/arvados/crunch-dispatch-slurm/config.json@ on the dispatch node: 

 <pre> 
 { 
   "CrunchRunCommand": ["crunch-run", "-cgroup-parent-subsystem=memory"] 
 } 
 </pre> 

 The choice of subsystem ("memory" in this example) must correspond to one of the resource types enabled in @cgroup.conf@. Limits for other resource types will also be respected: the specified subsystem is singled out only to let Crunch determine the name of the cgroup provided by SLURM. 
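
 To see the cgroup SLURM assigns to a job -- and confirm the chosen subsystem appears there -- you can inspect @/proc/self/cgroup@ from inside a job. The exact paths vary by SLURM version and configuration, but the output should include a line along these lines:

 <pre>
 crunch@dispatch:~$ srun -N1 cat /proc/self/cgroup | grep memory
 4:memory:/slurm/uid_1000/job_1234/step_0
 </pre>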

 Restart crunch-dispatch-slurm to load the new configuration. 

 <pre> 
 root@dispatch:~# sv term /etc/sv/crunch-dispatch-slurm 
 </pre> 
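
 Under runit, @sv term@ sends TERM to the running process and the supervisor starts a fresh one automatically. You can confirm that it came back up with @sv status@ (output along these lines):

 <pre>
 root@dispatch:~# sv status /etc/sv/crunch-dispatch-slurm
 run: /etc/sv/crunch-dispatch-slurm: (pid 12345) 8s
 </pre>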

 

 h2. Test the dispatcher 

 On the dispatch node, monitor the crunch-dispatch logs. 

 <pre><code class="shell"> 
 dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current 
 </code></pre> 

 (TODO: Add example startup logs from crunch-dispatch-slurm) 

 On a shell VM, install a Docker image for testing. 

 <pre><code class="shell"> 
 user@shellvm:~$ arv keep docker busybox 
 </code></pre> 

 (TODO: Add example log/debug messages) 

 On a shell VM, run a trivial container. 

 <pre><code class="shell"> 
 user@shellvm:~$ arv container_request create --container-request '{ 
   "name":              "test", 
   "state":             "Committed", 
   "priority":          1, 
   "container_image": "busybox", 
   "command":           ["true"], 
   "output_path":       "/out", 
   "mounts": { 
     "/out": { 
       "kind":          "tmp", 
       "capacity":      1000 
     } 
   } 
 }' 
 </code></pre> 
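
 The command prints the new container request record as JSON, including its @uuid@. To watch the request progress (its state changes to "Final" when the container finishes), poll it by uuid -- for example, substituting the real uuid for the placeholder:

 <pre><code class="shell">
 user@shellvm:~$ arv container_request get --uuid zzzzz-xvhdp-zzzzzzzzzzzzzzz
 </code></pre>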

 Measures of success: 
 * Dispatcher log entries will indicate it has submitted a SLURM job. (TODO: Add example logs.) 
 * Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs. (TODO: Add squeue output, showing how containers look there.) 
 * After the container finishes, @arv container list --limit 1@ will indicate the outcome: <pre> 
 { 
  ... 
  "exit_code":0, 
  ... 
  "state":"Complete", 
  ... 
 } 
 </pre>