Revision 10 (Tom Clegg, 06/17/2016 08:26 PM) → Revision 11/18 (Tom Clegg, 06/20/2016 09:29 PM)
h1. Crunch2 installation

(DRAFT -- when ready, this will move to doc.arvados.org→install)

{{toc}}

h2. Set up a crunch-dispatch service

Currently, dispatching containers via SLURM is supported.

Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node appropriately configured to connect to the SLURM controller node.

<pre><code class="shell">
sudo apt-get install crunch-dispatch-slurm
</code></pre>

Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token.

<pre><code class="shell">
apiserver:~$ cd /var/www/arvados-api/current
apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</code></pre>

Save the token on the dispatch node, in <code>/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN</code>

Example runit script (@/etc/sv/crunch-dispatch-slurm/run@):

<pre><code class="shell">
#!/bin/sh
set -e
exec 2>&1

export ARVADOS_API_HOST=uuid_prefix.your.domain

exec chpst -e ./env -u crunch crunch-dispatch-slurm
</code></pre>

Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@):

<pre><code class="shell">
#!/bin/sh
set -e
[ -d main ] || mkdir main
exec svlogd -tt ./main
</code></pre>

Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist -- and have the same UID, GID, and home directory -- on the dispatch node and all SLURM compute nodes.

For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and docker):

<pre>
crunch@dispatch:~$ srun -N1 docker run busybox echo OK
</pre>

h2. Install crunch-run on all compute nodes

<pre><code class="shell">
sudo apt-get install crunch-run
</code></pre>

h2. Enable cgroup accounting on all compute nodes

(This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?)

Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution.

For example, on Ubuntu:

# Update @/etc/default/grub@ to include:
<pre>
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
</pre>
# @sudo update-grub@
# Reboot

h2. Configure Docker

Unchanged from current docs.

h2. Test the dispatcher

On the dispatch node, monitor the crunch-dispatch logs.

<pre><code class="shell">
dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current
</code></pre>

(TODO: Add example startup logs from crunch-dispatch-slurm)

On a shell VM, install a Docker image for testing.

<pre><code class="shell">
user@shellvm:~$ arv keep docker busybox
</code></pre>

(TODO: Add example log/debug messages)

On a shell VM, run a trivial container.

<pre><code class="shell">
user@shellvm:~$ arv container_request create --container-request '{
  "name": "test",
  "state": "Committed",
  "priority": 1,
  "container_image": "busybox",
  "command": ["true"],
  "output_path": "/out",
  "mounts": {
    "/out": {
      "kind": "tmp",
      "capacity": 1000
    }
  }
}'
</code></pre>

Measures of success:

* Dispatcher log entries will indicate it has submitted a SLURM job. (TODO: Add example logs.)
* Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs. (TODO: Add squeue output, showing how containers look there.)
* After the container finishes, @arv container list --limit 1@ will indicate the outcome:
<pre>
{
 ...
 "exit_code":0,
 ...
 "state":"Complete",
 ...
}
</pre>
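
The last check above can be scripted rather than read by eye. The following is a minimal sketch, not part of Arvados: @container_succeeded@ is a hypothetical helper name, and it assumes the JSON text contains the two fields shown above. It does a plain text match with @grep@ rather than real JSON parsing, so it is only meaningful when the input describes a single container.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Arvados): reads JSON text on stdin
# and succeeds only if it reports exit_code 0 and state "Complete".
# This is a text match, not a JSON parser; feed it one container record.
container_succeeded() {
    json=$(cat)
    # exit_code must be exactly 0 (not 10, 01, ...), optional whitespace allowed
    printf '%s\n' "$json" |
        grep -Eq '"exit_code"[[:space:]]*:[[:space:]]*0([^0-9]|$)' &&
    # state must be the string "Complete"
    printf '%s\n' "$json" |
        grep -Eq '"state"[[:space:]]*:[[:space:]]*"Complete"'
}
```

One possible use, on a shell VM after the test container has finished, is @arv container list --limit 1 | container_succeeded && echo "test container succeeded"@. This assumes the single record returned is the container just submitted, which may not hold on a busy cluster.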