Crunch2 installation » History » Version 10
Tom Clegg, 06/17/2016 08:26 PM
h1. Crunch2 installation

(DRAFT: when ready, this will move to the install section of doc.arvados.org.)

{{toc}}

h2. Set up a crunch-dispatch service

Currently, crunch2 supports dispatching containers via SLURM.

Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node that is configured to connect to the SLURM controller node.

<pre><code class="shell">
sudo apt-get install crunch-dispatch-slurm
</code></pre>

Create a privileged Arvados API token for use by the dispatcher. If you run multiple dispatch processes, give each one a different token.

<pre><code class="shell">
apiserver:~$ cd /var/www/arvados-api/current
apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</code></pre>

Save the token on the dispatch node in @/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN@.
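
A minimal sketch of this step, assuming the token was printed by the command above. @save_dispatch_token@ is a hypothetical helper (not part of Arvados or runit). It writes the token as the first and only line of the file, which is the line @chpst -e@ reads from each file in an env directory.

```shell
# save_dispatch_token SVDIR TOKEN
# Hypothetical helper: store TOKEN as the first (and only) line of
# SVDIR/env/ARVADOS_API_TOKEN, the file that "chpst -e ./env" turns into
# the ARVADOS_API_TOKEN environment variable.
save_dispatch_token() {
  svdir=$1
  token=$2
  if [ -z "$svdir" ] || [ -z "$token" ]; then
    echo "usage: save_dispatch_token SVDIR TOKEN" >&2
    return 1
  fi
  mkdir -p "$svdir/env" || return 1
  # Subshell so the restrictive umask only affects this write:
  # a newly created file gets mode 0600 (owner-only).
  (umask 077 && printf '%s\n' "$token" > "$svdir/env/ARVADOS_API_TOKEN")
}
```

For example, as root on the dispatch node: @save_dispatch_token /etc/sv/crunch-dispatch-slurm zzzz...@ (using your real token). The owner-only mode assumes the env directory is read as root, before chpst switches to the @crunch@ user; adjust it if your setup differs.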

Example runit script (@/etc/sv/crunch-dispatch-slurm/run@):

<pre><code class="shell">
#!/bin/sh
set -e
# Send stderr to the log service along with stdout.
exec 2>&1

# Replace uuid_prefix.your.domain with your cluster's API server host.
export ARVADOS_API_HOST=uuid_prefix.your.domain

# Load the variables in ./env (including ARVADOS_API_TOKEN), then run the
# dispatcher as the crunch user.
exec chpst -e ./env -u crunch crunch-dispatch-slurm
</code></pre>

Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@):

<pre><code class="shell">
#!/bin/sh
set -e
# Create the log directory on first start, then hand off to svlogd, which
# writes entries with human-readable timestamps (-tt) to ./main/current.
[ -d main ] || mkdir main
exec svlogd -tt ./main
</code></pre>
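
Before enabling the service, it may help to confirm the layout. A minimal sketch using a hypothetical @check_sv_dir@ helper (not part of runit or Arvados): it checks that @run@ and @log/run@ are executable, since runsv executes them directly, and that the token file is non-empty.

```shell
# check_sv_dir DIR
# Hypothetical helper: report missing pieces of a runit service directory
# laid out as described above. Returns 0 only if nothing is missing.
check_sv_dir() {
  dir=$1
  rc=0
  for script in "$dir/run" "$dir/log/run"; do
    if [ ! -x "$script" ]; then
      echo "missing or not executable: $script" >&2
      rc=1
    fi
  done
  if [ ! -s "$dir/env/ARVADOS_API_TOKEN" ]; then
    echo "missing or empty: $dir/env/ARVADOS_API_TOKEN" >&2
    rc=1
  fi
  return "$rc"
}
```

For example: @check_sv_dir /etc/sv/crunch-dispatch-slurm && echo "service directory looks complete"@.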

Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist, with the same UID, GID, and home directory, on the dispatch node and on all SLURM compute nodes.

For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and Docker):

<pre>
crunch@dispatch:~$ srun -N1 docker run busybox echo OK
</pre>
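
To compare the @crunch@ account across nodes, a minimal sketch: @same_account@ is a hypothetical helper that compares the UID, GID, and home directory fields (fields 3, 4, and 6) of two passwd(5) entries, such as those printed by @getent passwd crunch@.

```shell
# account_fields ENTRY: print UID:GID:HOME from a passwd(5) line
# (fields 3, 4, and 6).
account_fields() {
  printf '%s\n' "$1" | cut -d: -f3,4,6
}

# same_account ENTRY1 ENTRY2
# Hypothetical helper: succeed if both entries are non-empty and agree on
# UID, GID, and home directory. Name, GECOS, and shell are ignored.
same_account() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    return 1
  fi
  [ "$(account_fields "$1")" = "$(account_fields "$2")" ]
}
```

For example, as the @crunch@ user on the dispatch node: @same_account "$(getent passwd crunch)" "$(srun -N1 getent passwd crunch)" && echo match@. To check a specific compute node, use @srun -w NODENAME@ instead of @-N1@.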

h2. Install crunch-run on all compute nodes

<pre><code class="shell">
sudo apt-get install crunch-run
</code></pre>

h2. Enable cgroup accounting on all compute nodes

(This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide describes it as optional and notes that it degrades performance, so it's not surprising if people skip it. Perhaps we should explain why and when it's a good idea to enable it?)

Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution.

For example, on Ubuntu:
# Update @/etc/default/grub@ to include: <pre>
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
</pre>
# Run @sudo update-grub@.
# Reboot.
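
After rebooting, you can check that the options reached the kernel. A minimal sketch using a hypothetical @has_cgroup_accounting@ helper, which looks for both options as whole words in a kernel command line such as the contents of @/proc/cmdline@. This only shows that the options were passed, not that the kernel enabled the accounting.

```shell
# has_cgroup_accounting CMDLINE
# Hypothetical helper: succeed if CMDLINE contains both kernel options from
# the GRUB_CMDLINE_LINUX setting above, each as a separate word.
has_cgroup_accounting() {
  missing=0
  for opt in cgroup_enable=memory swapaccount=1; do
    case " $1 " in
      *" $opt "*) ;;
      *)
        echo "missing kernel option: $opt" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}
```

For example: @has_cgroup_accounting "$(cat /proc/cmdline)" && echo "options present"@.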

h2. Configure Docker

Unchanged from the current documentation.

h2. Test the dispatcher

On the dispatch node, monitor the crunch-dispatch-slurm logs.

<pre><code class="shell">
dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current
</code></pre>

On a shell VM, upload a Docker image to Arvados for testing.

<pre><code class="shell">
user@shellvm:~$ arv keep docker busybox
</code></pre>

On a shell VM, run a trivial container.

<pre><code class="shell">
user@shellvm:~$ arv container_request create --container-request '{
  "name": "test",
  "state": "Committed",
  "priority": 1,
  "container_image": "busybox",
  "command": ["true"],
  "output_path": "/out",
  "mounts": {
    "/out": {
      "kind": "tmp",
      "capacity": 1000
    }
  }
}'
</code></pre>

Measures of success:
* Dispatcher log entries will indicate that it has submitted a SLURM job.
* Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs.
* After the container finishes, @arv container list --limit 1@ will indicate the outcome: <pre>
{
 ...
 "exit_code":0,
 ...
 "state":"Complete",
 ...
}
</pre>
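
To script that last check, a rough sketch: @container_succeeded@ is a hypothetical helper that reads container JSON on stdin and looks for @"state":"Complete"@ and an exit code of 0 using plain text matching. It is not a JSON parser, and it assumes the input describes a single container, as with @--limit 1@.

```shell
# container_succeeded < JSON
# Hypothetical helper: succeed if the container JSON on stdin reports
# state "Complete" and exit_code 0. Rough text matching, not JSON parsing;
# assumes the input describes a single container.
container_succeeded() {
  json=$(cat)
  printf '%s\n' "$json" | grep -Eq '"state": *"Complete"' &&
    printf '%s\n' "$json" | grep -Eq '"exit_code": *0([^0-9]|$)'
}
```

For example: @arv container list --limit 1 | container_succeeded && echo OK@.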