Crunch2 installation » History » Version 11
Tom Clegg, 06/20/2016 09:29 PM
h1. Crunch2 installation

(DRAFT -- when ready, this will move to doc.arvados.org→install)

{{toc}}

h2. Set up a crunch-dispatch service

Currently, dispatching containers via SLURM is supported.

Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node appropriately configured to connect to the SLURM controller node.

<pre><code class="shell">
sudo apt-get install crunch-dispatch-slurm
</code></pre>

Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token.

<pre><code class="shell">
apiserver:~$ cd /var/www/arvados-api/current
apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</code></pre>

Save the token on the dispatch node, in <code>/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN</code>.

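As a rough sketch of this step, the helper below writes a token into an envdir-style directory of the kind @chpst -e@ reads (one file per variable, with the value on the first line). The function name @save_token@ and the choice to make the file readable only by its owner are assumptions made for illustration, not part of the Arvados packages. It also assumes the service's run script starts as root, so the file can be read before privileges are dropped.

```shell
#!/bin/sh
# save_token DIR TOKEN
# Hypothetical helper: write TOKEN to DIR/ARVADOS_API_TOKEN.
save_token() {
    env_dir=$1
    token=$2
    mkdir -p "$env_dir"
    # Create the file under a restrictive umask so a newly created
    # token file is not readable by group or other users.
    ( umask 077 && printf '%s\n' "$token" > "$env_dir/ARVADOS_API_TOKEN" )
}
```

On the dispatch node this would be called as root, with @/etc/sv/crunch-dispatch-slurm/env@ as the directory and the token printed by @create_superuser_token.rb@ as the second argument.
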
Example runit script (@/etc/sv/crunch-dispatch-slurm/run@):

<pre><code class="shell">
#!/bin/sh
set -e
# Send stderr to the same log as stdout.
exec 2>&1

export ARVADOS_API_HOST=uuid_prefix.your.domain

# Load environment variables (including ARVADOS_API_TOKEN) from ./env, then run the dispatcher as the crunch user.
exec chpst -e ./env -u crunch crunch-dispatch-slurm
</code></pre>

Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@):

<pre><code class="shell">
#!/bin/sh
set -e
[ -d main ] || mkdir main
exec svlogd -tt ./main
</code></pre>

Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist -- and have the same UID, GID, and home directory -- on the dispatch node and all SLURM compute nodes.

For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and Docker):

<pre>
crunch@dispatch:~$ srun -N1 docker run busybox echo OK
</pre>

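Because the "OK" may be buried among status messages, it can help to check for it mechanically. The following is a minimal sketch of such a check; the function name @prints_ok@ is hypothetical, and it only assumes that the container's output appears on a line of its own.

```shell
#!/bin/sh
# prints_ok: read command output on stdin and succeed only if
# some line consists of exactly "OK".
prints_ok() {
    grep -qx 'OK'
}
```

One possible use, run as the @crunch@ user on the dispatch node:

```shell
srun -N1 docker run busybox echo OK 2>&1 | prints_ok && echo "crunch can run containers via SLURM"
```
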
h2. Install crunch-run on all compute nodes

<pre><code class="shell">
sudo apt-get install crunch-run
</code></pre>

h2. Enable cgroup accounting on all compute nodes

(This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?)

Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution.

For example, on Ubuntu:

# Update @/etc/default/grub@ to include: <pre>
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
</pre>
# @sudo update-grub@
# Reboot

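After the reboot, one way to confirm the change took effect is to look for both options on the running kernel's command line. This is only a sketch: the function name @has_cgroup_flags@ is hypothetical, and it checks just the two options from the Ubuntu example above.

```shell
#!/bin/sh
# has_cgroup_flags [CMDLINE]
# Succeed if the kernel command line contains both cgroup_enable=memory
# and swapaccount=1. With no argument, read /proc/cmdline.
has_cgroup_flags() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Pad with spaces so each option is matched as a whole word.
    case " $cmdline " in
        *" cgroup_enable=memory "*) ;;
        *) return 1 ;;
    esac
    case " $cmdline " in
        *" swapaccount=1 "*) ;;
        *) return 1 ;;
    esac
}
```

On a compute node, @has_cgroup_flags && echo enabled@ should print "enabled" once the new GRUB settings are active.
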
h2. Configure Docker

Unchanged from current docs.

h2. Test the dispatcher

On the dispatch node, monitor the crunch-dispatch logs.

<pre><code class="shell">
dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current
</code></pre>

(TODO: Add example startup logs from crunch-dispatch-slurm)

On a shell VM, install a Docker image for testing.

<pre><code class="shell">
user@shellvm:~$ arv keep docker busybox
</code></pre>

(TODO: Add example log/debug messages)

On a shell VM, run a trivial container.

<pre><code class="shell">
user@shellvm:~$ arv container_request create --container-request '{
  "name": "test",
  "state": "Committed",
  "priority": 1,
  "container_image": "busybox",
  "command": ["true"],
  "output_path": "/out",
  "mounts": {
    "/out": {
      "kind": "tmp",
      "capacity": 1000
    }
  }
}'
</code></pre>

Measures of success:

* Dispatcher log entries will indicate it has submitted a SLURM job. (TODO: Add example logs.)
* Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs. (TODO: Add squeue output, showing how containers look there.)
* After the container finishes, @arv container list --limit 1@ will indicate the outcome: <pre>
{
  ...
  "exit_code":0,
  ...
  "state":"Complete",
  ...
}
</pre>
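
The last check can be scripted loosely. The sketch below is a crude text match rather than a real JSON parser: it only looks for the two fields shown above somewhere in the output, so it assumes the response describes a single container. The function name @container_succeeded@ is hypothetical.

```shell
#!/bin/sh
# container_succeeded: read `arv container list` output on stdin and
# succeed only if it contains "state":"Complete" and "exit_code":0.
container_succeeded() {
    out=$(cat)
    printf '%s\n' "$out" | grep -Eq '"state"[[:space:]]*:[[:space:]]*"Complete"' &&
    printf '%s\n' "$out" | grep -Eq '"exit_code"[[:space:]]*:[[:space:]]*0([^0-9]|$)'
}
```

For example, @arv container list --limit 1 | container_succeeded && echo "test container completed"@ could be run on the shell VM once the job has left the SLURM queue.
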