Crunch2 installation » History » Version 12

Tom Clegg, 08/02/2016 02:08 PM

h1. Crunch2 installation

(DRAFT -- when ready, this will move to doc.arvados.org→install)

{{toc}}

h2. Set up a crunch-dispatch service

Currently, dispatching containers via SLURM is supported.

Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node appropriately configured to connect to the SLURM controller node.

<pre><code class="shell">
sudo apt-get install crunch-dispatch-slurm
</code></pre>
Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token.

<pre><code class="shell">
apiserver:~$ cd /var/www/arvados-api/current
apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</code></pre>
Save the token on the dispatch node, in <code>/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN</code>.
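For reference, the token file can be staged with a small helper like the one below. This is only a sketch: @install_token@ is a hypothetical name, and on a real dispatch node you would run it as root against @/etc/sv/crunch-dispatch-slurm/env@.

```shell
# Hypothetical helper: write a dispatcher token into a runit env directory.
# The directory and token value are parameters; nothing here is Arvados-specific.
install_token() {
  envdir="$1"; token="$2"
  mkdir -p "$envdir"
  printf '%s' "$token" > "$envdir/ARVADOS_API_TOKEN"
  chmod 600 "$envdir/ARVADOS_API_TOKEN"   # keep the token unreadable to other users
}

# On the dispatch node (as root), something like:
#   install_token /etc/sv/crunch-dispatch-slurm/env zzzzzzzz...
```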
Example runit script (@/etc/sv/crunch-dispatch-slurm/run@):

<pre><code class="shell">
#!/bin/sh
set -e
exec 2>&1

export ARVADOS_API_HOST=uuid_prefix.your.domain

exec chpst -e ./env -u crunch crunch-dispatch-slurm
</code></pre>
Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@):

<pre><code class="shell">
#!/bin/sh
set -e
[ -d main ] || mkdir main
exec svlogd -tt ./main
</code></pre>
Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist -- and have the same UID, GID, and home directory -- on the dispatch node and all SLURM compute nodes.

For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and Docker):

<pre>
crunch@dispatch:~$ srun -N1 docker run busybox echo OK
</pre>
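Because of the extra SLURM/Docker chatter, exact-match comparisons are brittle if you script this sanity check. A hedged sketch (@check_ok@ is a hypothetical helper, not part of Arvados): treat the check as passing when the final output line is exactly "OK".

```shell
# Hypothetical helper: pass/fail the srun sanity check by looking only at
# the last line of output, ignoring SLURM/Docker status messages above it.
check_ok() {
  if [ "$(tail -n 1)" = "OK" ]; then echo PASS; else echo FAIL; fi
}

# Usage on the dispatch node, e.g.:
#   srun -N1 docker run busybox echo OK | check_ok
```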
h2. Install crunch-run on all compute nodes

<pre><code class="shell">
sudo apt-get install crunch-run
</code></pre>
h2. Enable cgroup accounting on all compute nodes

(This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?)

Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution.

For example, on Ubuntu:
# Update @/etc/default/grub@ to include: <pre>
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
</pre>
# @sudo update-grub@
# Reboot
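After rebooting, a quick way to confirm the flags took effect is to inspect the kernel command line. This is a sketch assuming the v1-style flags shown above; @check_cgroup_flags@ is a hypothetical helper, not an Arvados or Docker command.

```shell
# Hypothetical check: report whether the kernel was booted with the
# cgroup memory/swap accounting flags configured above.
check_cgroup_flags() {
  case "$1" in
    *cgroup_enable=memory*) mem=yes ;; *) mem=no ;;
  esac
  case "$1" in
    *swapaccount=1*) swap=yes ;; *) swap=no ;;
  esac
  echo "memory accounting: $mem, swap accounting: $swap"
}

# On a compute node:
#   check_cgroup_flags "$(cat /proc/cmdline)"
```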
h2. Configure Docker

Unchanged from current docs.
h2. Configure SLURM cgroups

In setups where SLURM uses cgroups to impose resource limits, Crunch can be configured to run its Docker containers inside the cgroup assigned by SLURM. (With the default configuration, Docker runs all containers in the "docker" cgroup, which means the SLURM resource limits only apply to the crunch-run and arv-mount processes, while the container itself has a separate set of limits imposed by Docker.)

To configure SLURM to use cgroups for resource limits, add to @/etc/slurm-llnl/slurm.conf@:

<pre>
TaskPlugin=task/cgroup
</pre>
Add to @/etc/slurm-llnl/cgroup.conf@:

<pre>
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
</pre>
(See slurm.conf(5) and cgroup.conf(5) for more information.)

Add the @-cgroup-parent-subsystem=memory@ option to @/etc/arvados/crunch-dispatch-slurm/config.json@ on the dispatch node:

<pre>
{
  "CrunchRunCommand": ["crunch-run", "-cgroup-parent-subsystem=memory"]
}
</pre>
The choice of subsystem ("memory" in this example) must correspond to one of the resource types enabled in @cgroup.conf@. Limits for other resource types will also be respected: the specified subsystem is singled out only to let Crunch determine the name of the cgroup provided by SLURM.

Restart crunch-dispatch-slurm to load the new configuration.

<pre>
root@dispatch:~# sv term /etc/sv/crunch-dispatch-slurm
</pre>
h2. Test the dispatcher

On the dispatch node, monitor the crunch-dispatch-slurm logs.

<pre><code class="shell">
dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current
</code></pre>
(TODO: Add example startup logs from crunch-dispatch-slurm)

On a shell VM, install a Docker image for testing.

<pre><code class="shell">
user@shellvm:~$ arv keep docker busybox
</code></pre>

(TODO: Add example log/debug messages)
On a shell VM, run a trivial container.

<pre><code class="shell">
user@shellvm:~$ arv container_request create --container-request '{
  "name":            "test",
  "state":           "Committed",
  "priority":        1,
  "container_image": "busybox",
  "command":         ["true"],
  "output_path":     "/out",
  "mounts": {
    "/out": {
      "kind":        "tmp",
      "capacity":    1000
    }
  }
}'
</code></pre>
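The create call prints the new container request as JSON, and for follow-up queries it can be handy to capture its UUID. A sketch under stated assumptions: @extract_uuid@ is a hypothetical helper, it assumes the field appears as @"uuid":"..."@, and the exact output format of @arv@ may differ (a real setup would more likely use @jq@).

```shell
# Hypothetical helper: pull the first "uuid" value out of JSON on stdin,
# without requiring jq. Assumes the field is serialized as "uuid":"...".
extract_uuid() {
  sed -n 's/.*"uuid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# e.g.  uuid=$(arv container_request create --container-request '...' | extract_uuid)
```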
Measures of success:

* Dispatcher log entries will indicate it has submitted a SLURM job. (TODO: Add example logs.)
* Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs. (TODO: Add squeue output, showing how containers look there.)
* After the container finishes, @arv container list --limit 1@ will indicate the outcome: <pre>
{
 ...
 "exit_code":0,
 ...
 "state":"Complete",
 ...
}
</pre>