h1. Crunch2 installation

(DRAFT -- when ready, this will move to doc.arvados.org→install)

{{toc}}

h2. Set up a crunch-dispatch service

Currently, dispatching containers via SLURM is supported.

Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node appropriately configured to connect to the SLURM controller node.

<pre><code class="shell">
$ sudo apt-get install crunch-dispatch-slurm
</code></pre>
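
These instructions assume a Debian/Ubuntu dispatch node. If yours runs a Red Hat-based distribution, the equivalent step (assuming the Arvados package repository is already configured for yum) would be:

<pre><code class="shell">
$ sudo yum install crunch-dispatch-slurm
</code></pre>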

Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token.

<pre><code class="shell">
apiserver:~$ cd /var/www/arvados-api/current
apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</code></pre>
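
Before configuring the service, you can sanity-check the new token (assuming the Arvados CLI tools are installed and @ARVADOS_API_HOST@ points at your API server); @arv user current@ should report the cluster's root user:

<pre><code class="shell">
$ export ARVADOS_API_HOST=zzzzz.arvadosapi.com
$ export ARVADOS_API_TOKEN=zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
$ arv user current
</code></pre>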

Make sure crunch-dispatch-slurm runs with the @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ environment variables set, using the token you just generated:

<pre><code class="shell">
$ sudo mkdir /etc/systemd/system/crunch-dispatch-slurm.service.d
$ sudo install -m 0600 /dev/null /etc/systemd/system/crunch-dispatch-slurm.service.d/api.conf
$ sudo editor /etc/systemd/system/crunch-dispatch-slurm.service.d/api.conf
</code></pre>

Edit the file to look like this:

<pre>[Service]
Environment=ARVADOS_API_HOST=zzzzz.arvadosapi.com
Environment=ARVADOS_API_TOKEN=zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</pre>
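
Once the file is saved, reload systemd and (re)start the service so it picks up the new environment (this assumes the package installs a @crunch-dispatch-slurm.service@ unit, the same unit name used in the examples below):

<pre><code class="shell">
$ sudo systemctl daemon-reload
$ sudo systemctl enable crunch-dispatch-slurm.service
$ sudo systemctl restart crunch-dispatch-slurm.service
</code></pre>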

Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist -- and have the same UID, GID, and home directory -- on the dispatch node and all SLURM compute nodes.

For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and Docker):

<pre>
crunch@dispatch:~$ srun -N1 docker run busybox echo OK
</pre>
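
If that fails, one thing worth checking (again, depending on your SLURM installation) is that the @crunch@ user's UID and GID match between the dispatch node and the compute nodes; the two commands below should print the same identity:

<pre><code class="shell">
crunch@dispatch:~$ id
crunch@dispatch:~$ srun -N1 id
</code></pre>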

h2. Install crunch-run on all compute nodes

<pre><code class="shell">
$ sudo apt-get install crunch-run
</code></pre>

h2. Enable cgroup accounting on all compute nodes

(This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?)

Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution.

For example, on Ubuntu:
# Update @/etc/default/grub@ to include: <pre>
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
</pre>
# @sudo update-grub@
# Reboot
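
After the reboot, one way to confirm that the kernel booted with memory and swap accounting enabled is to check the kernel command line and the cgroup controller list (exact output varies by distribution and kernel):

<pre><code class="shell">
$ cat /proc/cmdline
$ grep memory /proc/cgroups
</code></pre>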

h2. Configure Docker

Unchanged from current docs.

h2. Configure SLURM cgroups

In setups where SLURM uses cgroups to impose resource limits, Crunch can be configured to run its Docker containers inside the cgroup assigned by SLURM. (With the default configuration, Docker runs all containers in the "docker" cgroup, which means the SLURM resource limits only apply to the crunch-run and arv-mount processes, while the container itself has a separate set of limits imposed by Docker.)

To configure SLURM to use cgroups for resource limits, add to @/etc/slurm-llnl/slurm.conf@:

<pre>
TaskPlugin=task/cgroup
</pre>

Add to @/etc/slurm-llnl/cgroup.conf@:

<pre>
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
</pre>

(See slurm.conf(5) and cgroup.conf(5) for more information.)

Add the @-cgroup-parent-subsystem=memory@ option to @/etc/arvados/crunch-dispatch-slurm/config.json@ on the dispatch node:

<pre>
{
  "CrunchRunCommand": ["crunch-run", "-cgroup-parent-subsystem=memory"]
}
</pre>

The choice of subsystem ("memory" in this example) must correspond to one of the resource types enabled in @cgroup.conf@. Limits for other resource types will also be respected: the specified subsystem is singled out only to let Crunch determine the name of the cgroup provided by SLURM.
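
As an illustration of what Crunch is looking for (a sketch only; the exact cgroup layout depends on your SLURM version and configuration), a task launched through SLURM with @task/cgroup@ enabled typically reports a SLURM-assigned memory cgroup such as @/slurm/uid_NNN/job_NNN/step_NNN@:

<pre><code class="shell">
crunch@dispatch:~$ srun -N1 grep memory /proc/self/cgroup
</code></pre>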

Restart crunch-dispatch-slurm to load the new configuration.

<pre><code class="shell">
dispatch-node$ sudo systemctl restart crunch-dispatch-slurm.service
</code></pre>

h2. Test the dispatcher

On the dispatch node, start monitoring the crunch-dispatch-slurm logs:

<pre><code class="shell">
dispatch-node$ sudo journalctl -fu crunch-dispatch-slurm.service
</code></pre>

On a shell VM, run a trivial container.

<pre><code class="shell">
user@shellvm:~$ arv container_request create --container-request '{
  "name":            "test",
  "state":           "Committed",
  "priority":        1,
  "container_image": "arvados/jobs:latest",
  "command":         ["echo", "Hello, Crunch!"],
  "output_path":     "/out",
  "mounts": {
    "/out": {
      "kind":        "tmp",
      "capacity":    1000
    }
  },
  "runtime_constraints": {
    "vcpus": 1,
    "ram": 8388608
  }
}'
</code></pre>

Measures of success:
* Dispatcher log entries will indicate it has submitted a SLURM job.
  <pre>2016-08-05_13:52:54.73665 2016/08/05 13:52:54 Monitoring container zzzzz-dz642-hdp2vpu9nq14tx0 started
2016-08-05_13:53:04.54148 2016/08/05 13:53:04 About to submit queued container zzzzz-dz642-hdp2vpu9nq14tx0
2016-08-05_13:53:04.55305 2016/08/05 13:53:04 sbatch succeeded: Submitted batch job 8102
</pre>
* Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs.
  <pre>$ squeue --long
Fri Aug  5 13:57:50 2016
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
   8103   compute zzzzz-dz   crunch  RUNNING       1:56 UNLIMITED      1 compute0
</pre> The job's name corresponds to the UUID of the container that fulfills this container request.  You can get more information about it by running, e.g., @scontrol show job Name=<UUID>@.
* When the container finishes, the dispatcher will log that, with the final result:
  <pre>2016-08-05_13:53:14.68780 2016/08/05 13:53:14 Container zzzzz-dz642-hdp2vpu9nq14tx0 now in state "Complete" with locked_by_uuid ""
2016-08-05_13:53:14.68782 2016/08/05 13:53:14 Monitoring container zzzzz-dz642-hdp2vpu9nq14tx0 finished
</pre>
* After the container finishes, @arv container list --limit 1@ will indicate the outcome: <pre>
{
 ...
 "exit_code":0,
 "log":"a01df2f7e5bc1c2ad59c60a837e90dc6+166",
 "output":"d41d8cd98f00b204e9800998ecf8427e+0",
 "state":"Complete",
 ...
}
</pre> You can use standard Keep tools to view the job's output and logs from their corresponding fields.  For example, to see the logs:
  <pre>$ arv keep ls a01df2f7e5bc1c2ad59c60a837e90dc6+166
./crunch-run.txt
./stderr.txt
./stdout.txt
$ arv keep get a01df2f7e5bc1c2ad59c60a837e90dc6+166/stdout.txt
2016-08-05T13:53:06.201011Z Hello, Crunch!
</pre>