Fixing cloud scheduling » History » Version 1

Peter Amstutz, 07/25/2018 03:33 PM

h1. Fixing cloud scheduling

Our current approach to scheduling containers on the cloud using Slurm has a number of problems:

* Head-of-line blocking: with a single queue, Slurm only tries to schedule the job at the top of the queue. If that job cannot be scheduled, every other job has to wait. This leaves nodes idle and reduces throughput.
* Queue ordering doesn't reflect our desired priority order without a lot of hacking around with "niceness".
* Slurm forgets dynamic configuration, so we need constantly running maintenance processes to reapply it.

Some solutions:

h2. Use Slurm better

Most of our Slurm problems are self-inflicted. We have a single partition and a single queue with heterogeneous, dynamically configured nodes. We would have fewer problems if we configured fixed Slurm node ranges such as "compute-small-[0-255]", "compute-medium-[0-255]", and "compute-large-[0-255]", each with appropriate specs. We would then define a partition for each size range, so that a job waiting for one node size does not hold up jobs that want a different node size.
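The effect of per-size partitions can be illustrated with a toy model. This is only a sketch of the head-of-line behaviour described above, not of Slurm's actual scheduler; the job list, the free-node counts, and both function names are made up for illustration.

```python
from collections import deque

# Hypothetical free-node counts per size; the sizes mirror the ranges named above.
FREE_NODES = {"small": 2, "medium": 0, "large": 1}

# Hypothetical job queue in priority order: (job name, node size wanted).
JOBS = [("a", "medium"), ("b", "small"), ("c", "large"), ("d", "small")]


def schedule_single_queue(jobs, free):
    """Strict FIFO over one queue: stop at the first job that cannot be placed."""
    free = dict(free)
    started = []
    for name, size in jobs:
        if free[size] == 0:
            break  # the head of the line is blocked, so nothing behind it runs
        free[size] -= 1
        started.append(name)
    return started


def schedule_per_partition(jobs, free):
    """One FIFO queue per node size: a blocked size only blocks its own queue."""
    free = dict(free)
    queues = {}
    for name, size in jobs:
        queues.setdefault(size, deque()).append(name)
    started = []
    for size, queue in queues.items():
        while queue and free[size] > 0:
            free[size] -= 1
            started.append(queue.popleft())
    return sorted(started)


single = schedule_single_queue(JOBS, FREE_NODES)
partitioned = schedule_per_partition(JOBS, FREE_NODES)
print(single)       # []
print(partitioned)  # ['b', 'c', 'd']
```

In this example the single queue starts nothing because job "a" is waiting for a medium node, while the partitioned version still starts the small and large jobs on the nodes that would otherwise sit idle.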

Advantages:

* Least overall change compared to the current architecture

Disadvantages:

* Requires coordinated changes to the API server, node manager, crunch-dispatch-slurm, and cluster configuration
* Ops seems to think that defining (sizes * max nodes) hostnames might be a problem?
* Can't adjust node configurations without restarting the whole cluster

h2. Cloud provider scheduling APIs

Use cloud provider scheduling APIs such as Azure Batch, AWS Batch, or the Google Pipelines API to perform cluster scaling and scheduling.

These would be implemented as custom Arvados dispatcher services: crunch-dispatch-azure, crunch-dispatch-aws, crunch-dispatch-google.

Advantages:

* Get rid of Node Manager

Disadvantages:

* Has to be implemented per cloud provider.
* May be hard to customize behavior, such as job priority.

h2. Kubernetes

Submit containers to a Kubernetes cluster. Kubernetes handles cluster scaling and scheduling.

Advantages:

* Get rid of Node Manager
* Desirable as part of the overall plan to be able to run Arvados on Kubernetes

Disadvantages:

* Running crunch-run inside a container requires either Docker-in-Docker (a privileged container) or access to the host's Docker socket.
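To make the two options in the last point concrete, here is a rough sketch that builds a Kubernetes Job manifest as a plain Python dict. The manifest fields are standard Kubernetes ones, but the function name, the image name, the example UUID, and the way crunch-run receives the container UUID are assumptions for illustration only.

```python
import json


def crunch_run_job(container_uuid, image, use_docker_socket=True):
    """Build a batch/v1 Job manifest (hypothetical helper, not an Arvados API)."""
    container = {
        "name": "crunch-run",
        "image": image,
        "args": [container_uuid],  # assumed invocation, for illustration only
    }
    pod_spec = {"restartPolicy": "Never", "containers": [container]}
    if use_docker_socket:
        # Option 1: share the node's Docker daemon by mounting its socket.
        container["volumeMounts"] = [
            {"name": "docker-sock", "mountPath": "/var/run/docker.sock"}
        ]
        pod_spec["volumes"] = [
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock"}}
        ]
    else:
        # Option 2: Docker-in-Docker, which needs a privileged container.
        container["securityContext"] = {"privileged": True}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "crunch-run-" + container_uuid},
        "spec": {"backoffLimit": 0, "template": {"spec": pod_spec}},
    }


# Made-up UUID and image name, used only to show the resulting structure.
socket_job = crunch_run_job("zzzzz-dz642-0123456789abcde", "example/crunch-run")
dind_job = crunch_run_job("zzzzz-dz642-0123456789abcde", "example/crunch-run",
                          use_docker_socket=False)
print(json.dumps(socket_job["spec"]["template"]["spec"], indent=2))
```

Either variant gives the pod broad control over the node it runs on, which is why this is listed as a disadvantage.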

h2. Crunch-dispatch-local

Node manager spins up nodes based on the container queue. Compute nodes run crunch-dispatch-local or a similar service, which asks the API server for work and then runs it. Possibly, node manager directly decides which jobs go onto which nodes.
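The pull model could look roughly like the sketch below, where a node asks for the highest-priority container that fits its size. The WorkQueue class stands in for the API server's container queue; its methods, the priority numbers, and the container names are hypothetical.

```python
import heapq
import itertools


class WorkQueue:
    """Stand-in for the API server's container queue (hypothetical interface)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, uuid, priority, size):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), uuid, size))

    def request_work(self, node_size):
        """Return the highest-priority container that fits this node, or None."""
        skipped = []
        found = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[3] == node_size:
                found = entry[2]
                break
            skipped.append(entry)
        # Put back everything that did not fit this node size.
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return found


queue = WorkQueue()
queue.submit("c1", priority=10, size="large")
queue.submit("c2", priority=5, size="small")
queue.submit("c3", priority=7, size="small")

print(queue.request_work("small"))  # c3
print(queue.request_work("small"))  # c2
print(queue.request_work("small"))  # None
print(queue.request_work("large"))  # c1
```

Because each node only asks for work it can run, a high-priority container waiting for a large node never holds up small nodes.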

Advantages:

* Complete control over scheduling decisions and priority

Disadvantages:

* Requesting work puts additional load on the API server (though this may not be any worse than live logging)
* Needs a new scheme for nodes to report their status, so that node manager knows whether they are busy or idle. Node manager has to be able to put nodes in the equivalent of a "draining" state to ensure they don't get shut down while doing work. (We can use the "nodes" table for this.)
* Needs to be able to detect node failure.
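The last two points could be covered by a small status tracker along these lines. It is a minimal in-memory sketch: the class and method names, the 60-second timeout, and the hostnames are all assumptions, and a real version would presumably persist this state in the "nodes" table.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 60.0  # seconds; an arbitrary value chosen for this sketch


@dataclass
class NodeRecord:
    busy: bool = False
    draining: bool = False
    last_ping: float = 0.0


class NodeTracker:
    """Tracks node status as node manager might (hypothetical interface)."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.nodes = {}

    def ping(self, hostname, busy, now):
        """Record a status report; return True if the node may take new work."""
        node = self.nodes.setdefault(hostname, NodeRecord())
        node.busy = busy
        node.last_ping = now
        return not node.draining

    def drain(self, hostname):
        """Stop giving the node new work so it can be shut down once idle."""
        self.nodes[hostname].draining = True

    def safe_to_shut_down(self, hostname):
        node = self.nodes[hostname]
        return node.draining and not node.busy

    def failed_nodes(self, now):
        """Nodes that have not reported within the timeout."""
        return sorted(h for h, n in self.nodes.items()
                      if now - n.last_ping > self.timeout)


tracker = NodeTracker()
tracker.ping("compute0", busy=True, now=0.0)
tracker.ping("compute1", busy=False, now=0.0)
tracker.drain("compute0")
print(tracker.safe_to_shut_down("compute0"))            # False
print(tracker.ping("compute0", busy=False, now=30.0))   # False
print(tracker.safe_to_shut_down("compute0"))            # True
print(tracker.failed_nodes(now=70.0))                   # ['compute1']
```

A draining node is refused new work but is only reported as safe to shut down once it says it is idle, and a node that stops reporting shows up as failed after the timeout.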