Peter Amstutz, 08/03/2018 02:32 PM

h1. Fixing cloud scheduling

Our current approach to scheduling containers on the cloud using Slurm has a number of problems:

* Head-of-line problem: with a single queue, Slurm will only schedule the job at the top of the queue; if that job cannot be scheduled, every other job has to wait.  This leaves nodes idle and reduces throughput.
* Queue ordering doesn't reflect our desired priority order without a lot of hacking around with "niceness".
* Slurm forgets dynamic configuration, so constant maintenance processes are needed to reset it.
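
The head-of-line problem can be illustrated with a toy model.  This is only a sketch of strict first-in-first-out behavior as described above, not Slurm's actual scheduler; the job names, node sizes and the function @schedule_single_queue@ are made up for illustration.

```python
from collections import deque


def schedule_single_queue(jobs, free_nodes):
    """Strict FIFO: stop at the first job that cannot be placed.

    jobs is a list of (name, node_size) tuples in queue order.
    free_nodes maps node_size -> number of idle nodes of that size.
    Returns the names of started jobs and the nodes left idle.
    """
    free = dict(free_nodes)
    queue = deque(jobs)
    started = []
    while queue:
        name, size = queue[0]
        if free.get(size, 0) == 0:
            # The head of the queue is blocked, so everything behind it waits.
            break
        free[size] -= 1
        started.append(name)
        queue.popleft()
    return started, free


# Hypothetical workload: one job wants a large node, two want small nodes.
jobs = [("a", "large"), ("b", "small"), ("c", "small")]
free_nodes = {"small": 2, "large": 0}

started, idle = schedule_single_queue(jobs, free_nodes)
print(started)  # []
print(idle)     # {'small': 2, 'large': 0}
```

In this model both small nodes stay idle because job "a" is waiting for a large node, even though jobs "b" and "c" could run immediately.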

Things that Slurm currently provides:

* allocating containers to specific nodes
* reporting node state: idle, busy, failed, down, or out of contact

h2. crunch-dispatch-cloud

See https://dev.arvados.org/projects/arvados/wiki/Dispatching_containers_to_cloud_VMs#crunch-dispatch-cloud-PA

h1. Other options

h2. Kubernetes

Submit containers to a Kubernetes cluster.  Kubernetes handles cluster scaling and scheduling.

Advantages:

* Get rid of Node Manager
* Desirable as part of the overall plan to be able to run Arvados on Kubernetes

Disadvantages:

* Running crunch-run inside a container requires docker-in-docker (privileged container) or access to the Docker socket.

h2. Cloud provider scheduling APIs

Use cloud provider scheduling APIs such as Azure Batch, AWS Batch, or the Google Pipelines API to perform cluster scaling and scheduling.

These would be implemented as custom Arvados dispatcher services: crunch-dispatch-azure, crunch-dispatch-aws, crunch-dispatch-google.

Advantages:

* Get rid of Node Manager

Disadvantages:

* Has to be implemented per cloud provider.
* May be hard to customize behavior, such as job priority.

h2. Use Slurm better

Most of our Slurm problems are self-inflicted.  We have a single partition and a single queue with heterogeneous, dynamically configured nodes.  We would have fewer problems if we configured Slurm with fixed node ranges "compute-small-[0-255]", "compute-medium-[0-255]", "compute-large-[0-255]", each with appropriate specs, and defined a partition for each size range, so that a job waiting for one node size does not hold up jobs that want a different node size.
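
As a rough sketch of what the per-size configuration might look like, the following generates @slurm.conf@ node and partition lines for each size range.  The CPU and memory figures, the partition names and the helper @render_slurm_conf@ are assumptions for illustration only; real values would come from the cloud instance types in use.

```python
# Hypothetical size classes; CPU and memory figures are placeholders.
SIZES = {
    "small": {"cpus": 2, "mem_mb": 7680},
    "medium": {"cpus": 8, "mem_mb": 30720},
    "large": {"cpus": 32, "mem_mb": 122880},
}
MAX_NODES = 256  # gives hostnames numbered 0-255 per size


def render_slurm_conf(sizes, max_nodes):
    """Return slurm.conf lines: one node range and one partition per size."""
    lines = []
    for name, spec in sizes.items():
        node_range = f"compute-{name}-[0-{max_nodes - 1}]"
        lines.append(
            f"NodeName={node_range} CPUs={spec['cpus']} "
            f"RealMemory={spec['mem_mb']} State=CLOUD"
        )
        # One partition per size, so each size has its own queue.
        lines.append(f"PartitionName={name} Nodes={node_range} State=UP")
    return lines


conf_lines = render_slurm_conf(SIZES, MAX_NODES)
print("\n".join(conf_lines))
```

With a layout like this, the dispatcher would presumably submit each container to the partition that matches its required node size, so a blocked large job only delays other large jobs.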

Advantages:

* Least overall change compared to current architecture

Disadvantages:

* Requires coordinated changes to API server, Node Manager, crunch-dispatch-slurm, and cluster configuration
* Ops seems to think that defining (sizes * max nodes) hostnames might be a problem?
* Can't adjust node configurations without restarting the whole cluster