Version 12 - History - Fixing cloud scheduling - Arvados

Peter Amstutz, 08/03/2018 02:32 PM

h1. Fixing cloud scheduling

Our current approach to scheduling containers on the cloud using SLURM has a number of problems:

* Head-of-line problem: with a single queue, slurm only schedules the job at the top of the queue. If that job cannot be scheduled, every other job has to wait, which leaves nodes idle and reduces throughput.
* Queue ordering doesn't reflect our desired priority order without a lot of hacking around with "niceness".
* Slurm forgets dynamic configuration, so maintenance processes have to run constantly to reset it.
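
The head-of-line problem can be illustrated with a toy model. This is only a sketch of the behavior described in the list above, not of slurm's actual scheduler; the function names, job tuples and node counts are made up for illustration.

```python
from collections import deque


def schedule_single_queue(jobs, free_nodes):
    """Strict FIFO over one queue: stop at the first job that cannot be placed.

    jobs is a list of (name, node_size) tuples in queue order (hypothetical).
    free_nodes maps node_size -> number of idle nodes of that size.
    """
    free = dict(free_nodes)
    queue = deque(jobs)
    started = []
    while queue:
        name, size = queue[0]
        if free.get(size, 0) == 0:
            break  # the head of the line blocks everything behind it
        free[size] -= 1
        started.append(name)
        queue.popleft()
    return started


def schedule_per_size_queues(jobs, free_nodes):
    """One FIFO per node size: a job only waits behind jobs of the same size."""
    free = dict(free_nodes)
    started = []
    for name, size in jobs:
        if free.get(size, 0) > 0:
            free[size] -= 1
            started.append(name)
    return started


if __name__ == "__main__":
    jobs = [("a", "large"), ("b", "small"), ("c", "small")]
    free = {"small": 2, "large": 0}
    print(schedule_single_queue(jobs, free))     # nothing starts
    print(schedule_per_size_queues(jobs, free))  # the two small jobs start
```

With two idle small nodes and a large job at the head of the queue, the single-queue model starts nothing, while the per-size model starts both small jobs.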

Things that slurm currently provides:

* allocating containers to specific nodes
* reporting node state (idle/busy/failed/down) and whether a node is out of contact

h2. crunch-dispatch-cloud

See https://dev.arvados.org/projects/arvados/wiki/Dispatching_containers_to_cloud_VMs#crunch-dispatch-cloud-PA

h1. Other options

h2. Kubernetes

Submit containers to a Kubernetes cluster. Kubernetes handles cluster scaling and scheduling.

Advantages:

* Get rid of Node Manager
* Desirable as part of overall plan to be able to run Arvados on Kubernetes

Disadvantages:

* Running crunch-run inside a container requires docker-in-docker (privileged container) or access to the Docker socket.
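
To make the disadvantage concrete, here is a minimal sketch of what a Kubernetes Job for one container might look like. The manifest fields (@batch/v1@ Job, @securityContext.privileged@, @hostPath@ volumes) are standard Kubernetes; the image name, the job naming scheme and passing the container UUID as the only argument are assumptions for illustration, not an existing Arvados design.

```python
def crunch_run_job_manifest(container_uuid, vcpus, ram_bytes, use_docker_socket=False):
    """Build a Kubernetes Job manifest (as a dict) that runs crunch-run once.

    Either the pod runs privileged (docker-in-docker), or it mounts the
    host's Docker socket. Both options weaken isolation on the node.
    """
    container = {
        "name": "crunch-run",
        "image": "example/crunch-run:latest",  # hypothetical image name
        "args": [container_uuid],              # assumed invocation
        "resources": {
            "requests": {"cpu": str(vcpus), "memory": str(ram_bytes)},
        },
    }
    pod_spec = {"restartPolicy": "Never", "containers": [container]}

    if use_docker_socket:
        # Option 2: share the host Docker daemon with the pod.
        container["volumeMounts"] = [
            {"name": "docker-sock", "mountPath": "/var/run/docker.sock"},
        ]
        pod_spec["volumes"] = [
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock"}},
        ]
    else:
        # Option 1: docker-in-docker, which needs a privileged container.
        container["securityContext"] = {"privileged": True}

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "crunch-run-" + container_uuid.lower()},
        "spec": {"backoffLimit": 0, "template": {"spec": pod_spec}},
    }
```

Scheduling and scaling then come from the @resources.requests@ values, but every such pod either runs privileged or can control the node's Docker daemon.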

h2. Cloud provider scheduling APIs

Use cloud provider scheduling APIs such as Azure Batch, AWS Batch, or the Google Pipelines API to perform cluster scaling and scheduling.

This would be implemented as custom Arvados dispatcher services: crunch-dispatch-azure, crunch-dispatch-aws, crunch-dispatch-google.

Advantages:

* Get rid of Node Manager

Disadvantages:

* Has to be implemented per cloud provider.
* May be hard to customize behavior, such as job priority.
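
Both disadvantages show up in even a rough sketch of a shared dispatcher interface. Everything below is hypothetical: the class names, the Arvados priority ceiling of 1000, and the provider priority range are placeholders, and no real cloud SDK is called.

```python
from abc import ABC, abstractmethod


class BatchDispatcher(ABC):
    """Hypothetical base class for crunch-dispatch-azure/aws/google."""

    # Provider-side priority range (low, high); placeholder values only.
    priority_range = (0, 0)

    def map_priority(self, arvados_priority, arvados_max=1000):
        """Squeeze an Arvados priority into whatever range the provider offers."""
        low, high = self.priority_range
        clamped = max(0, min(arvados_priority, arvados_max))
        return low + round(clamped / arvados_max * (high - low))

    @abstractmethod
    def submit(self, container_uuid, priority):
        """Each provider needs its own implementation of job submission."""


class CoarsePriorityDispatcher(BatchDispatcher):
    """Stand-in for a provider that only exposes ten priority levels."""

    priority_range = (0, 9)

    def __init__(self):
        self.submitted = []

    def submit(self, container_uuid, priority):
        # A real dispatcher would call the provider's batch API here.
        self.submitted.append((container_uuid, self.map_priority(priority)))
```

In this sketch, containers with priorities 501 and 502 end up at the same provider-side level, so their relative order is no longer under our control.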

h2. Use slurm better

Most of our slurm problems are self-inflicted. We have a single partition and a single queue with heterogeneous, dynamically configured nodes. We would have fewer problems if we configured slurm with fixed node ranges such as "compute-small-[0-255]", "compute-medium-[0-255]" and "compute-large-[0-255]", each with appropriate specs, and defined a partition for each size range, so that a job waiting for one node size does not hold up jobs that want a different node size.
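
As a rough sketch, the corresponding slurm.conf lines could be generated like this. @NodeName@, @PartitionName@, @CPUs@, @RealMemory@ and @State=CLOUD@ are regular slurm.conf parameters; the CPU and memory figures and the partition names are assumptions chosen for illustration.

```python
# Hypothetical specs per size class; real values would come from cluster config.
SIZES = {
    "small": {"cpus": 2, "real_memory_mb": 4096},
    "medium": {"cpus": 8, "real_memory_mb": 32768},
    "large": {"cpus": 32, "real_memory_mb": 131072},
}


def slurm_conf_lines(sizes, max_nodes=256):
    """Return slurm.conf lines defining one node range and one partition per size."""
    lines = []
    for size, spec in sizes.items():
        node_range = "compute-%s-[0-%d]" % (size, max_nodes - 1)
        lines.append(
            "NodeName=%s CPUs=%d RealMemory=%d State=CLOUD"
            % (node_range, spec["cpus"], spec["real_memory_mb"])
        )
        # A separate partition per size keeps one size's backlog from
        # blocking jobs that want a different size.
        lines.append(
            "PartitionName=%s Nodes=%s Default=NO MaxTime=INFINITE State=UP"
            % (size, node_range)
        )
    return lines


if __name__ == "__main__":
    print("\n".join(slurm_conf_lines(SIZES)))
```

This also makes the hostname count explicit: three sizes with 256 nodes each means 768 predefined hostnames.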

Advantages:

* Least overall change compared to current architecture

Disadvantages:

* Requires coordinated change to API server, Node Manager, crunch-dispatch-slurm, cluster configuration
* Ops seems to think that defining (sizes * max nodes) hostnames might be a problem?
* Can't adjust node configurations without restarting the whole cluster
