Bug #13166

[node manager] wishlist should consist of top priority containers

Added by Peter Amstutz almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
03/26/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Node manager calculates a "wishlist" which is how many nodes of each size are desired.

The current behavior reports the entire queue, the creates nodes starting with the largest size and working its way down to the smallest. The rationale being that small jobs can run on large nodes but not the other way around.

However, #12199 changes the behavior such that containers will be scheduled on specific node sizes. If the queue is much larger than the maximum number of nodes, this could lead to a situation in which the top of the queue consists of small jobs, but only large nodes are available. In this case, it will either schedule jobs out of order (with low-priority large-node jobs jumping the queue) or deadlock.

To fix:

  • After getting the contents of squeue, sort in decending order by slurm priority
  • Only consider the top (max_nodes - up_nodes) items in the wishlist and discard the remaining

Subtasks

Task #13273: Review 13166-nodemanager-whishlistClosedPeter Amstutz


Related issues

Related to Arvados - Story #12552: When priority is equal, the children of an earlier-submitted workflow should run firstResolved11/03/2017

Related to Arvados - Bug #12199: Don't schedule jobs on nodes which are too much bigger than requestedResolved01/29/2018

Associated revisions

Revision f1ee0cb2
Added by Lucas Di Pentima almost 2 years ago

Merge branch '13166-nodemanager-whishlist'
Closes #13166

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Peter Amstutz almost 2 years ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz almost 2 years ago

  • Description updated (diff)

#3 Updated by Tom Clegg almost 2 years ago

  • Related to Story #12552: When priority is equal, the children of an earlier-submitted workflow should run first added

#4 Updated by Peter Amstutz almost 2 years ago

  • Related to Bug #12199: Don't schedule jobs on nodes which are too much bigger than requested added

#5 Updated by Peter Amstutz almost 2 years ago

  • Description updated (diff)

#6 Updated by Peter Amstutz almost 2 years ago

  • Status changed from In Progress to New

#7 Updated by Tom Morris almost 2 years ago

  • Assigned To set to Peter Amstutz
  • Target version changed from To Be Groomed to 2018-03-28 Sprint

#8 Updated by Lucas Di Pentima almost 2 years ago

  • Assigned To changed from Peter Amstutz to Lucas Di Pentima

#9 Updated by Lucas Di Pentima almost 2 years ago

  • Status changed from New to In Progress

#10 Updated by Lucas Di Pentima almost 2 years ago

Updates at 93d9c63e1fd4d4c268aa9050406b9a212f3d4a22 - branch 13166-nodemanager-whishlist

Changed wishlist behavior as requested, all tests ran OK on 2dd214985e9727bf0b5fbf11b0e39c3e7d3cc5c4 but they're failing on Jenkins (https://ci.curoverse.com/job/developer-run-tests-services-nodemanager/164/console) but running OK on my VM.

#11 Updated by Peter Amstutz almost 2 years ago

Comment is missing the addition of %Q for priority.

# cpus, memory, tempory disk space, reason, job name, feature constraints
squeue_out = subprocess.check_output(["squeue", "--state=PENDING", "--noheader", "--format=%c|%m|%d|%r|%j|%f|%Q"])

An in-place sort would be slightly more efficient:

queuelist.sort(key=lambda x: x.get('priority', 1), reverse=True)

In order to test the wishlist ordering, you should have a variation of this test where the "big" node is at the end of the list and check that it only boots small nodes.

    def test_node_max_nodes_two_sizes(self):
...
        self.make_daemon(want_sizes=[small, small, big, small],
                         avail_sizes=avail_sizes, max_nodes=3)

I'm not quite sure what is going on with the integration test. The test framework does have a failure mode where it crashes with "IOError: [Errno 11] Resource temporarily unavailable" I haven't gotten to the bottom of what that happens.

#12 Updated by Peter Amstutz almost 2 years ago

I think this fixes the integration tests:

diff --git a/services/nodemanager/tests/integration_test.py b/services/nodemanager/tests/integration_test.py
index 48fc3951e..508e62663 100755
--- a/services/nodemanager/tests/integration_test.py
+++ b/services/nodemanager/tests/integration_test.py
@@ -342,7 +342,6 @@ def main():
                 (r".*Cloud node (\S+) is now paired with Arvados node (\S+) with hostname (\S+)", node_paired),
                 (r".*ComputeNodeMonitorActor\..*\.([^[]*).*Not eligible for shut down because node state is \('busy', 'open', .*\)", node_busy),
                 (r".*ComputeNodeMonitorActor\..*\.([^[]*).*Suggesting shutdown because node state is \('idle', 'open', .*\)", remaining_jobs),
-                (r".*ComputeNodeMonitorActor\..*\.([^[]*).*Not eligible for shut down because node state is \('busy', 'open', .*\)", node_busy),
                 (r".*ComputeNodeShutdownActor\..*\.([^[]*).*Shutdown success", node_shutdown),
                 (r".*ComputeNodeShutdownActor\..*\.([^[]*).*Shutdown success", node_shutdown)
             ],
@@ -369,7 +368,6 @@ def main():
                 (r".*Cloud node (\S+) is now paired with Arvados node (\S+) with hostname (\S+)", node_paired),
                 (r".*ComputeNodeMonitorActor\..*\.([^[]*).*Not eligible for shut down because node state is \('busy', 'open', .*\)", node_busy),
                 (r".*ComputeNodeMonitorActor\..*\.([^[]*).*Suggesting shutdown because node state is \('idle', 'open', .*\)", remaining_jobs),
-                (r".*ComputeNodeMonitorActor\..*\.([^[]*).*Not eligible for shut down because node state is \('busy', 'open', .*\)", node_busy),
                 (r".*ComputeNodeShutdownActor\..*\.([^[]*).*Shutdown success", node_shutdown),
                 (r".*ComputeNodeShutdownActor\..*\.([^[]*).*Shutdown success", node_shutdown),
                 (r".*sending request", jobs_req),


#13 Updated by Lucas Di Pentima almost 2 years ago

Updates at b17321254
Test run: https://ci.curoverse.com/job/developer-run-tests/667/

  • Updated integration tests
  • Updated squeue call comment & list ordering
  • Added test to check wishlist ordering

#14 Updated by Peter Amstutz almost 2 years ago

Lucas Di Pentima wrote:

Updates at b17321254
Test run: https://ci.curoverse.com/job/developer-run-tests/667/

  • Updated integration tests
  • Updated squeue call comment & list ordering
  • Added test to check wishlist ordering

This LGTM, thanks.

#15 Updated by Lucas Di Pentima almost 2 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#16 Updated by Tom Morris over 1 year ago

  • Release set to 17

Also available in: Atom PDF