Task #2976

closed

Idea #2880: Component/job can specify minimum memory and scratch space for worker nodes, and Crunch enforces these requirements at runtime

Crunch only starts jobs when hardware constraints are satisfied

Added by Brett Smith almost 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:

Description

The hardware runtime constraints are:

  • min_ram_mb_per_node - The minimum amount of RAM (MiB) that must be available on each Node.
  • min_scratch_mb_per_node - The minimum amount of disk space (MiB) that must be available for local caching on each Node.

Corresponding information about each Node will live in its info hash, and will be updated on each ping.
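As a rough illustration of the comparison this implies, here is a minimal Python sketch that checks one Node's info hash against a Job's runtime constraints. The info hash key names (total_ram_mb, total_scratch_mb) and the function name are assumptions made up for this example; the ticket does not say what the real keys are, and this is not Crunch's actual code.

```python
def node_satisfies(node_info, constraints):
    """Return True if the node's reported resources meet every constraint given."""
    # Hypothetical mapping from runtime constraint name to info hash key.
    # The real key names are not specified in this ticket.
    key_for = {
        "min_ram_mb_per_node": "total_ram_mb",
        "min_scratch_mb_per_node": "total_scratch_mb",
    }
    for constraint, info_key in key_for.items():
        required = constraints.get(constraint)
        if required is None:
            continue  # the Job did not ask for this resource
        available = node_info.get(info_key)
        # A node that has not reported a value yet cannot be shown to qualify.
        if available is None or int(available) < int(required):
            return False
    return True


node = {"total_ram_mb": 7680, "total_scratch_mb": 100000}
print(node_satisfies(node, {"min_ram_mb_per_node": 4096}))
```

Treating a missing value as "not satisfied" is a conservative choice for this sketch: a Node that has not pinged with its resource information yet is simply skipped until it does.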

Because we currently don't have a concept of Job priority, and it's not in this sprint, it seems best to stick closely to Crunch's current FIFO strategy for working the Job queue. However, we need to take precautions so that a Job with unreasonably large resource requirements at the front of the queue doesn't prevent us from making progress on the rest of it.

Plan: When the Job at the front of the queue can't be started because its resource requirements aren't met, Crunch will wait a few minutes to see if the Node Manager makes those resources available. If it does, great; proceed as normal. If not, continue through the queue and start the first Job that can run with the available resources. Make sure this wait happens only every so often, so that lots of queue activity doesn't cause lots of waiting.
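The plan above could be sketched roughly as follows. This is only an illustration in Python, not the code from the changeset: the class name, the can_run callback, and the wait_seconds and cooldown_seconds values are all assumptions, since the ticket only says "a few minutes" and "every so often".

```python
import time


class QueuePicker:
    """FIFO job picker that waits briefly on a blocked head job, then skips ahead."""

    def __init__(self, can_run, wait_seconds=180, cooldown_seconds=3600,
                 clock=time.monotonic):
        self.can_run = can_run                    # callback: job -> bool (hypothetical)
        self.wait_seconds = wait_seconds          # assumed stand-in for "a few minutes"
        self.cooldown_seconds = cooldown_seconds  # assumed stand-in for "every so often"
        self.clock = clock
        self.wait_started = None
        self.last_wait_ended = None

    def pick(self, queue):
        """Return the job to start now, or None if nothing should start yet."""
        if not queue:
            return None
        now = self.clock()
        if self.can_run(queue[0]):
            self.wait_started = None
            return queue[0]  # normal FIFO behaviour
        in_cooldown = (self.last_wait_ended is not None
                       and now - self.last_wait_ended < self.cooldown_seconds)
        if not in_cooldown:
            if self.wait_started is None:
                self.wait_started = now
            if now - self.wait_started < self.wait_seconds:
                return None  # give the Node Manager time to provide resources
            # The wait expired: remember when, so we do not wait again too soon.
            self.wait_started = None
            self.last_wait_ended = now
        # Skip ahead: start the first later job that fits the available resources.
        for job in queue[1:]:
            if self.can_run(job):
                return job
        return None
```

A dispatcher loop would call pick() on each pass over the queue. Injecting the clock keeps the waiting logic testable without real sleeps, and the cooldown is what prevents a busy queue from triggering a fresh wait on every pass.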

Actions #1

Updated by Brett Smith almost 10 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Brett Smith almost 10 years ago

Have the code written. Need to test it and get it reviewed.

Actions #3

Updated by Brett Smith almost 10 years ago

  • Remaining (hours) changed from 8.0 to 3.0
Actions #4

Updated by Brett Smith almost 10 years ago

  • Remaining (hours) changed from 3.0 to 2.0

Ready for review in #2993.

Actions #5

Updated by Brett Smith almost 10 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
  • Remaining (hours) changed from 2.0 to 0.0

Applied in changeset arvados|commit:82c4697bf24b10f3fb66d303ae73499095b5742a.
