Feature #15457

[Controller] Delegate new container requests to other clusters based on location of input data

Added by Tom Clegg 9 days ago. Updated 9 days ago.

Status: New
Priority: Normal
Assigned To: -
Category: API
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: 3.0

Description

When a client creates a new container request (and doesn't specify a desired cluster ID), controller should resolve all input collections to PDHs as needed, and then:
  • if all inputs are available locally, create a local container request (as the current implementation does in all cases)
  • otherwise, rank local/remote clusters according to how many of the input data bytes they have on hand, and execute a "create CR" request on the highest-ranking cluster -- being sure to specify the chosen cluster ID so the remote cluster doesn't have to repeat the ranking/selection process itself.
    • if the local cluster is tied with a remote, choose the local cluster
    • use the file_size_total collection attribute

At least for now, don't go to too much trouble to be precise -- if a mount only refers to a small file in a large collection, it's OK to rank by the entire collection size.
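
The ranking rule above could be sketched roughly as follows. This is only an illustration under assumptions: `chooseCluster`, the `sizes` map (PDH to `file_size_total`), and the `available` map (cluster ID to the set of PDHs it reported) are hypothetical names, and breaking ties between two remotes by cluster ID order is an assumption the description does not make.

```go
package main

import "sort"

// chooseCluster picks the cluster that should receive the "create CR" request.
//
//	sizes:     input collection PDH -> file_size_total (whole collection size)
//	available: cluster ID -> set of input PDHs that cluster has on hand
//
// All names here are hypothetical; this is not the controller's actual API.
func chooseCluster(localID string, sizes map[string]int64, available map[string]map[string]bool) string {
	bytesOn := func(clusterID string) int64 {
		var total int64
		for pdh, have := range available[clusterID] {
			if have {
				total += sizes[pdh]
			}
		}
		return total
	}

	// Visit clusters in a fixed order so ties between remotes are
	// resolved deterministically (an assumption, not part of the spec).
	ids := make([]string, 0, len(available))
	for id := range available {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	// Start with the local cluster and require a strictly larger byte
	// count to replace it, so a tie with a remote goes to the local cluster.
	best, bestBytes := localID, bytesOn(localID)
	for _, id := range ids {
		if b := bytesOn(id); b > bestBytes {
			best, bestBytes = id, b
		}
	}
	return best
}
```

If the local cluster has every input, its byte count equals the total and no remote can exceed it, so the "all inputs available locally" case falls out of the same comparison.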

If a remote cluster returns an error during the "probe for inputs" phase, drop that cluster from the list of candidates.
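
A minimal sketch of the probe phase, assuming a hypothetical `probeFunc` callback that asks one remote which input PDHs it has. `probeAll` and `ErrProbeFailed` are illustrative names only. Errors are collected rather than discarded so they can be reported later.

```go
package main

import "errors"

// ErrProbeFailed is a placeholder error for examples; a real probe would
// return whatever error the remote API call produced.
var ErrProbeFailed = errors.New("probe failed")

// probeFunc is a hypothetical stand-in for asking one remote cluster which
// of the input PDHs it has on hand.
type probeFunc func(clusterID string) (map[string]bool, error)

// probeAll queries each remote cluster. Clusters that return an error are
// dropped from the candidate map and recorded in the error map instead.
func probeAll(remoteIDs []string, probe probeFunc) (map[string]map[string]bool, map[string]string) {
	available := map[string]map[string]bool{}
	probeErrors := map[string]string{}
	for _, id := range remoteIDs {
		found, err := probe(id)
		if err != nil {
			probeErrors[id] = err.Error()
			continue
		}
		available[id] = found
	}
	return available, probeErrors
}
```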

If a remote cluster returns an error or times out when submitting a container request, fall back to submitting to the local cluster (unless this fallback is disabled via a config knob). If this fallback is enabled, the remote call should time out after half the remaining portion of the locally configured API request timeout (see Deadline()). If the local cluster fails (whether or not a remote has also been attempted), just return the error to the caller.
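
The fallback and timeout rule might look something like this, assuming that Deadline() refers to the `Deadline` method of Go's `context.Context` and that the request context already carries the configured API request timeout. `submitFunc`, `submitWithFallback`, and `exampleFallback` are hypothetical names, and the cluster IDs are placeholders.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// submitFunc is a hypothetical stand-in for the "create CR" call to one cluster.
type submitFunc func(ctx context.Context, clusterID string) error

// submitWithFallback tries the chosen cluster and, if that is a remote that
// fails and fallback is enabled, retries on the local cluster. It returns
// the ID of the cluster that handled the last attempt.
func submitWithFallback(ctx context.Context, localID, chosenID string, fallback bool, submit submitFunc) (string, error) {
	if chosenID == localID {
		return localID, submit(ctx, localID)
	}
	remoteCtx := ctx
	if deadline, ok := ctx.Deadline(); ok && fallback {
		// Give the remote half of the remaining time, leaving the
		// other half for the local fallback attempt.
		var cancel context.CancelFunc
		remoteCtx, cancel = context.WithTimeout(ctx, time.Until(deadline)/2)
		defer cancel()
	}
	err := submit(remoteCtx, chosenID)
	if err == nil || !fallback {
		return chosenID, err
	}
	// A local failure is returned to the caller as-is.
	return localID, submit(ctx, localID)
}

// exampleFallback simulates a failing remote ("aaaaa") and a healthy local
// cluster ("zzzzz") under a 10-second request timeout. remoteBudgetOK
// reports whether the remote attempt saw at most half of that budget.
func exampleFallback() (usedID string, remoteBudgetOK bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	submit := func(ctx context.Context, clusterID string) error {
		if clusterID == "aaaaa" {
			deadline, _ := ctx.Deadline()
			remoteBudgetOK = time.Until(deadline) <= 5*time.Second
			return errors.New("remote cluster unavailable")
		}
		return nil
	}
	usedID, err = submitWithFallback(ctx, "zzzzz", "aaaaa", true, submit)
	return usedID, remoteBudgetOK, err
}
```

When fallback is disabled, the sketch passes the full request context to the remote, since no time needs to be reserved for a second attempt.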

Add an entry to the CR's properties hash indicating how the cluster was chosen, including any errors encountered when probing or submitting to remotes.
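
One possible shape for that properties entry, purely as an illustration: the key `cluster_selection`, the `ClusterSelection` type, and its field names are all assumptions, since the description does not name them.

```go
package main

// ClusterSelection is a hypothetical record of how the target cluster was
// chosen, including errors seen while probing or submitting to remotes.
type ClusterSelection struct {
	ChosenCluster string            `json:"chosen_cluster"`
	Reason        string            `json:"reason"`
	ProbeErrors   map[string]string `json:"probe_errors,omitempty"`
	SubmitErrors  map[string]string `json:"submit_errors,omitempty"`
}

// withSelection returns a copy of the CR properties with the selection
// record added under the hypothetical key "cluster_selection". The input
// map is left unmodified.
func withSelection(properties map[string]interface{}, sel ClusterSelection) map[string]interface{} {
	out := make(map[string]interface{}, len(properties)+1)
	for k, v := range properties {
		out[k] = v
	}
	out["cluster_selection"] = sel
	return out
}
```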


Related issues

Related to Arvados - Bug #14710: [Workbench] Child containers run on federated clusters do not show up (New)

History

#1 Updated by Tom Clegg 9 days ago

  • Related to Bug #14710: [Workbench] Child containers run on federated clusters do not show up added

#2 Updated by Tom Clegg 9 days ago

  • Description updated (diff)

#3 Updated by Tom Clegg 9 days ago

  • Description updated (diff)

#4 Updated by Tom Clegg 9 days ago

  • Description updated (diff)

#5 Updated by Tom Clegg 9 days ago

  • Description updated (diff)

#6 Updated by Tom Morris 9 days ago

  • Story points set to 3.0
  • Target version changed from To Be Groomed to Arvados Future Sprints
