Feature #15457
open
[Controller] Delegate new container requests to other clusters based on location of input data
Description
- if all inputs are available locally, create a local container request (as the current implementation does in all cases)
- otherwise, rank local/remote clusters according to how many of the input data bytes they have on hand, and execute a "create CR" request on the highest-ranking cluster -- being sure to specify the chosen cluster ID so the remote cluster doesn't have to repeat the ranking/selection process itself.
- if the local cluster is tied with a remote, choose the local cluster
- use the file_size_total collection attribute to measure how many input data bytes each cluster has on hand
At least for now, don't go to too much trouble to be precise -- if a mount only refers to a small file in a large collection, it's OK to rank by the entire collection size.
If a remote cluster returns an error during the "probe for inputs" phase, drop that cluster from the list of candidates.
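The selection rules above could be sketched roughly as follows. This is only an illustration, not the controller's actual code: the `Candidate` type, the `chooseCluster` function, and the `errProbeFailed` placeholder are hypothetical names, and the byte counts are assumed to have been summed already from each collection's file_size_total.

```go
package main

import "errors"

// errProbeFailed is a placeholder for whatever error a failed probe returns.
var errProbeFailed = errors.New("probe for inputs failed")

// Candidate is a hypothetical summary of one remote cluster's probe result.
type Candidate struct {
	ClusterID string
	Bytes     int64 // sum of file_size_total for the input collections found there
	ProbeErr  error // non-nil if the "probe for inputs" request failed
}

// chooseCluster returns the ID of the cluster that should receive the
// "create CR" request. localBytes is the number of input bytes available
// locally and totalBytes is the size of all inputs combined.
func chooseCluster(localID string, localBytes, totalBytes int64, remotes []Candidate) string {
	// All inputs are available locally: no ranking needed.
	if localBytes >= totalBytes {
		return localID
	}
	bestID, bestBytes := localID, localBytes
	for _, c := range remotes {
		if c.ProbeErr != nil {
			// A cluster that failed the probe is dropped from the candidates.
			continue
		}
		// Strict comparison: a tie with the local cluster keeps the local
		// cluster, and a tie between remotes keeps the one listed first.
		if c.Bytes > bestBytes {
			bestID, bestBytes = c.ClusterID, c.Bytes
		}
	}
	return bestID
}
```

Ranking by whole-collection size keeps the probe cheap, at the cost of occasionally favoring a cluster that holds a large collection of which only a small file is mounted.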
If a remote cluster returns an error or times out when submitting a container request, fall back to submitting to the local cluster (unless this fallback is disabled via a config knob). If this fallback is enabled, the remote call should time out after half of the remaining portion of the locally configured API request timeout (see Deadline()), so that the other half is left for the local attempt. If the local cluster fails (whether or not a remote has also been attempted), just return the error to the caller.
Add an entry to the CR's properties hash indicating how the cluster was chosen, including any errors encountered when probing or submitting to remotes.
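One possible shape for that properties entry is sketched below. The key name `cluster_selection`, the field names, and the `ClusterChoice` and `annotate` identifiers are all assumptions made for illustration; the ticket does not specify a format.

```go
package main

// ClusterChoice is a hypothetical record of how the cluster was chosen.
type ClusterChoice struct {
	Chosen string            // ID of the cluster that received the CR
	Reason string            // short human-readable explanation
	Errors map[string]string // cluster ID -> error seen while probing or submitting
}

// annotate stores the selection record in the CR's properties hash under a
// hypothetical "cluster_selection" key and returns the updated hash.
func annotate(props map[string]interface{}, choice ClusterChoice) map[string]interface{} {
	if props == nil {
		props = map[string]interface{}{}
	}
	entry := map[string]interface{}{
		"chosen": choice.Chosen,
		"reason": choice.Reason,
	}
	// Only include the errors field when something actually went wrong.
	if len(choice.Errors) > 0 {
		entry["errors"] = choice.Errors
	}
	props["cluster_selection"] = entry
	return props
}
```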
Related issues
Updated by Tom Clegg about 5 years ago
- Related to Bug #14710: [Workbench] Child containers run on federated clusters do not show up added
Updated by Tom Morris about 5 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 3.0
Updated by Peter Amstutz over 3 years ago
- Target version deleted (Arvados Future Sprints)