Idea #14374
openMulti-site (federated) object search in controller
Description
Implemented in controller.
- Distribute query to each cluster in the federation.
- Collect responses, collate by order and truncate to response item limit
- Construct a "page token" which consists of the original filters plus additional information needed to get remaining unseen items on each cluster
- Respond with object list and page token
- Subsequent requests should only use the page token
Query parameters:
- enable_multisite_search
- true enables multisite search
- true by default?
- provide cluster_id= to search only an individual cluster?
- return_partial_results
- when true, if a query to a remote cluster fails, return a response with that cluster stripped out and an additional "warning" field mentioning there was a problem
- when false, if any query to a remote cluster fails, the whole request fails
- multisite_search_timeout
- how long to wait for a result from one of the remote clusters
Query considerations:
- If "select" is specified and "uuid" is missing, must add in uuid for remote queries (needed for paging) and strip it out in the response
- Only order on scalar columns. Disambiguate rows with the same value by ordering by uuid.
- Cannot have "offset"
- Must be "count=none"
- Must be "distinct=false"
Related issues
Updated by Tom Morris about 6 years ago
- Subject changed from Multi-site workflow search to Multi-site object search
- Description updated (diff)
Updated by Peter Amstutz about 6 years ago
[THIS IS NO LONGER RELEVANT]
The main challenges are
- return responses as we get them (so one slow site doesn't slow down the whole response)
- stateless paging
Items in the response can only be globally ordered by waiting for and merging responses from every site. We can restrict/redefine ordering behavior, such as: objects on the same cluster will be returned sequentially and ordered internally, the client is expected to re-sort.
If we write the response json object incrementally (as remote responses come back) we could use streaming (event driven) parsing in the browser with http://oboejs.com/
Note: writing incrementally probably means using chunked Transfer-Encoding. See also https://github.com/jonnyreeves/chunked-request
On paging.
Given the ordering restriction described above, the client could use a recipe like:
For each sequence of items on the same cluster, look for the last one (as defined by a cut-over in the list from one uuid prefix to another, or the end of the list). Use each "final" item to construct a query that fetches the next page of items.
Note: we might not know that we have the "last" item if the following response is delayed, so we are delayed in seeing the cut-over. Maybe "items" should be a list of lists, so we see the termination of the list instead? Or maybe "items" should be a map of cluster → items.
Maybe a federated response special form:
{ "kind": "arvados#federatedList", "parts": { "qr1hi": { "items": [...] }, "4xphq": { "items": [...] } } }
With corresponding federated query?
filters: [ ["any", "@@", "my query string"], { "parts": { "qr1hi": [["uuid", ">", "123"]], "4xphq": [["uuid", ">", "456"]] } ]
Updated by Peter Amstutz about 6 years ago
Strike note-2 in favor of current description.
Updated by Peter Amstutz about 6 years ago
- Description updated (diff)
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 3.0
Updated by Tom Morris over 5 years ago
- Related to Bug #14710: [Workbench] Child containers run on federated clusters do not show up added
Updated by Peter Amstutz over 5 years ago
- Subject changed from Multi-site object search to Multi-site (federated) object search
Updated by Peter Amstutz over 5 years ago
- Related to Feature #15353: [Data operations] Chooser supports browing/picking projects/collections on federated clusters added
Updated by Peter Amstutz over 5 years ago
- Subject changed from Multi-site (federated) object search to Multi-site (federated) object search in controller
Updated by Peter Amstutz over 3 years ago
- Target version deleted (
Arvados Future Sprints)