Project

General

Profile

Actions

Idea #14374

open

Multi-site (federated) object search in controller

Added by Tom Morris over 5 years ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
Start date:
Due date:
Story points:
3.0
Release:
Release relationship:
Auto

Description

Implemented in controller.

  1. Distribute query to each cluster in the federation.
  2. Collect responses, collate by order and truncate to response item limit
  3. Construct a "page token" which consists of the original filters plus additional information needed to get remaining unseen items on each cluster
  4. Respond with object list and page token
  5. Subsequent requests should only use the page token

Query parameters:

  • enable_multisite_search
    • true enables multisite search
    • true by default?
    • provide cluster_id= to search only an individual cluster?
  • return_partial_results
    • when true, if a query to a remote cluster fails, return a response with that cluster stripped out and an additional "warning" field mentioning there was a problem
    • when false, if any query to a remote cluster fails, the whole request fails
  • multisite_search_timeout
    • how long to wait for a result from one of the remote clusters

Query considerations:

  • If "select" is specified and "uuid" is missing, must add in uuid for remote queries (needed for paging) and strip it out in the response
  • Only order on scalar columns. Disambiguate rows with the same value by ordering by uuid.
  • Cannot have "offset"
  • Must be "count=none"
  • Must be "distinct=false"

Related issues

Related to Arvados - Bug #14710: [Workbench] Child containers run on federated clusters do not show upNewActions
Related to Arvados Workbench 2 - Feature #15353: [Data operations] Chooser supports browing/picking projects/collections on federated clustersNewActions
Actions #1

Updated by Tom Morris over 5 years ago

  • Subject changed from Multi-site workflow search to Multi-site object search
  • Description updated (diff)
Actions #2

Updated by Peter Amstutz over 5 years ago

[THIS IS NO LONGER RELEVANT]

The main challenges are

  1. return responses as we get them (so one slow site doesn't slow down the whole response)
  2. stateless paging

Items in the response can only be globally ordered by waiting for and merging responses from every site. We can restrict/redefine ordering behavior, such as: objects on the same cluster will be returned sequentially and ordered internally, the client is expected to re-sort.

If we write the response json object incrementally (as remote responses come back) we could use streaming (event driven) parsing in the browser with http://oboejs.com/

Note: writing incrementally probably means using chunked Transfer-Encoding. See also https://github.com/jonnyreeves/chunked-request

On paging.

Given the ordering restriction described above, the client could use a recipe like:

For each sequence of items on the same cluster, look for the last one (as defined by a cut-over in the list from one uuid prefix to another, or the end of the list). Use each "final" item to construct a query that fetches the next page of items.

Note: we might not know that we have the "last" item if the following response is delayed, so we are delayed in seeing the cut-over. Maybe "items" should be a list of lists, so we see the termination of the list instead? Or maybe "items" should be a map of cluster → items.

Maybe a federated response special form:

{
  "kind": "arvados#federatedList",
  "parts": {
    "qr1hi": {
      "items": [...]
    },
    "4xphq": {
      "items": [...]
    }
  }
}

With corresponding federated query?

filters: [
  ["any", "@@", "my query string"], 
  {
    "parts": {
       "qr1hi": [["uuid", ">", "123"]], 
       "4xphq": [["uuid", ">", "456"]]
   }
] 
Actions #3

Updated by Peter Amstutz over 5 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz over 5 years ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz over 5 years ago

Strike note-2 in favor of current description.

Actions #6

Updated by Peter Amstutz over 5 years ago

  • Description updated (diff)
  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 3.0
Actions #7

Updated by Tom Morris about 5 years ago

  • Related to Bug #14710: [Workbench] Child containers run on federated clusters do not show up added
Actions #8

Updated by Peter Amstutz almost 5 years ago

  • Subject changed from Multi-site object search to Multi-site (federated) object search
Actions #9

Updated by Peter Amstutz almost 5 years ago

  • Related to Feature #15353: [Data operations] Chooser supports browing/picking projects/collections on federated clusters added
Actions #10

Updated by Peter Amstutz almost 5 years ago

  • Subject changed from Multi-site (federated) object search to Multi-site (federated) object search in controller
Actions #11

Updated by Peter Amstutz almost 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #12

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #13

Updated by Peter Amstutz about 2 months ago

  • Target version set to Future
Actions

Also available in: Atom PDF