Story #14374

Multi-site object search

Added by Tom Morris about 2 months ago. Updated about 2 months ago.

Assigned To:
Target version:
Start date:
Due date:
% Done:


Estimated time:
Story points:


Implemented in controller.

  1. Distribute query to each cluster in the federation.
  2. Collect responses, collate by order and truncate to response item limit
  3. Construct a "page token" which consists of the original filters plus additional information needed to get remaining unseen items on each cluster
  4. Respond with object list and page token
  5. Subsequent requests should only use the page token

Query parameters:

  • enable_multisite_search
    • true enables multisite search
    • true by default?
    • provide cluster_id= to search only an individual cluster?
  • return_partial_results
    • when true, if a query to a remote cluster fails, return a response with that cluster stripped out and an additional "warning" field mentioning there was a problem
    • when false, if any query to a remote cluster fails, the whole request fails
  • multisite_search_timeout
    • how long to wait for a result from one of the remote clusters

Query considerations:

  • If "select" is specified and "uuid" is missing, must add in uuid for remote queries (needed for paging) and strip it out in the response
  • Only order on scalar columns. Disambiguate rows with the same value by ordering by uuid.
  • Cannot have "offset"
  • Must be "count=none"
  • Must be "distinct=false"


#1 Updated by Tom Morris about 2 months ago

  • Subject changed from Multi-site workflow search to Multi-site object search
  • Description updated (diff)

#2 Updated by Peter Amstutz about 2 months ago


The main challenges are

  1. return responses as we get them (so one slow site doesn't slow down the whole response)
  2. stateless paging

Items in the response can only be globally ordered by waiting for and merging responses from every site. We can restrict/redefine ordering behavior, such as: objects on the same cluster will be returned sequentially and ordered internally, the client is expected to re-sort.

If we write the response json object incrementally (as remote responses come back) we could use streaming (event driven) parsing in the browser with

Note: writing incrementally probably means using chunked Transfer-Encoding. See also

On paging.

Given the ordering restriction described above, the client could use a recipe like:

For each sequence of items on the same cluster, look for the last one (as defined by a cut-over in the list from one uuid prefix to another, or the end of the list). Use each "final" item to construct a query that fetches the next page of items.

Note: we might not know that we have the "last" item if the following response is delayed, so we are delayed in seeing the cut-over. Maybe "items" should be a list of lists, so we see the termination of the list instead? Or maybe "items" should be a map of cluster → items.

Maybe a federated response special form:

  "kind": "arvados#federatedList",
  "parts": {
    "qr1hi": {
      "items": [...]
    "4xphq": {
      "items": [...]

With corresponding federated query?

filters: [
  ["any", "@@", "my query string"], 
    "parts": {
       "qr1hi": [["uuid", ">", "123"]], 
       "4xphq": [["uuid", ">", "456"]]

#3 Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)

#4 Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)

#5 Updated by Peter Amstutz about 2 months ago

Strike note-2 in favor of current description.

#6 Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)
  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 3.0

Also available in: Atom PDF