Project

General

Profile

Actions

Bug #17589

closed

[arv-mount] hangs on ls keep/home/, slowly increasing memory

Added by Ward Vandewege over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
FUSE
Target version:
Story points:
-
Release relationship:
Auto

Description

This is happening on Tordo (AWS test cluster), at version 2.2.0~dev20210426141415-1. Tordo is in a LoginCluster federation with ce8i5 (Azure test cluster) as the login cluster.

As a superuser, on the tordo shell node, I mounted keep with `arv-mount keep --debug --foreground`. I then ran `ls keep/home`. Workbench2 says that there are 52074 items in my home project.

The `ls` command hangs indefinitely, with `arv-mount` doing seemingly nothing. The output from arv-mount is just this:

unique: 10, opcode: LOOKUP (1), nodeid: 1, insize: 45, pid: 29513
2021-04-27 01:35:09 arvados.arvados_fuse[29440] DEBUG: arv-mount lookup: parent_inode 1 name 'home' inode 5
   unique: 10, success, outsize: 144
unique: 11, opcode: LOOKUP (1), nodeid: 1, insize: 45, pid: 29513
2021-04-27 01:35:09 arvados.arvados_fuse[29440] DEBUG: arv-mount lookup: parent_inode 1 name 'home' inode 5
   unique: 11, success, outsize: 144
unique: 12, opcode: OPENDIR (27), nodeid: 5, insize: 48, pid: 29513
2021-04-27 01:35:09 arvados.arvados_fuse[29440] DEBUG: arv-mount opendir: inode 5
2021-04-27 01:35:09 arvados.arvados_fuse[29440] DEBUG: InodeCache touched inode 5 (size 0) (uuid ce8i5-tpzed-xo2k4i24bjzwedw) total now 0 (1 entries)
2021-04-27 01:35:09 arvados.arvados_fuse[29440] DEBUG: InodeCache touched inode 8 (size 0) (uuid ce8i5-tpzed-xo2k4i24bjzwedw) total now 0 (2 entries)

(no further output)

arv-mount slowly consumes more and more memory until the OOM killer takes it out.

While in this state, with `ls` hanging, accessing specific collections via `keep/by_id` works just fine.


Files

20210427-arv-mount-debug.log (3.5 MB) 20210427-arv-mount-debug.log Ward Vandewege, 04/27/2021 07:40 PM

Subtasks 3 (0 open3 closed)

Task #17659: Review 17589-group-contentsResolvedWard Vandewege05/11/2021Actions
Task #17661: Review 17589-fuse-use-keyset-paging ResolvedWard Vandewege05/11/2021Actions
Task #17672: Review 17589-contents-count-noneResolvedWard Vandewege05/11/2021Actions

Related issues

Blocks Arvados - Idea #17512: Release Arvados 2.2ResolvedPeter Amstutz05/03/2021Actions
Actions #1

Updated by Ward Vandewege over 3 years ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege over 3 years ago

  • Category set to CWL
  • Description updated (diff)
Actions #3

Updated by Ward Vandewege over 3 years ago

  • Category changed from CWL to FUSE
Actions #4

Updated by Ward Vandewege over 3 years ago

The controller logs show a long list of group/contents requests, with an ever-increasing offset.

The manual query indicates that there should be 199695 items available.

ward@shell:~$ arv group contents -u ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]'
...
 "items_available":199695,
 "kind":"arvados#objectList",
 "limit":100,
 "offset":0
}

But somehow arv-mount just keeps going, and it keeps getting data back:

...
Apr 27 01:51:30 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-wur4oz2c6j9gtne2oyzy","level":"info","msg":"request","remoteAddr":"127.0.0.1:60938","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=248000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T01:51:30.771006945Z"}
Apr 27 01:51:34 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-wur4oz2c6j9gtne2oyzy","level":"info","msg":"response","remoteAddr":"127.0.0.1:60938","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=248000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":979327,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T01:51:34.967312364Z","timeToStatus":4.196000,"timeTotal":4.196293,"timeWriteBody":0.000293}
Apr 27 01:51:34 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-ndhd2y6sbt70gf8rzr96","level":"info","msg":"request","remoteAddr":"127.0.0.1:60946","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=249000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T01:51:34.986794610Z"}
Apr 27 01:51:39 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-ndhd2y6sbt70gf8rzr96","level":"info","msg":"response","remoteAddr":"127.0.0.1:60946","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=249000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":978985,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T01:51:39.069045322Z","timeToStatus":4.081946,"timeTotal":4.082238,"timeWriteBody":0.000292}
...

Trying to reproduce this with the arv cli leads to a different request in the logs:

Apr 27 01:54:06 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-8l8zg5vzbsoqq0hhft2v","level":"info","msg":"request","remoteAddr":"127.0.0.1:32906","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"_profile=true\u0026bypass_federation=false\u0026cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw\u0026cluster_id_given=true\u0026count=exact\u0026distinct=false\u0026filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%22arvados%23collection%22%5D%5D\u0026filters_given=true\u0026help=false\u0026include=\u0026include_old_versions=false\u0026include_trash=false\u0026limit=9223372036854775807\u0026limit_given=true\u0026offset=249000\u0026offset_given=true\u0026order=\u0026recursive=false\u0026uuid=\u0026where=","time":"2021-04-27T01:54:06.414215153Z"}
Apr 27 01:54:06 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-8l8zg5vzbsoqq0hhft2v","level":"info","msg":"response","remoteAddr":"127.0.0.1:32906","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"_profile=true\u0026bypass_federation=false\u0026cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw\u0026cluster_id_given=true\u0026count=exact\u0026distinct=false\u0026filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%22arvados%23collection%22%5D%5D\u0026filters_given=true\u0026help=false\u0026include=\u0026include_old_versions=false\u0026include_trash=false\u0026limit=9223372036854775807\u0026limit_given=true\u0026offset=249000\u0026offset_given=true\u0026order=\u0026recursive=false\u0026uuid=\u0026where=","respBytes":109,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T01:54:06.918316347Z","timeToStatus":0.504083,"timeTotal":0.504092,"timeWriteBody":0.000009}

and also different results:

$ arv group contents -u ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]' -s 249000 -l 9223372036854775807
{
 "included":[],
 "items":[],
 "items_available":199695,
 "kind":"arvados#objectList",
 "limit":1000,
 "offset":249000
}

This result is as expected; there is something about the arv-mount request that makes it misbehave.

Hmm. I recreated the arv mount request with curl, and that showed that the difference in request parameters results in a difference in items available:

curl -G -sH "Authorization: Bearer OMITTED" \
        https://tordo.arvadosapi.com/arvados/v1/groups/contents \
        --data-urlencode "offset=249000" \
        --data-urlencode "uuid=ce8i5-tpzed-xo2k4i24bjzwedw" \
        --data-urlencode 'filters=[["uuid","is_a","arvados#collection"]]' \
        --data-urlencode "limit=9223372036854775807" \
        --data-urlencode "alt=json" 

...
      "trash_at": null,
      "uuid": "ce8i5-4zz18-hkn8zfxr23dkw2l",
      "version": 1
    }
  ],
  "items_available": 410020,
  "kind": "arvados#objectList",
  "limit": 1000,
  "offset": 249000
}

I went back through the logs and noticed that my test from yesterday did stop after asking for the 410000 batch, and the response was small which lines up with it receiving the last 20 records:

Apr 27 02:32:33 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-4bump559x19zfo46cv2i","level":"info","msg":"request","remoteAddr":"127.0.0.1:37416","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T02:32:33.370600217Z"}
Apr 27 02:32:35 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-4bump559x19zfo46cv2i","level":"info","msg":"response","remoteAddr":"127.0.0.1:37416","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":18864,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T02:32:35.591036450Z","timeToStatus":2.220103,"timeTotal":2.220430,"timeWriteBody":0.000327}

So... maybe the groups endpoint is working as expected. Two questions remain:

a) Why is arv-mount doing a request that returns 410,020 items available, while the roughly equivalent version with `arv` returns 199,695 items?

b) why is arv-mount so terribly slow?

Tackling a)

Let's find out why the arv-mount call returns so many more results. Replicating it with a json request to mimic the `arv` command:

curl -G -sH "Authorization: Bearer OMITTED" \
    https://tordo.arvadosapi.com/arvados/v1/groups/contents \
    --data-urlencode "offset=249000" \
    --data-urlencode "uuid=ce8i5-tpzed-xo2k4i24bjzwedw" \
    --data-urlencode 'filters=[["uuid","is_a","arvados#collection"]]' \
    --data-urlencode "limit=9223372036854775807" \
    --data-urlencode "include_trash=false" \
    --data-urlencode "cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw" \
    --data-urlencode "distinct=false" \
    --data-urlencode "include_old_versions=false" \
    --data-urlencode "recursive=false" \
    --data-urlencode "alt=json" 
Apr 27 13:20:17 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-18suuu9bl5ndjmmd55v9","level":"info","msg":"request","remoteAddr":"127.0.0.1:38400","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=249000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026include_trash=false\u0026cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw\u0026distinct=false\u0026include_old_versions=false\u0026recursive=false\u0026alt=json","time":"2021-04-27T13:20:17.329271974Z"}
Apr 27 13:20:19 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-18suuu9bl5ndjmmd55v9","level":"info","msg":"response","remoteAddr":"127.0.0.1:38400","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=249000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026include_trash=false\u0026cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw\u0026distinct=false\u0026include_old_versions=false\u0026recursive=false\u0026alt=json","respBytes":978985,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T13:20:19.839597233Z","timeToStatus":2.509989,"timeTotal":2.510312,"timeWriteBody":0.000323}

But that still yields 410,020 items available:

...
      "trash_at": null,
      "uuid": "ce8i5-4zz18-hkn8zfxr23dkw2l",
      "version": 1
    }
  ],
  "items_available": 410020,
  "kind": "arvados#objectList",
  "limit": 1000,
  "offset": 249000
}

Ah. The difference is that `arv_mount` sends a `uuid` parameter with the uuid of home project, whereas `arv group contents` sends `cluster_id` with that uuid value, and sends an empty `uuid` field:

curl -G -sH "Authorization: Bearer OMITTED" \
    https://tordo.arvadosapi.com/arvados/v1/groups/contents \
    --data-urlencode "offset=249000" \
    --data-urlencode "cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw" \
    --data-urlencode 'filters=[["uuid","is_a","arvados#collection"]]' \
    --data-urlencode "limit=9223372036854775807" \
    --data-urlencode "include_trash=false" \
    --data-urlencode "cluster_id=ce8i5-tpzed-xo2k4i24bjzwedw" \
    --data-urlencode "distinct=false" \
    --data-urlencode "include_old_versions=false" \
    --data-urlencode "recursive=false" \
    --data-urlencode "alt=json" 
...
{
  "included": [],
  "items": [],
  "items_available": 199695,
  "kind": "arvados#objectList",
  "limit": 1000,
  "offset": 249000
}

Hrm, it looks like our gems on tordo are out of date:

ward@shell:~$ gem list |grep arv
arvados (2.0.4, 1.3.3.20190320201707)
arvados-cli (2.0.4)
arvados-google-api-client (0.8.7.4)
arvados-login-sync (2.0.4)

I fixed that:

shell.tordo:~# gem list|grep arv
arvados (2.1.2, 1.3.3.20190320201707)
arvados-cli (2.1.2)
arvados-google-api-client (0.8.7.4)
arvados-login-sync (2.1.2)

(the arvados gem version 1.3.3.20190320201707 is installed because of an outdated dependency in the arvados-login-sync gem, I just pushed a fix for that in 24b0875964b3eff98c12d1c135d8797efcfabfb2).

The updated version of `arv` doesn't make a difference though:

ward@shell:~$ arv group contents -u ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]' |tail
   "trash_at":null,
   "uuid":"tordo-4zz18-13ok9k86ne7rpoo",
   "version":1
  }
 ],
 "items_available":199695,
 "kind":"arvados#objectList",
 "limit":100,
 "offset":0
}

Ah, well, duh, that was totally PEBKAC. The cli arguments for `arv` are not very intuitive. `-u` is for cluster_id and `--uuid` is what I want:

ward@shell:~$ arv group contents --uuid ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]' |tail
   "trash_at":null,
   "uuid":"ce8i5-4zz18-qfyy9hyriqbwobm",
   "version":1
  }
 ],
 "items_available":410020,
 "kind":"arvados#objectList",
 "limit":100,
 "offset":0
}

Conclusion for a) is that the "approximate" equivalent `arv group contents` command was not equivalent enough. `arv-mount` is doing the right thing.


Tackling b) "why is arv-mount so slow"

It's getting 410,020 items to show. It's requesting 1,000 at a time. It's doing that serially. The controller logs show that the slowness is actually in getting the api server response:

Apr 27 18:04:21 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-zwrn6a8vovar784akvmp","level":"info","msg":"request","remoteAddr":"127.0.0.1:45342","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=115000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:04:21.061118586Z"}
Apr 27 18:04:27 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-zwrn6a8vovar784akvmp","level":"info","msg":"response","remoteAddr":"127.0.0.1:45342","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=115000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":978468,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:04:27.404367871Z","timeToStatus":6.342963,"timeTotal":6.343237,"timeWriteBody":0.000274}
Apr 27 18:04:27 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-gi4vdr6jjqw2posxa5sc","level":"info","msg":"request","remoteAddr":"127.0.0.1:45656","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=116000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:04:27.425531548Z"}
Apr 27 18:04:33 tordo.arvadosapi.com arvados-controller[6632]: {"PID":6632,"RequestID":"req-gi4vdr6jjqw2posxa5sc","level":"info","msg":"response","remoteAddr":"127.0.0.1:45656","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=116000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":979259,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:04:33.496201723Z","timeToStatus":6.070386,"timeTotal":6.070660,"timeWriteBody":0.000274}

The delay between request and response for the request with offset 115,000 is 6 seconds. The request for offset 116,000 follows in less than a second.

Hmm. It seems to get the list more than once?

Apr 27 18:22:35 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-ro0y1t0a4iaxgpqsaoba","level":"info","msg":"request","remoteAddr":"127.0.0.1:56958","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=409000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:22:35.305997716Z"}
Apr 27 18:22:39 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-ro0y1t0a4iaxgpqsaoba","level":"info","msg":"response","remoteAddr":"127.0.0.1:56958","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=409000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":951213,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:22:39.410794483Z","timeToStatus":4.101220,"timeTotal":4.104786,"timeWriteBody":0.003566}
Apr 27 18:22:39 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-1oug9ifewazn0k684q0g","level":"info","msg":"request","remoteAddr":"127.0.0.1:56990","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:22:39.425739879Z"}
Apr 27 18:22:41 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-1oug9ifewazn0k684q0g","level":"info","msg":"response","remoteAddr":"127.0.0.1:56990","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":293739,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:22:41.269223334Z","timeToStatus":1.843310,"timeTotal":1.843468,"timeWriteBody":0.000158}
Apr 27 18:23:29 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-8k0i9slizvrbrmw653uv","level":"info","msg":"request","remoteAddr":"127.0.0.1:57458","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/users/ce8i5-tpzed-xo2k4i24bjzwedw","reqQuery":"alt=json","time":"2021-04-27T18:23:29.707199973Z"}
Apr 27 18:23:29 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-8k0i9slizvrbrmw653uv","level":"info","msg":"response","remoteAddr":"127.0.0.1:57458","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/users/ce8i5-tpzed-xo2k4i24bjzwedw","reqQuery":"alt=json","respBytes":617,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:23:29.772947870Z","timeToStatus":0.065729,"timeTotal":0.065739,"timeWriteBody":0.000010}
Apr 27 18:23:29 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-w2f9j09wxy5tikzsqzeh","level":"info","msg":"request","remoteAddr":"127.0.0.1:57460","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23group%22%5D%2C+%5B%22groups.group_class%22%2C+%22in%22%2C+%5B%22project%22%2C+%22filter%22%5D%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:23:29.776418908Z"}
Apr 27 18:23:29 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-w2f9j09wxy5tikzsqzeh","level":"info","msg":"response","remoteAddr":"127.0.0.1:57460","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23group%22%5D%2C+%5B%22groups.group_class%22%2C+%22in%22%2C+%5B%22project%22%2C+%22filter%22%5D%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":2619,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:23:29.852613390Z","timeToStatus":0.076160,"timeTotal":0.076186,"timeWriteBody":0.000026}
Apr 27 18:23:29 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-n20y8tcbswoteyp5vl31","level":"info","msg":"request","remoteAddr":"127.0.0.1:57462","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:23:29.855664034Z"}
Apr 27 18:23:34 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-n20y8tcbswoteyp5vl31","level":"info","msg":"response","remoteAddr":"127.0.0.1:57462","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1035365,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:23:34.883242160Z","timeToStatus":5.027091,"timeTotal":5.027561,"timeWriteBody":0.000470}
Apr 27 18:23:34 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-z5v3as9fh354p9e0h44m","level":"info","msg":"request","remoteAddr":"127.0.0.1:57540","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=1000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:23:34.901086368Z"}
Apr 27 18:23:38 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-z5v3as9fh354p9e0h44m","level":"info","msg":"response","remoteAddr":"127.0.0.1:57540","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=1000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1034702,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:23:38.490090750Z","timeToStatus":3.588468,"timeTotal":3.588967,"timeWriteBody":0.000499}
Apr 27 18:23:38 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-9v2sx9irye3kt73lawp2","level":"info","msg":"request","remoteAddr":"127.0.0.1:57548","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=2000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:23:38.506781205Z"}
Apr 27 18:23:42 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-9v2sx9irye3kt73lawp2","level":"info","msg":"response","remoteAddr":"127.0.0.1:57548","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=2000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1034106,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:23:42.873203307Z","timeToStatus":4.363532,"timeTotal":4.366400,"timeWriteBody":0.002869}

Wait, and again?

Apr 27 18:56:28 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-5ghnybu6c2338w40vp76","level":"info","msg":"request","remoteAddr":"127.0.0.1:40960","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:56:28.591156111Z"}
Apr 27 18:56:32 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-5ghnybu6c2338w40vp76","level":"info","msg":"response","remoteAddr":"127.0.0.1:40960","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":508538,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:56:32.610668168Z","timeToStatus":4.019278,"timeTotal":4.019497,"timeWriteBody":0.000219}
Apr 27 18:56:32 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-ls908p4sveqsmhbkvzsi","level":"info","msg":"request","remoteAddr":"127.0.0.1:41004","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410537\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:56:32.624584396Z"}
Apr 27 18:56:35 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-ls908p4sveqsmhbkvzsi","level":"info","msg":"response","remoteAddr":"127.0.0.1:41004","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410537\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1039,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:56:35.356290294Z","timeToStatus":2.731689,"timeTotal":2.731699,"timeWriteBody":0.000010}
Apr 27 18:56:44 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-cb3u4lfpxn1szrj16adt","level":"info","msg":"request","remoteAddr":"127.0.0.1:41156","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/users/ce8i5-tpzed-xo2k4i24bjzwedw","reqQuery":"alt=json","time":"2021-04-27T18:56:44.479388280Z"}
Apr 27 18:56:44 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-cb3u4lfpxn1szrj16adt","level":"info","msg":"response","remoteAddr":"127.0.0.1:41156","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/users/ce8i5-tpzed-xo2k4i24bjzwedw","reqQuery":"alt=json","respBytes":617,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:56:44.534287931Z","timeToStatus":0.054884,"timeTotal":0.054893,"timeWriteBody":0.000009}
Apr 27 18:56:44 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-97hz7hlguiighxkuo707","level":"info","msg":"request","remoteAddr":"127.0.0.1:41158","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23group%22%5D%2C+%5B%22groups.group_class%22%2C+%22in%22%2C+%5B%22project%22%2C+%22filter%22%5D%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:56:44.537669320Z"}
Apr 27 18:56:44 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-97hz7hlguiighxkuo707","level":"info","msg":"response","remoteAddr":"127.0.0.1:41158","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23group%22%5D%2C+%5B%22groups.group_class%22%2C+%22in%22%2C+%5B%22project%22%2C+%22filter%22%5D%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":2619,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:56:44.595933349Z","timeToStatus":0.058234,"timeTotal":0.058256,"timeWriteBody":0.000023}
Apr 27 18:56:44 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-3b5s0uyy13qh8898c5kd","level":"info","msg":"request","remoteAddr":"127.0.0.1:41160","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:56:44.598972874Z"}
Apr 27 18:56:48 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-3b5s0uyy13qh8898c5kd","level":"info","msg":"response","remoteAddr":"127.0.0.1:41160","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1034857,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:56:48.424530662Z","timeToStatus":3.825031,"timeTotal":3.825537,"timeWriteBody":0.000506}
Apr 27 18:56:48 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-z8b95n1s1hk0vvkzeyg5","level":"info","msg":"request","remoteAddr":"127.0.0.1:41162","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=1000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T18:56:48.438946395Z"}
Apr 27 18:56:52 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-z8b95n1s1hk0vvkzeyg5","level":"info","msg":"response","remoteAddr":"127.0.0.1:41162","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=1000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1034602,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T18:56:52.048478363Z","timeToStatus":3.609040,"timeTotal":3.609521,"timeWriteBody":0.000481}

So there is really a bug here.

Again:

Apr 27 19:27:47 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-xli9lxyliy5spevpeyl8","level":"info","msg":"request","remoteAddr":"127.0.0.1:54620","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=409000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T19:27:47.240876489Z"}
Apr 27 19:27:52 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-xli9lxyliy5spevpeyl8","level":"info","msg":"response","remoteAddr":"127.0.0.1:54620","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=409000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":952550,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:27:52.033331703Z","timeToStatus":4.792080,"timeTotal":4.792440,"timeWriteBody":0.000360}
Apr 27 19:27:52 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-6hiwi1qyvfh34wlvmymp","level":"info","msg":"request","remoteAddr":"127.0.0.1:54640","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T19:27:52.063428278Z"}
Apr 27 19:27:55 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-6hiwi1qyvfh34wlvmymp","level":"info","msg":"response","remoteAddr":"127.0.0.1:54640","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=410000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":641514,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:27:55.974670620Z","timeToStatus":3.910995,"timeTotal":3.911229,"timeWriteBody":0.000234}
Apr 27 19:28:02 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-1j7nbd1ewfqhlgv2iovi","level":"info","msg":"request","remoteAddr":"127.0.0.1:54694","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/virtual_machines/tordo-2x53u-80c95rykdjur7t2/logins","reqQuery":"","time":"2021-04-27T19:28:02.995816068Z"}
Apr 27 19:28:03 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-1j7nbd1ewfqhlgv2iovi","level":"info","msg":"response","remoteAddr":"127.0.0.1:54694","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/virtual_machines/tordo-2x53u-80c95rykdjur7t2/logins","reqQuery":"","respBytes":3607,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:28:03.018414151Z","timeToStatus":0.022430,"timeTotal":0.022587,"timeWriteBody":0.000157}
Apr 27 19:28:06 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-dc19hj8glmvuftplt4mw","level":"info","msg":"request","remoteAddr":"127.0.0.1:54734","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/users/ce8i5-tpzed-xo2k4i24bjzwedw","reqQuery":"alt=json","time":"2021-04-27T19:28:06.884991519Z"}
Apr 27 19:28:06 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-dc19hj8glmvuftplt4mw","level":"info","msg":"response","remoteAddr":"127.0.0.1:54734","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/users/ce8i5-tpzed-xo2k4i24bjzwedw","reqQuery":"alt=json","respBytes":617,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:28:06.949804361Z","timeToStatus":0.064797,"timeTotal":0.064806,"timeWriteBody":0.000010}
Apr 27 19:28:06 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-azo21kkotxok4g3vcz5y","level":"info","msg":"request","remoteAddr":"127.0.0.1:54736","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23group%22%5D%2C+%5B%22groups.group_class%22%2C+%22in%22%2C+%5B%22project%22%2C+%22filter%22%5D%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T19:28:06.953170049Z"}
Apr 27 19:28:07 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-azo21kkotxok4g3vcz5y","level":"info","msg":"response","remoteAddr":"127.0.0.1:54736","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23group%22%5D%2C+%5B%22groups.group_class%22%2C+%22in%22%2C+%5B%22project%22%2C+%22filter%22%5D%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":2619,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:28:07.011272128Z","timeToStatus":0.058071,"timeTotal":0.058094,"timeWriteBody":0.000023}
Apr 27 19:28:07 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-mdcvwa12r64r1zsuvg07","level":"info","msg":"request","remoteAddr":"127.0.0.1:54738","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T19:28:07.014198930Z"}
Apr 27 19:28:10 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-mdcvwa12r64r1zsuvg07","level":"info","msg":"response","remoteAddr":"127.0.0.1:54738","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=0\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1034420,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:28:10.760701354Z","timeToStatus":3.746157,"timeTotal":3.746486,"timeWriteBody":0.000329}
Apr 27 19:28:10 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-duri7xdwm0w19ta0uune","level":"info","msg":"request","remoteAddr":"127.0.0.1:54766","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=1000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","time":"2021-04-27T19:28:10.784075846Z"}
Apr 27 19:28:14 tordo.arvadosapi.com arvados-controller[32292]: {"PID":32292,"RequestID":"req-duri7xdwm0w19ta0uune","level":"info","msg":"response","remoteAddr":"127.0.0.1:54766","reqBytes":0,"reqForwardedFor":"10.253.0.197","reqHost":"tordo.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"offset=1000\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=9223372036854775807\u0026alt=json","respBytes":1034240,"respStatus":"OK","respStatusCode":200,"time":"2021-04-27T19:28:14.562016992Z","timeToStatus":3.777498,"timeTotal":3.777929,"timeWriteBody":0.000431}

The corresponding arv-mount debug log for that period is attached as 20210427-arv-mount-debug.log

Actions #5

Updated by Ward Vandewege over 3 years ago

  • Assigned To set to Ward Vandewege
  • Status changed from New to In Progress
Actions #7

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2021-04-28 bughunt sprint to 2021-05-12 sprint
Actions #8

Updated by Ward Vandewege over 3 years ago

Trying to track this down. On ce8i5 (where the bulk of the 400k+ collections are), I tried to use the experimental go fuse driver to list `home`.

This caused a api server rails process to misbehave in the group/contents endpoint: at ~100% of a core, it would consume more and more ram until the machine went OOM (or until I killed the process).

The rails log entry was

{"method":"GET","path":"/arvados/v1/groups/contents","format":"html","controller":"Arvados::V1::GroupsController","action":"contents","status":0,"duration":426683.8,"view":0.0,"db":13347.96,"request_id":"req-1ec6zkrd8szj47hjr6k2","client_ipaddr":"127.0.0.1","client_auth":"ce8i5-gj3su-a8lfaeci7rnrfxi","params":{"count":"none","filters":"[[\"uuid\",\"is_a\",\"arvados#collection\"]]","include":"","offset":"0","order":"[\"uuid\"]","uuid":"ce8i5-tpzed-xo2k4i24bjzwedw"},"@timestamp":"2021-05-04T13:11:51.441170640Z","@version":"1","message":"[0] GET /arvados/v1/groups/contents (Arvados::V1::GroupsController#contents)"}

and the corresponding controller entry:

May 04 13:09:44 api.ce8i5.arvadosapi.com arvados-controller[48567]: {"PID":48567,"RequestID":"req-1ec6zkrd8szj47hjr6k2","level":"info","msg":"response","remoteAddr":"127.0.0.1:47330","reqBytes":0,"reqForwardedFor":"52.177.187.26","reqHost":"ce8i5.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/ce8i5-tpzed-xo2k4i24bjzwedw/contents","reqQuery":"count=none\u0026filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%22arvados%23collection%22%5D%5D\u0026order=uuid","respBody":"{\"errors\":[\"Get //railsapi.internalarvados/v1/groups/contents?count=none\\u0026filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%22arvados%23collection%22%5D%5D\\u0026include=\\u0026offset=0\\u0026order=%5B%22uuid%22%5D\\u0026uuid=ce8i5-tpzed-xo2k4i24bjzwedw: context deadline exceeded\"]}\n","respBytes":275,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2021-05-04T13:09:44.747807296Z","timeToStatus":300.000900,"timeTotal":300.000940,"timeWriteBody":0.000040}

On the client, this caused:

ward@shell:~$ ls ackeep/home/
ls: reading directory 'ackeep/home/': Input/output error
ward@shell:~$

In other words: ls didn't hang, and the `arvados-client mount` command didn't hang. Nice. The fuse driver process was still working fine, after this it responded to other requests normally. Very good.

Actions #9

Updated by Peter Amstutz over 3 years ago

Actions #10

Updated by Peter Amstutz over 3 years ago

This seems wrong:

arv group contents -u ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]'
 "items_available":212278,
 "kind":"arvados#objectList",
 "limit":100,
 "offset":0
arv collection list -f '[["owner_uuid", "=", "ce8i5-tpzed-xo2k4i24bjzwedw"]]'
 "items_available":210564,
 "kind":"arvados#collectionList",
 "limit":100,
 "offset":0
Actions #11

Updated by Peter Amstutz over 3 years ago

Default sorting is different. Hm.

$ arv --format=uuid collection list -f '[["owner_uuid", "=", "ce8i5-tpzed-xo2k4i24bjzwedw"]]' -l 10
tordo-4zz18-imj9n6fr54r0lmn
tordo-4zz18-7ww56ff272db5ic
tordo-4zz18-2yg8wum4runsoi7
tordo-4zz18-76xymxrcvegoz87
tordo-4zz18-0uankid7xv4wwem
tordo-4zz18-sjfzx7pwqfj8jfq
tordo-4zz18-6yr13gnvrpapo2s
tordo-4zz18-sw0a55lm3qc009k
tordo-4zz18-fk1k7siayjg9h5a
tordo-4zz18-gzo062wzhhy8igf

$ arv --format=uuid group contents -u ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]' -l 10
tordo-4zz18-dale5t56h78135h
tordo-4zz18-swrtkbdkctrk4af
tordo-4zz18-ubfy3xu734t7ilc
tordo-4zz18-imj9n6fr54r0lmn
tordo-4zz18-7ww56ff272db5ic
tordo-4zz18-2yg8wum4runsoi7
tordo-4zz18-76xymxrcvegoz87
tordo-4zz18-0uankid7xv4wwem
tordo-4zz18-sjfzx7pwqfj8jfq
tordo-4zz18-6yr13gnvrpapo2s
Actions #12

Updated by Peter Amstutz over 3 years ago

Nope I made the exact same mistake as you, it is --uuid not -u

But still getting an unexpected result:

$ arv group contents --uuid ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]'
 "items_available":421922,

$ arv collection list -f '[["owner_uuid", "=", "ce8i5-tpzed-xo2k4i24bjzwedw"]]'
 "items_available":210776,
Actions #13

Updated by Peter Amstutz over 3 years ago

For me, using Workbench 2 to get the home project contents also fails.

It took me much longer to figure this out than it should, but what seems to be happening is that we're getting "Oops! Federation!"

$ echo $ARVADOS_API_HOST
tordo.arvadosapi.com

$ arv --format=uuid group contents --uuid ce8i5-tpzed-xo2k4i24bjzwedw -f '[["uuid","is_a","arvados#collection"]]'
ce8i5-4zz18-z7cuqcr29f2nwv0
ce8i5-4zz18-d1fn4x60badlxgs
ce8i5-4zz18-f1z6oh1aztmqqia
ce8i5-4zz18-mhae76913cys0kr
...

I don't know exactly what controller is doing but presumably it is looking at the ce8i5 user uuid prefix and redirecting the request to ce8i5.

The original problem, where the listing never finishes, could possibly be due to items being added between requests, so the "items_available" target keeps rising.

Possibly useful, I added "arvados.util.keyset_list_all" which uses a different paging strategy than "list_all", which is much more gentle on the database, so large listings may complete faster.

With the "keyset_list_all" approach, you can use decending order by time, so it doesn't get stuck or mess up when things are added while it is trying to list everything.

Actions #14

Updated by Peter Amstutz over 3 years ago

func (conn *Conn) GroupContents(ctx context.Context, options arvados.GroupContentsOptions) (arvados.ObjectList, error) {
    return conn.chooseBackend(options.UUID).GroupContents(ctx, options)
}

This (a) needs to special case federated users and default to local and (b) needs to accept "cluster_id" for the case where you actually do want to get the home contents of a user on a different cluster.

Actions #15

Updated by Peter Amstutz over 3 years ago

  • Assigned To changed from Ward Vandewege to Peter Amstutz
Actions #16

Updated by Peter Amstutz over 3 years ago

17589-group-contents @ 19e9af391ebd6766200bc76c9d1ee90f167e43a7

Fix the group contents endpoint so that requests for user home directory go to local. Add test.

17589-fuse-use-keyset-paging @ fc0e9f360dd51c5c508426846f2caed9f1c2ec66 (includes the previous branch)

Switch fusedir.py to use keyset_list_all() which uses a more efficient and stable paging mechanism than list_all().

developer-run-tests: #2470

Actions #17

Updated by Peter Amstutz over 3 years ago

It seems that there's a problem in the underlying Ruby code for group contents. When we try to do a query that uses count=none and keyset paging (Go FUSE, and now arv-mount FUSE), the API server apparently OOMs and needs to be rebooted.

Actions #18

Updated by Peter Amstutz over 3 years ago

  1. Today I learned that group contents has an undocumented parameter called "last_object_class" which seems to provide a workaround to the paging problem where results are ordered within each object type (table) but not sorted across all results.
  2. even on arvbox, there is a dramatic request time difference between "group contents --count=exact" (150ms) and "group contents --count=none" (2500ms). This is very weird since "count=none" is intended to be an optimization.
Actions #19

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2021-05-12 sprint to 2021-05-26 sprint
Actions #20

Updated by Peter Amstutz over 3 years ago

  • Target version deleted (2021-05-26 sprint)
        # The call to object_list below will not populate :items_available in
        # its response, because count is disabled.  Save @objects length (does
        # not require another db query) so that @offset (if set) is handled
        # correctly.
        countless_items_available = @objects.length

This comment is wrong. This does does require a db query. Calling @objects.length involves doing a query of the entire table, with all the joins, and no limit.

We don't want to do this, since it is functionally the same as the count=exact behavior. It is also completely unnecessary when offset=0.

On discussion with Tom, we decided that for count=none and nonzero offset, it can just fall back on the count=exact behavior.

17589-contents-count-none @ 9003633889b26bdd19c1376c69981aaaa21ee819

developer-run-tests: #2471

Actions #21

Updated by Peter Amstutz over 3 years ago

  • Target version set to 2021-05-26 sprint
Actions #22

Updated by Ward Vandewege over 3 years ago

Peter Amstutz wrote:

[...]

This comment is wrong. This does does require a db query. Calling @objects.length involves doing a query of the entire table, with all the joins, and no limit.

Yeah, clearly I made a mistake, it's odd, I recall testing this and looking at the database queries.

We don't want to do this, since it is functionally the same as the count=exact behavior. It is also completely unnecessary when offset=0.

On discussion with Tom, we decided that for count=none and nonzero offset, it can just fall back on the count=exact behavior.

Yeah, that seems right.

17589-contents-count-none @ 9003633889b26bdd19c1376c69981aaaa21ee819

LGTM. This seems to include the 17589-group-contents and 17589-fuse-use-keyset-paging branches from https://dev.arvados.org/issues/17589#note-16 so those are good to merge too.

Actions #23

Updated by Peter Amstutz over 3 years ago

  • Status changed from In Progress to Resolved
Actions #24

Updated by Peter Amstutz over 3 years ago

  • Release set to 38
Actions #25

Updated by Peter Amstutz over 3 years ago

  • Status changed from Resolved to Feedback

Did some testing after deployment to ce8i5.

I was able to list Ward's home project without crashing anything, but it took 58 minutes!

$ time ls | wc
426295 2094831 24104847

real 58m18.061s
user 0m5.189s
sys 0m0.163s

That means it had to do 427 API requests. Which is quite a lot, but something else is going on here. For some reason, even with count=none, it is very expensive to fetch each page:

{"PID":3703,"RequestID":"req-6zpplgutxcdq5cevhwj5","level":"info","msg":"response","remoteAddr":"127.0.0.1:49712","reqBytes":0,"reqForwardedFor":"67.189.170.63","reqHost":"ce8i5.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/groups/contents","reqQuery":"uuid=ce8i5-tpzed-xo2k4i24bjzwedw\u0026filters=%5B%5B%22uuid%22%2C+%22%3E%3D%22%2C+%22ce8i5-4zz18-cma8av9gjz7orlg%22%5D%2C+%5B%22uuid%22%2C+%22%21%3D%22%2C+%22ce8i5-4zz18-cma8av9gjz7orlg%22%5D%2C+%5B%22uuid%22%2C+%22is_a%22%2C+%22arvados%23collection%22%5D%5D\u0026limit=1000\u0026count=none\u0026order=%5B%22uuid+asc%22%2C+%22uuid+asc%22%5D\u0026alt=json","respBytes":977268,"respStatus":"OK","respStatusCode":200,"time":"2021-05-13T17:25:49.144549510Z","timeToStatus":5.606698,"timeTotal":5.616120,"timeWriteBody":0.009422}

Need to get to the bottom of this.

Actions #26

Updated by Peter Amstutz over 3 years ago

$ time arv group contents --filters 'ce8i5-4zz18-cma8av9gjz7orlg%22%5D_%5B%22uuid%22_%22%3E=%22_%22ce8i5-4zz18-cma8av9gjz7orlg%22%5D_%5B%22uuid%22_%22is_a%22_%22arvados#collection">"uuid", "!=", "ce8i5-4zz18-cma8av9gjz7orlg"], ["uuid", ">=", "ce8i5-4zz18-cma8av9gjz7orlg"], ["uuid", "is_a", "arvados' --order "uuid asc, uuid asc" --count=none --limit 100

real 0m1.256s
user 0m0.240s
sys 0m0.017s

$ time arv group contents --filters 'ce8i5-4zz18-cma8av9gjz7orlg%22%5D_%5B%22uuid%22_%22%3E=%22_%22ce8i5-4zz18-cma8av9gjz7orlg%22%5D_%5B%22uuid%22_%22is_a%22_%22arvados#collection">"uuid", "!=", "ce8i5-4zz18-cma8av9gjz7orlg"], ["uuid", ">=", "ce8i5-4zz18-cma8av9gjz7orlg"], ["uuid", "is_a", "arvados' --order "uuid asc, uuid asc" --count=none --limit 1000
real 0m10.880s
user 0m0.300s
sys 0m0.069s

Using count=none is an improvement over count=exact:

$ time arv group contents --filters 'ce8i5-4zz18-cma8av9gjz7orlg%22%5D_%5B%22uuid%22_%22%3E=%22_%22ce8i5-4zz18-cma8av9gjz7orlg%22%5D_%5B%22uuid%22_%22is_a%22_%22arvados#collection">"uuid", "!=", "ce8i5-4zz18-cma8av9gjz7orlg"], ["uuid", ">=", "ce8i5-4zz18-cma8av9gjz7orlg"], ["uuid", "is_a", "arvados' --order "uuid asc, uuid asc" --count=exact --limit 100
real 0m2.862s
user 0m0.244s
sys 0m0.028s

Actions #27

Updated by Peter Amstutz over 3 years ago

I don't know what exactly it is doing, but retrieving 1000 rows takes 57ms of database time, but the whole request takes 2695ms because of whatever it is doing in Ruby:

{"method":"GET","path":"/arvados/v1/groups/contents","format":"html","controller":"Arvados::V1::GroupsController","action":"contents","status":200,"duration":2695.01,"v
iew":0.39,"db":56.99,"request_id":"req-da7oqupeflmrlipsjsp2","client_ipaddr":"127.0.0.1","client_auth":"ce8i5-gj3su-b8imt9ab31yacmz","params":{"cluster_id":"","count":"
none","filters":"ce8i5-4zz18-zschyoep2a1em49%5C%22%5D%5B%5C%22uuid%5C%22%5C%22!=%5C%22%5C%22ce8i5-4zz18-zschyoep2a1em49%5C%22%5D%5B%5C%22uuid%5C%22%5C%22is_a%5C%22%5C%22arvados#collection">\"uuid\",\"\\u003e=\",\"ce8i5-4zz18-zschyoep2a1em49\"],[\"uuid\",\"!=\",\"ce8i5-4zz18-zschyoep2a1em49\"],[\"uuid\",\"is_a\",\"arvados"
,"include":"","limit":"1000","offset":"0","order":"[\"uuid asc\",\"uuid asc\"]","uuid":"ce8i5-tpzed-xo2k4i24bjzwedw"},"@timestamp":"2021-05-13T17:49:38.262840464Z","@ve
rsion":"1","message":"[200] GET /arvados/v1/groups/contents (Arvados::V1::GroupsController#contents)"}

Actions #28

Updated by Peter Amstutz over 3 years ago

Peter Amstutz wrote:

I don't know what exactly it is doing, but retrieving 1000 rows takes 57ms of database time, but the whole request takes 2695ms because of whatever it is doing in Ruby:

{"method":"GET","path":"/arvados/v1/groups/contents","format":"html","controller":"Arvados::V1::GroupsController","action":"contents","status":200,"duration":2695.01,"v
iew":0.39,"db":56.99,"request_id":"req-da7oqupeflmrlipsjsp2","client_ipaddr":"127.0.0.1","client_auth":"ce8i5-gj3su-b8imt9ab31yacmz","params":{"cluster_id":"","count":"
none","filters":"ce8i5-4zz18-zschyoep2a1em49%5C%22%5D%5B%5C%22uuid%5C%22%5C%22!=%5C%22%5C%22ce8i5-4zz18-zschyoep2a1em49%5C%22%5D%5B%5C%22uuid%5C%22%5C%22is_a%5C%22%5C%22arvados#collection">\"uuid\",\"\\u003e=\",\"ce8i5-4zz18-zschyoep2a1em49\"],[\"uuid\",\"!=\",\"ce8i5-4zz18-zschyoep2a1em49\"],[\"uuid\",\"is_a\",\"arvados"
,"include":"","limit":"1000","offset":"0","order":"[\"uuid asc\",\"uuid asc\"]","uuid":"ce8i5-tpzed-xo2k4i24bjzwedw"},"@timestamp":"2021-05-13T17:49:38.262840464Z","@ve
rsion":"1","message":"[200] GET /arvados/v1/groups/contents (Arvados::V1::GroupsController#contents)"}

I just realized this is probably a silly test case because the uuid starts with "zs..." and probably is close to the last one in the database.

I have moved this discussion over to https://dev.arvados.org/issues/17674

Actions #29

Updated by Peter Amstutz over 3 years ago

  • Status changed from Feedback to Resolved
Actions

Also available in: Atom PDF