Bug #11158

[FUSE] Hang on simple FUSE operation and when logging in again later

Added by Tom Morris 6 months ago. Updated 21 days ago.

Status:ResolvedStart date:02/24/2017
Priority:NormalDue date:
Assignee:Peter Amstutz% Done:

100%

Category:Performance
Target version:2017-08-02 sprint
Story points-Remaining (hours)0.00 hour
Velocity based estimate-

Description

I changed directory on a shell node to a ~/keep/home/... project which each had a single sub project, which each had a single collection, which each had the file I was looking for one directory down and did:

ls */*/*/SampleSheet.csv

which should have in <1 second returned the ~20 files that I was looking for. Instead, it went away for hours, and when I lost the SSH connection and tried to log in again later, the login hung trying to do the arv-mount.


Subtasks

Task #11191: Review 11158-fuse-projectsResolvedTom Clegg

Associated revisions

Revision 3ad999b1
Added by Peter Amstutz 24 days ago

Merge branch '11158-fuse-projects' refs #11158

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz 6 months ago

  • Assignee set to Peter Amstutz

#2 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2017-03-15 sprint to 2017-03-29 sprint

#3 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2017-03-29 sprint to 2017-04-12 sprint

#4 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2017-04-12 sprint to 2017-04-26 sprint

#5 Updated by Tom Morris 4 months ago

  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint

#6 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2017-05-10 sprint to 2017-05-24 sprint

#7 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2017-05-24 sprint to Arvados Future Sprints

#8 Updated by Peter Amstutz about 1 month ago

  • Status changed from New to In Progress
  • Target version changed from Arvados Future Sprints to 2017-07-05 sprint

#9 Updated by Peter Amstutz about 1 month ago

New behavior for project directories. Allow 'cd' into any collection or subproject that exists on the server, even if it isn't returned in the listing. Now applies listing limit, ordered by most recent. No longer lists other object types (pipelines, links etc.)

The problem seems to be that listing the entire contents of the project via the "group contents" API gets stuck. There may be a bug in the interaction of paging via "offset" with the "contents" API.

#10 Updated by Tom Clegg about 1 month ago

looking at 11158-fuse-projects @ 62f1bc5...

It looks like list_all(..., limit=1000) makes list_all() ask for 1000 items per page, instead of the apiserver's maximum page size. So, either the apiserver's limit is 1000 or less and limit=1000 makes no difference, or the apiserver's limit is higher than 1000 and limit=1000 fetches the same number of records (all of them) but uses more API calls because it gets fewer results per page. Am I following this correctly? I expect this will make arv-mount slower -- what's the advantage?

Many tests are failing for me. Most of them are "ERROR" and look like they might be unrelated "fuse is having a bad day on my machine today" things, but this looks like the sort of thing that would be caused by the "don't list_all() until someone calls readdir" change:

======================================================================
FAIL: runTest (tests.test_mount.FuseSharedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/services/fuse/tests/test_mount.py", line 244, in runTest
    ], fuse_user_objs)
AssertionError: Lists differ: ['FUSE Test Project', 'collect... != ['FUSE Test Project', 'collect...

First list contains 1 additional elements.
First extra element 3:
pipeline instance owned by FUSE.pipelineInstance

  ['FUSE Test Project',
   'collection #1 owned by FUSE',
-  'collection #2 owned by FUSE',
?                               ^

+  'collection #2 owned by FUSE']
?                               ^

-  'pipeline instance owned by FUSE.pipelineInstance']

The list_all() bugfix looks good, and seems convincing as a fix for the reported issue.

#11 Updated by Peter Amstutz about 1 month ago

  • Target version changed from 2017-07-05 sprint to 2017-07-19 sprint

#12 Updated by Peter Amstutz about 1 month ago

Sorry, that branch was a little half baked. Tests should pass now.

Now @ b25cbdbd1b54a622f4428e29ccffeb165ef29b45

#13 Updated by Tom Clegg about 1 month ago

In this version of list_all(), the "filters" logic assumes the response is ordered by uuid, but it requests whatever order was given by the caller. Won't it miss a lot of records if called with an order other than uuid? It might also behave badly if the returned items don't include a uuid (e.g., "select" param given).

This branch seems to be taking a turn toward "improve various things in arv-mount and its tests" -- or maybe I just don't see how a lot of the changes are related to the bugfix?

If I'm following correctly:
  • The bug ("hang on simple FUSE operation") was caused by the groups#contents API returning the wrong (too high) number for items_available, and list_all() being unable to terminate its loop as a result.
  • We can make list_all() faster, and work around the groups#contents bug at the same time, by sending count=none and terminating the loop on "returned items < returned limit" instead.
  • We haven't explained or reproduced the groups#contents bug.
  • There's a bunch of other work to do in arv-mount.

Assuming this is correct, could we focus on merging a list_all() improvement/workaround, and put the other changes (like lazy loading) aside for a moment? In particular I think it would be good to track down the API bug.

#14 Updated by Peter Amstutz about 1 month ago

I'll split the Python SDK and FUSE changes into separate branches.

#15 Updated by Peter Amstutz about 1 month ago

items_available as reported:

 
$ arv group contents --uuid 34t0i-tpzed-1jdzge3ezjpou1o
"items_available":16023,

Paging, using [uuid > last uuid on previous page] and stopping when getting an incomplete page:

val = arvados.util.list_all(api.groups().contents, 0, uuid="34t0i-tpzed-1jdzge3ezjpou1o")
>>> len(val)
6146

Paging, using [uuid > last uuid on previous page] and stopping when getting an empty page:

val = arvados.util.list_all(api.groups().contents, 0, uuid="34t0i-tpzed-1jdzge3ezjpou1o")
>>> len(val)
6624

Paging, using offset = result["offset"] + len(result["items"]) and count=none, and stopping after getting an empty page:

>>> val = arvados.util.list_all(api.groups().contents, 0, uuid="34t0i-tpzed-1jdzge3ezjpou1o")
{'count': 'none', 'offset': 0, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 0, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 1000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 1000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 2000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 2000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 3000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 3000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 4000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 4000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 5000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 5000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 6000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 6000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 7000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 7000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 8000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 8000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 9000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 9000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 10000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 10000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 11000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 11000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 12000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 12000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 13000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 13000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 14000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 14000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 15000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 15000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 16000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 16000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 17000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 17000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 18000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 18000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 19000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 19000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 20000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 20000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 21000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 21000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 22000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 22000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 23000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 23000, u'self_link': u'', u'items_available': 0}
{'count': 'none', 'offset': 24000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
(continues forever)

Paging, using offset = result["offset"] + len(result["items"]) and count=exact, and stopping after getting an empty page:

>>> val = arvados.util.list_all(api.groups().contents, 0, uuid="34t0i-tpzed-1jdzge3ezjpou1o")
{'count': 'exact', 'offset': 0, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 0, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 1000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 1000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 2000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 2000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 3000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 3000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 4000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 4000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 5000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 5000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 6000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 6000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 7000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 7000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 8000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 8000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 9000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 9000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 10000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 10000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 11000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 11000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 12000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 12000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 13000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 13000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 14000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 14000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 15000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 15000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 16000, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 16000, u'self_link': u'', u'items_available': 16023}
{'count': 'exact', 'offset': 16023, 'limit': 9223372036854775807, 'uuid': '34t0i-tpzed-1jdzge3ezjpou1o', 'order': 'uuid asc'}
{u'kind': u'arvados#objectList', u'etag': u'', u'limit': 1000, u'offset': 16023, u'self_link': u'', u'items_available': 16023}
>>> len(val)
16023

Findings:

  • Paging using uuid doesn't work at all group#contents
  • group#contents with offset requires count=exact
  • Doing some tracing, it appears that it is actually calling list_all() over and over again.

#16 Updated by Peter Amstutz about 1 month ago

On further examination, it seems what is actually happening is that it is getting a steady stream of events from websockets, which are handled by invalidating the whole project directory (which sets the "stale" flag). As a result, every access to the ProjectDirectory sees the "stale" flag and triggers a full reload (which takes ~60s).

Disabling event listening (--disable-event-listening) makes it possible to enter and list home directory contents.

The fix is for the ProjectDirectory class to handle incremental updates.

#17 Updated by Tom Morris about 1 month ago

  • Target version changed from 2017-07-19 sprint to 2017-08-02 sprint

#18 Updated by Peter Amstutz 25 days ago

11158-fuse-projects 26e095318c7c368089ea368e6e4854b62a6c7830

Based on findings in note 16:

  • Reverted changes to util.py
  • Implemented incremental update of project contents from on websocket events

#19 Updated by Tom Clegg 24 days ago

LGTM

Happened to notice "if new_name" should probably also be changed to "if new_name is not None" here in fusedir.py, just in case someone has renamed something to "". Except that I suppose renaming to "" is as good as deleting, in the context of a fuse mount.

             if new_name:
-                if ent:
+                if ent is not None:
                     self._entries[new_name] = ent
                 else:
                     self._add_entry(new_attrs, new_name)

#20 Updated by Peter Amstutz 21 days ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF