Bug #8624

[FUSE] arv-mount `by_tag` directory only shows 100 tags and tags that exist are not accessible

Added by Joshua Randall about 1 year ago. Updated 25 days ago.

Status: New
Start date: 03/03/2016
Priority: Normal
Due date: -
Assignee: Peter Amstutz
% Done: -


Target version: Arvados Future Sprints
Story points: -
Velocity based estimate: -


When I look in the `by_tag` directory under a keep mount, I only see 100 tags in the directory.

# arv-mount /tmp/keep
# ls /tmp/keep/by_tag | wc -l
100

However, I actually have 54225 tags:

# arv link list -f '[["head_kind","=","arvados#collection"],["link_class","=","tag"]]' -l 0

I could see how it might be reasonable to limit the number of tags shown (or even not to allow listing until they are accessed, as in the `by_id` directory).

However, that doesn't seem to be what is happening, as tags that exist are not accessible:

# ls /tmp/keep/by_tag/lanelet:17559_7#3
ls: cannot access /tmp/keep/by_tag/lanelet:17559_7#3: No such file or directory


#1 Updated by Joshua Randall about 1 year ago

No big mystery here - the API call to list the links does not specify a limit and does not check items_available, but seems to just assume the API server returned everything: https://github.com/curoverse/arvados/blob/master/services/fuse/arvados_fuse/fusedir.py#L671-L679
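The truncation is easy to demonstrate in isolation. Below is a toy illustration (not the actual fusedir.py code) of the pattern described above: a single list call returns at most one page, and comparing `items_available` against the number of items actually received is how the caller would detect that the listing was cut off.

```python
# Hypothetical stand-in for a paged list API such as the Arvados links
# endpoint; the real client call and response schema may differ.
def fake_list(all_items, limit=100, offset=0):
    """Return one page of results plus the total count available."""
    return {
        "items": all_items[offset:offset + limit],
        "items_available": len(all_items),
    }

tags = ["tag%d" % i for i in range(54225)]  # the reporter's tag count
response = fake_list(tags)

# The buggy code treated this single response as the complete listing:
print(len(response["items"]))          # only 100 entries come back
print(response["items_available"])     # 54225 were actually available

# Checking items_available is what reveals the truncation.
truncated = len(response["items"]) < response["items_available"]
print(truncated)                       # True
```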

#2 Updated by Joshua Randall about 1 year ago

I've confirmed that adding `limit=1000` to the API list query (fusedir.py:673) results in a `by_tag` directory with 1000 entries. So, one aspect of the solution would be to implement a loop that retrieves all available tags (although in my opinion it would be better to do that within the Python SDK, a la #8502, as it is a common pattern that is needed often).

<                 select=['name'], distinct=True
>                 select=['name'], distinct=True, limit=1000

I also checked differences in memory usage with a limit of 100 vs 1000 (because it looks like TagsDirectory creates a TagDirectory object for each tag and I wasn't sure how expensive those are).

# arv-mount /tmp/keep_root
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     31604 kB
# ls /tmp/keep_root/by_tag | wc -l
100
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     35156 kB
# umount /tmp/keep_root
# # change to limit=1000
# arv-mount /tmp/keep_root
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     31592 kB
# ls /tmp/keep_root/by_tag | wc -l
1000
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     36984 kB

100 tag limit
35156 - 31604 kB = 3552 kB

1000 tag limit
36984 - 31592 kB = 5392 kB

Expected memory cost for each additional tag: (5392 kB - 3552 kB) / (1000 - 100) tags ≈ 2.04 kB/tag

Given this, it seems reasonable to retain the current behavior of pre-populating TagDirectory objects unless there are somewhere in the vicinity of 100000-10000000 tags (at the 1m tag mark, the expected memory usage for arv-mount would be ~2GB, which seems excessive for a client that is unlikely to actually need 99.999% of those objects).

That said, I'm not understanding why the TagDirectory object is not simply created if and when a request is made for it. I think the only useful piece of state it has is `self.tag`, which is the name of the tag to use in the API query to get the associated collections. Since that is equal to the `name` that would be passed to the `lookup` function prior to access, I do not see any point in prepopulating those objects - it would be just as fast to have a general TagDirectory object that (like MagicDirectory) waits for the `lookup` call to do the actual API server query and prepare a list of associated CollectionDirectory objects (which also don't seem like they need to be prepopulated).
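The lazy approach described above can be sketched roughly like this. The class and callback names here are hypothetical, not the actual fusedir.py classes; the point is only that the per-tag entry is built on first `lookup`, so mounting costs nothing per tag.

```python
class LazyTagsDirectory:
    """Toy model of a by_tag directory that creates per-tag entries
    only when they are looked up, instead of prepopulating them all."""

    def __init__(self, query_collections):
        # query_collections(tag) stands in for the API call that
        # returns the collections linked to a given tag.
        self._query = query_collections
        self._entries = {}

    def lookup(self, name):
        # Build the TagDirectory-equivalent on first access; the only
        # state it needs is the tag name itself.
        if name not in self._entries:
            self._entries[name] = {"tag": name,
                                   "collections": self._query(name)}
        return self._entries[name]


calls = []
def query(tag):
    calls.append(tag)
    return ["collection-for-" + tag]  # hypothetical result

d = LazyTagsDirectory(query)
print(len(calls))                      # 0: no API traffic at mount time
entry = d.lookup("lanelet:17559_7#3")
print(calls)                           # the API is hit only on access
```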

#3 Updated by Joshua Randall about 1 year ago

I would also note that the list of collections associated with a tag appears to be subject to the same issue: only a single API list query is made, so only the first 100 collections associated with a particular tag would be shown.

#4 Updated by Joshua Randall about 1 year ago

  • Assignee set to Joshua Randall

I've implemented the basic fix for this (retrieving items across many batches using a generator function).
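A rough sketch of that batching generator, under the assumption of a list call taking `limit`/`offset` and returning `items` plus `items_available` (the real implementation and client API may differ):

```python
def paginated_items(list_fn, batch_size=1000):
    """Yield every item from a paged list API by issuing repeated
    requests with increasing offsets until items_available is reached.
    list_fn(limit=..., offset=...) stands in for the Arvados list call."""
    offset = 0
    while True:
        page = list_fn(limit=batch_size, offset=offset)
        items = page["items"]
        if not items:
            return
        for item in items:
            yield item
        offset += len(items)
        if offset >= page["items_available"]:
            return


# Exercise the generator against a fake backend of 2500 entries.
data = ["tag%d" % i for i in range(2500)]
def fake_list(limit, offset):
    return {"items": data[offset:offset + limit],
            "items_available": len(data)}

all_tags = list(paginated_items(fake_list))
print(len(all_tags))   # 2500: three batches of up to 1000 each
```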

However, it makes the first `ls` of the `by_tag` directory rather slow - it now takes ~50s to list the 31749 distinct tag names on first access, then ~1s on a subsequent access. Unfortunately, because of the poll_time=60 default, those results stay cached for only 60s, after which the next query goes back to taking a long time.

# arv-mount /tmp/keep_root
# time ls /tmp/keep_root/by_tag | wc -l
31749
real    0m48.103s
user    0m0.292s
sys     0m0.015s
# sleep 30 && time ls /tmp/keep_root/by_tag | wc -l
31749
real    0m0.735s
user    0m0.304s
sys     0m0.011s
# sleep 30 && time ls /tmp/keep_root/by_tag | wc -l
31749
real    0m47.858s
user    0m0.324s
sys     0m0.018s

This wouldn't matter too much if I could just skip the `ls` and go straight to a tag subdirectory, but unfortunately the implementation makes that just as slow:

# arv-mount /tmp/keep_root
# time ls /tmp/keep_root/by_tag/lanelet\:16261_1#1

real    0m47.876s
user    0m0.000s
sys     0m0.002s

There is no good reason why an attempt to access a particular subdirectory should require first obtaining a full listing of the parent directory.

Also, FWIW, I'm starting to think that there is a fundamental problem with the API server limiting the items returned to a number as "low" as 1000. In this particular query, the payload size of all 31749 entries would be ~750kB, whereas some other individual items the API server returns (such as collections with large manifests) can be 20 times that size. Why make me do 32 queries (taking 1-2s each) to get my 750kB while allowing a single query to return 20x more data?

Now that the API server has some logic to return a sensible amount of data (rather than number of entries), perhaps the upper limit on batch size should be raised significantly (perhaps to 1m)?

#5 Updated by Joshua Randall about 1 year ago

IMHO clients shouldn't really be using polling to check whether there is new data - ideally some mechanism (such as an etag) would allow a client to ask the API server whether the resource it is accessing has changed, without having to actually perform a full query at all.
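The etag idea amounts to the standard HTTP conditional-request pattern. A minimal sketch, with a toy server function standing in for the API server (none of this exists in Arvados today):

```python
import hashlib

def serve(resource, client_etag=None):
    """Toy conditional-fetch endpoint: returns 304 with no body when
    the client's cached etag still matches the resource's content."""
    etag = hashlib.sha1(resource.encode()).hexdigest()
    if client_etag == etag:
        return 304, None, etag        # "not modified": nothing re-sent
    return 200, resource, etag        # full payload plus a fresh etag

# The client caches the body and etag, then revalidates cheaply.
status1, body, etag = serve("tag listing v1")
print(status1)                        # 200: full payload on first fetch
status2, body2, _ = serve("tag listing v1", etag)
print(status2)                        # 304: unchanged, so no payload
```

The win is that revalidation costs one tiny round trip instead of re-running the (slow) full listing every poll_time seconds.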

#6 Updated by Tom Morris 25 days ago

  • Subject changed from arv-mount `by_tag` directory only shows 100 tags and tags that exist are not accessible to [FUSE] arv-mount `by_tag` directory only shows 100 tags and tags that exist are not accessible
  • Assignee changed from Joshua Randall to Peter Amstutz
  • Target version set to Arvados Future Sprints

Assigning to Peter for triage/grooming
