Bug #8624

[FUSE] arv-mount `by_tag` directory only shows 100 tags and tags that exist are not accessible

Added by Joshua Randall over 1 year ago. Updated 4 days ago.

Status: New
Priority: Normal
Assignee: Peter Amstutz
Start date: 03/03/2016
Due date:
% Done:
Target version: 2017-07-05 sprint
Story points: -
Remaining (hours): 0.00 hour
Velocity based estimate: -


When I look in the `by_tag` directory under a keep mount, I only see 100 tags in the directory.

# arv-mount /tmp/keep
# ls /tmp/keep/by_tag | wc -l
100

However, I actually have 54225 tags:

# arv link list -f '[["head_kind","=","arvados#collection"],["link_class","=","tag"]]' -l 0

I could see how it might be reasonable to limit the number of tags shown (or even not to allow listing until they are accessed, as in the `by_id` directory).

However, that doesn't seem to be what is happening, as tags that exist are not accessible:

# ls /tmp/keep/by_tag/lanelet:17559_7#3
ls: cannot access /tmp/keep/by_tag/lanelet:17559_7#3: No such file or directory


Task #11891: Review - New - Radhika Chippada


#1 Updated by Joshua Randall over 1 year ago

No big mystery here - the API call to list the links does not specify a limit and does not check items_available, but seems to just assume the API server returned everything: https://github.com/curoverse/arvados/blob/master/services/fuse/arvados_fuse/fusedir.py#L671-L679

#2 Updated by Joshua Randall over 1 year ago

I've confirmed that adding `limit=1000` to the API list query (fusedir.py:673) results in a `by_tag` directory with 1000 entries. So, one part of the solution would be to implement a loop that retrieves all available tags (although in my opinion it would be better to do that within the Python SDK, à la #8502, since it is a common pattern that is needed often).

<                 select=['name'], distinct=True
>                 select=['name'], distinct=True, limit=1000
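
As a rough illustration of the paging-loop idea, here is a hypothetical sketch (not the actual fusedir.py code): the `api` object is assumed to follow the Python SDK's `links().list(...).execute()` call shape, and the parameter names are illustrative.

```python
# Hypothetical sketch of a paging loop that fetches every tag name in
# batches, instead of trusting a single list() call to return everything.
# The api object is assumed to mimic the SDK's links().list().execute().
def all_tag_names(api, batch_size=1000):
    """Yield every distinct tag name, one API call per batch."""
    offset = 0
    while True:
        page = api.links().list(
            filters=[['link_class', '=', 'tag']],
            select=['name'],
            distinct=True,
            limit=batch_size,
            offset=offset,
        ).execute()
        if not page['items']:
            break
        for item in page['items']:
            yield item['name']
        offset += len(page['items'])
        if offset >= page['items_available']:
            break
```

A generator keeps memory flat regardless of tag count, and the `items_available` check avoids a final empty-page round trip in the common case.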

I also checked differences in memory usage with a limit of 100 vs 1000 (because it looks like TagsDirectory creates a TagDirectory object for each tag and I wasn't sure how expensive those are).

# arv-mount /tmp/keep_root
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     31604 kB
# ls /tmp/keep_root/by_tag | wc -l
100
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     35156 kB
# umount /tmp/keep_root
# # change to limit=1000
# arv-mount /tmp/keep_root
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     31592 kB
# ls /tmp/keep_root/by_tag | wc -l
1000
# cat /proc/$(ps auxwww |grep keep_root | grep -v grep | awk '{print $2}')/status | grep '^VmRSS'
VmRSS:     36984 kB

100 tag limit
35156 - 31604 kB = 3552 kB

1000 tag limit
36984 - 31592 kB = 5392 kB

Expected memory cost for each additional tag: (5392 - 3552 kB) / (1000-100) tags = 2.04 kB/tag
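
For reference, the arithmetic above works out as follows (reproducing the VmRSS measurements from the transcripts):

```python
# Reproducing the per-tag memory estimate from the VmRSS numbers above.
rss_growth_100 = 35156 - 31604    # kB used by 100 prepopulated tags
rss_growth_1000 = 36984 - 31592   # kB used by 1000 prepopulated tags
per_tag_kb = (rss_growth_1000 - rss_growth_100) / (1000 - 100)
print(round(per_tag_kb, 2))       # 2.04 kB per additional tag

# Projecting to 1 million tags gives roughly 2 GB of resident memory.
projected_gb = per_tag_kb * 1_000_000 / 1024 / 1024
print(round(projected_gb, 2))     # ~1.95 GB
```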

Given this, it seems reasonable to retain the current behavior of pre-populating TagDirectory objects unless there are somewhere in the vicinity of 100,000-10,000,000 tags (at the 1M tag mark, the expected memory usage for arv-mount would be ~2GB, which seems excessive for a client that isn't likely to actually need 99.999% of those objects).

That said, I'm not understanding why the TagDirectory object is not simply created if and when a request is made for it. I think the only useful piece of state it has is `self.tag`, the name of the tag to use in the API query to get the associated collections. Since that is equal to the `name` that would be passed to the `lookup` function prior to access, I do not see any point in prepopulating those objects - it would be just as fast to have a general TagDirectory object that (like MagicDirectory) waits for the `lookup` call to do the actual API server query and prepare a list of associated CollectionDirectory objects (which also don't seem to need to be prepopulated).
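
The lazy-creation idea could look something like this (an illustrative sketch only; the class and method names are hypothetical stand-ins, and the real arvados_fuse directory classes have a different interface):

```python
# Illustrative sketch of creating tag directories on demand; class and
# method names are hypothetical stand-ins for the arvados_fuse ones.
class TagDirectory:
    """Stand-in for the per-tag directory; the real one queries the API
    for the collections associated with self.tag when accessed."""
    def __init__(self, api, tag):
        self.api = api
        self.tag = tag

class LazyTagsDirectory:
    """by_tag directory that, like MagicDirectory, instantiates a child
    only when lookup() is called for it, instead of prepopulating one
    TagDirectory per tag at mount time."""
    def __init__(self, api):
        self.api = api
        self._children = {}

    def lookup(self, name):
        # The requested name IS the tag, so no listing is needed first.
        if name not in self._children:
            self._children[name] = TagDirectory(self.api, tag=name)
        return self._children[name]
```

With this shape, memory cost scales with the number of tags actually accessed rather than the number of tags that exist.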

#3 Updated by Joshua Randall over 1 year ago

I would also note that the list of collections associated with a tag appears to be subject to the same issue (a single API list query), so it would likewise show only the first 100 collections associated with a particular tag.

#4 Updated by Joshua Randall over 1 year ago

  • Assignee set to Joshua Randall

I've implemented the basic fix for this (retrieving items across many batches using a generator function).

However, it makes the first `ls` of the `by_tag` directory rather slow - it is now taking ~50s to list the 31749 distinct tag names on the first access, then ~1s on a subsequent access. Unfortunately, because of the poll_time=60 default, those results only stay cached for 60s after which the next query goes back to taking a long time.

# arv-mount /tmp/keep_root
# time ls /tmp/keep_root/by_tag | wc -l

real    0m48.103s
user    0m0.292s
sys     0m0.015s
# sleep 30 && time ls /tmp/keep_root/by_tag | wc -l

real    0m0.735s
user    0m0.304s
sys     0m0.011s
# sleep 30 && time ls /tmp/keep_root/by_tag | wc -l

real    0m47.858s
user    0m0.324s
sys     0m0.018s

This wouldn't matter too much if I could just skip the `ls` and go straight to a tag subdirectory, but unfortunately the implementation makes that just as slow:

# arv-mount /tmp/keep_root
# time ls /tmp/keep_root/by_tag/lanelet\:16261_1#1

real    0m47.876s
user    0m0.000s
sys     0m0.002s

There is no good reason why an attempt to access a particular subdirectory should require first obtaining a full listing of the parent directory.
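
A direct lookup could instead issue one filtered query for just the requested name. A hypothetical sketch, again assuming the SDK's `links().list(...).execute()` call shape (the real fix would live in the FUSE lookup path):

```python
# Hypothetical sketch: resolve a single tag with one filtered API query
# instead of enumerating the entire by_tag listing first.
def tag_exists(api, tag_name):
    page = api.links().list(
        filters=[['link_class', '=', 'tag'], ['name', '=', tag_name]],
        select=['name'],
        limit=1,
    ).execute()
    return len(page['items']) > 0
```

This makes the cost of `ls /tmp/keep_root/by_tag/<tag>` a single round trip regardless of how many tags exist.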

Also, FWIW, I'm starting to think that there is a fundamental problem with the API server limiting the items returned to a number as "low" as 1000. In this particular query, the payload size of all 31749 entries would be ~750kB, whereas some other individual items the API server returns (such as collections with large manifests) can be 20 times that size. Why make me do 32 queries (taking 1-2s each) to get my 750kB while allowing a single query to return 20x more data?

Now that the API server has some logic to return a sensible amount of data (rather than number of entries), perhaps the upper limit on batch size should be raised significantly (perhaps to 1m)?

#5 Updated by Joshua Randall over 1 year ago

IMHO clients shouldn't really be using polling to check if there is new data - ideally some mechanism (such as etag) would allow a client to ask the API server if the resource it is accessing has changed without having to actually perform a query at all.
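
The general shape of that mechanism, sketched client-side (the Arvados API did not support conditional requests like this at the time; the `fetch` callback here is hypothetical and just mirrors HTTP's If-None-Match/304 convention):

```python
# Sketch of an ETag-style revalidation cache. The fetch callback is
# hypothetical: fetch(etag) -> (status, new_etag, body), returning
# (304, etag, None) when the resource is unchanged, per HTTP convention.
class EtagCache:
    def __init__(self, fetch):
        self._fetch = fetch
        self._etag = None
        self._body = None

    def get(self):
        status, etag, body = self._fetch(self._etag)
        if status == 304:        # Not Modified: reuse the cached body
            return self._body
        self._etag, self._body = etag, body
        return body
```

Each poll then costs a cheap "has it changed?" round trip, and the expensive full listing is only re-fetched when the server reports a new version.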

#6 Updated by Tom Morris 4 months ago

  • Subject changed from arv-mount `by_tag` directory only shows 100 tags and tags that exist are not accessible to [FUSE] arv-mount `by_tag` directory only shows 100 tags and tags that exist are not accessible
  • Assignee changed from Joshua Randall to Peter Amstutz
  • Target version set to Arvados Future Sprints

Assigning to Peter for triage/grooming

#7 Updated by Tom Morris 13 days ago

  • Assignee deleted (Peter Amstutz)

#8 Updated by Tom Morris 5 days ago

  • Target version changed from Arvados Future Sprints to 2017-07-05 sprint

#9 Updated by Peter Amstutz 5 days ago

  • Assignee set to Peter Amstutz

#10 Updated by Peter Amstutz 4 days ago

Hey Josh, what do you think about making the tags directory "magic" similarly to the by_id directory, so the directory is only instantiated when you attempt to access it? This would remove the ability to use "ls" to list tags, but it sounds like that wouldn't be a problem for your use case. Or maybe we should fix the "tags" directory to have a full listing, but add an alternate "magic_tags" directory?
