Bug #12990

[FUSE] Access shared/ is inefficient

Added by Peter Amstutz 11 months ago. Updated 10 months ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

The shared/ directory of FUSE has several issues:

  1. no update lock, may start overlapping updates in separate threads
  2. no incremental lookup of individual names, always loads full list, bad for scaling
  3. fetches the full record, which may include a description or properties payload that is not used but wastes bandwidth

Related issues

Related to Arvados - Story #13146: [API] Endpoint to get projects shared with me (Resolved, 2018-08-15)

Associated revisions

Revision ae35f42f
Added by Peter Amstutz 11 months ago

Merge branch '12990-fuse-shared' refs #12990

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz 11 months ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz 11 months ago

  • Description updated (diff)

#3 Updated by Peter Amstutz 11 months ago

12990-fuse-shared @ 0dcf9daff8fce376f20f125c3ef867333976c18c

Addresses points (1) and (3) but not incremental lookup (this turns out to be hard due to the way the contents of shared/ are determined).

#4 Updated by Tom Clegg 11 months ago

LGTM

This looks like it should fix the "flood the apiserver with many threads of groups#list requests" issue we're seeing.

I'm not certain, but I see a couple of other issues that (if they're real) are probably worth fixing:
  • ProjectDirectory and SharedDirectory don't seem to call fresh() after updating, like CollectionDirectory does. Does this mean once they go stale, they stay stale forever, and every lookup triggers a refresh?
  • If N threads decide that self is stale, they all line up for _updating_lock, and do their updates serially. But the first one should (according to the previous point, at least) set fresh(), which means the next N-1 threads will dutifully do their laborious updates even though self is already fresher than they could possibly have wanted it to be back when they decided to update. Perhaps it would be better to do one of
    • Check stale() after acquiring _updating_lock, so the last N-1 threads just wait for the update that's already in progress to finish, and don't bother doing their own.
    • Use acquire(False) to do a non-blocking lock. This is a bit different in that it knowingly returns stale results, but in the case of SharedDirectory maybe this kind of race is OK, since we generally only detect staleness using a race-prone timer anyway?
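
For reference, the two options above can be sketched roughly as follows. This is a minimal illustration, not the arv-mount code: the CachedListing class, its fetch callback, poll_time and update_count are hypothetical names made up for the example; only stale(), fresh() and _updating_lock echo names used in this thread.

```python
import threading
import time


class CachedListing:
    """Hypothetical stand-in for a directory object such as SharedDirectory."""

    def __init__(self, fetch, poll_time=60.0):
        self._fetch = fetch              # assumed callable returning the new listing
        self._poll_time = poll_time      # assumed timer-based staleness threshold
        self._updating_lock = threading.Lock()
        self._last_update = None
        self.entries = {}
        self.update_count = 0            # counts real fetches, for illustration only

    def stale(self):
        if self._last_update is None:
            return True
        return time.monotonic() - self._last_update > self._poll_time

    def fresh(self):
        self._last_update = time.monotonic()

    def update(self):
        """Option 1: block on the lock, then re-check stale() before fetching."""
        with self._updating_lock:
            if not self.stale():
                # Another thread refreshed while we were waiting; skip the fetch.
                return
            self.entries = self._fetch()
            self.update_count += 1
            self.fresh()

    def update_nonblocking(self):
        """Option 2: if an update is already running, return at once.

        Returns False when the lock was busy, in which case the caller
        keeps using whatever (possibly stale) entries are already cached.
        """
        if not self._updating_lock.acquire(False):
            return False
        try:
            if self.stale():
                self.entries = self._fetch()
                self.update_count += 1
                self.fresh()
            return True
        finally:
            self._updating_lock.release()
```

With option 1, N threads that all saw a stale listing end up doing one fetch between them; with option 2, the late arrivals do not wait at all and accept the old listing.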

#5 Updated by Tom Clegg 11 months ago

Tom Clegg wrote:

I'm not certain, but I see a couple of other issues that (if they're real) are probably worth fixing:

(from irc) Not real. merge() sets fresh flag. "Check stale() after acquiring" already happens.

#7 Updated by Tom Morris 11 months ago

  • Assigned To set to Peter Amstutz

#8 Updated by Peter Amstutz 11 months ago

Doing this more efficiently likely requires a new API endpoint. To determine what to list in "shared", arv-mount currently has to look at all projects and find the ones where owner_uuid is not another project which is visible to us (meaning: users, non-project groups, or shared subprojects where the parent is not visible). This is expensive to compute on the client, but can probably be accomplished with a single query on the API server.
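
As a rough sketch of the client-side rule described above (illustrative only; shared_roots and the sample UUIDs are made up, and the records are assumed to be plain dicts with "uuid" and "owner_uuid" keys, so any other filtering arv-mount applies is not shown):

```python
def shared_roots(projects):
    """Return the projects whose owner is not itself a visible project.

    `projects` is assumed to be the full list of project records visible
    to the caller, which is why the whole list must be loaded first.
    """
    visible = {p["uuid"] for p in projects}
    return [p for p in projects if p["owner_uuid"] not in visible]


# Hypothetical data: proj-b is a subproject of the visible proj-a, so it is
# not a root; proj-c has a parent we cannot see, so it is listed as a root.
projects = [
    {"uuid": "proj-a", "owner_uuid": "user-1"},
    {"uuid": "proj-b", "owner_uuid": "proj-a"},
    {"uuid": "proj-c", "owner_uuid": "proj-hidden"},
]
roots = shared_roots(projects)
```

The set of visible UUIDs can only be built once every project has been fetched, which is what makes incremental lookup of a single name hard on the client.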

#9 Updated by Peter Amstutz 11 months ago

  • Target version changed from 2018-01-31 Sprint to Arvados Future Sprints

#10 Updated by Peter Amstutz 10 months ago

  • Related to Story #13146: [API] Endpoint to get projects shared with me added

#11 Updated by Peter Amstutz 10 months ago

Discussion about API endpoint moved to #13146
