Bug #18376

[keepstore] Avoid long-lived readdirent cookies in filesystem driver

Added by Tom Clegg 12 days ago. Updated 4 days ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 11/16/2021
Due date:
% Done: 50%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

We have seen the current implementation of IndexTo fail (error reading "/data/keep/13b": readdirent /data/keep/13b: errno 523) when the underlying filesystem is NFS and the indexing operation takes over 4 hours. (Errno 523 is EBADCOOKIE in NFS.)

We can avoid relying unnecessarily on long-lived readdirent cookies by
  • doing open/readdir/close on the top-level directory, then open/readdir/close on each subdirectory (the current implementation indexes each subdirectory before calling readdirent on the top-level directory to get the next subdir)
  • calling ReadDir() to get DirEntry values as quickly as possible, then calling lstat() to get sizes (the current implementation uses Readdir(), which interleaves calls to lstat() and readdirent())

Subtasks

Task #18386: Review 18376-nfs-readdirent (Resolved, Lucas Di Pentima)

Task #18473: review (New, Lucas Di Pentima)

History

#1 Updated by Tom Clegg 12 days ago

  • Description updated (diff)

#3 Updated by Lucas Di Pentima 12 days ago

This LGTM, thanks.

#4 Updated by Tom Clegg 12 days ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|153d9954cbe21a0e98bf5cf364898e2bc10fcabd.

#5 Updated by Ward Vandewege 11 days ago

  • Release set to 45

#6 Updated by Tom Clegg 10 days ago

  • Status changed from Resolved to In Progress

Problem persists. Maybe we need a retry loop to get through busy periods?

18376-nfs-readdirent @ f7278a4238a687ba4b8203417133bc9add5e166b -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2808/

#7 Updated by Peter Amstutz 6 days ago

  • Release changed from 45 to 48

#8 Updated by Tom Clegg 4 days ago

  • Target version changed from 2021-11-24 sprint to 2021-12-08 sprint
