Bug #18376
closed [keepstore] Avoid long-lived readdirent cookies in filesystem driver
Description
We have seen the current implementation of IndexTo fail (error reading "/data/keep/13b": readdirent /data/keep/13b: errno 523) when the underlying filesystem is NFS and the indexing operation takes over 4 hours. (Errno 523 is EBADCOOKIE in NFS.)
Possible ways to keep readdirent cookies short-lived:
- doing open/readdir/close on the top-level directory, then open/readdir/close on each subdirectory (the current implementation indexes each subdirectory before calling readdirent on the top-level directory to get the next subdir)
- calling ReadDir() to get DirEntry values as quickly as possible, then calling lstat() to get sizes (the current implementation uses Readdir(), which interleaves calls to lstat() and readdirent())
Related issues
Updated by Tom Clegg about 3 years ago
18376-nfs-readdirent @ 6e0b8fe3e7a9ee4834dc454d6f0c5a409590ce6d -- developer-run-tests: #2798
Updated by Tom Clegg about 3 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|153d9954cbe21a0e98bf5cf364898e2bc10fcabd.
Updated by Tom Clegg about 3 years ago
- Status changed from Resolved to In Progress
Problem persists. Maybe we need a retry loop to get through busy periods?
18376-nfs-readdirent @ f7278a4238a687ba4b8203417133bc9add5e166b -- developer-run-tests: #2808
Updated by Tom Clegg almost 3 years ago
- Target version changed from 2021-11-24 sprint to 2021-12-08 sprint
Updated by Tom Clegg almost 3 years ago
Likelihood of hitting this error appears to vary with load, so we might stop seeing it when #18547 is fixed. In the cluster in question, multiple keepstore processes on different nodes get directory indexes on the same NFS volume all at once.
Updated by Tom Clegg almost 3 years ago
- Related to Bug #18547: [keep-balance] Avoid redundant indexing when multiple keepstore servers use a single NFS mount added
Updated by Lucas Di Pentima almost 3 years ago
Retry loop at f7278a4 LGTM. Thanks.
Updated by Peter Amstutz almost 3 years ago
- Blocks Idea #18518: Release Arvados 2.3.2 added