Bug #18376
closed
[keepstore] Avoid long-lived readdirent cookies in filesystem driver
Added by Tom Clegg over 3 years ago.
Updated over 3 years ago.
Release relationship:
Auto
Description
We have seen the current implementation of IndexTo fail (error reading "/data/keep/13b": readdirent /data/keep/13b: errno 523) when the underlying filesystem is NFS and the indexing operation takes over 4 hours. (Errno 523 is EBADCOOKIE in NFS.)
We can avoid relying unnecessarily on long-lived readdirent cookies by
- doing open/readdir/close on the top-level directory, then open/readdir/close on each subdirectory (the current implementation indexes each subdirectory before calling readdirent on the top-level directory to get the next subdir)
- calling ReadDir() to get DirEntry values as quickly as possible, then calling lstat() to get sizes (the current implementation uses Readdir(), which interleaves calls to lstat() and readdirent())
- Description updated (diff)
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|153d9954cbe21a0e98bf5cf364898e2bc10fcabd.
- Status changed from Resolved to In Progress
- Release changed from 45 to 48
- Target version changed from 2021-11-24 sprint to 2021-12-08 sprint
Likelihood of hitting this error appears to vary with load, so we might stop seeing it when #18547 is fixed. In the cluster in question, multiple keepstore processes on different nodes get directory indexes on the same NFS volume all at once.
- Related to Bug #18547: [keep-balance] Avoid redundant indexing when multiple keepstore servers use a single NFS mount added
Retry loop at f7278a4 LGTM. Thanks.
- Status changed from In Progress to Resolved