[crunch-run] [collectionfs] Deadlock while writing output collection
When running several similar containers, some succeeded but others got stuck while writing the final output collection. The log indicates all of the expected files were written to collectionfs ("copying ... (... bytes)") but the finished collection was never saved.
#1 Updated by Tom Clegg about 1 month ago
Found a case where commitBlock would set a "flushing in progress" flag on a segment, but then return early (and never indicate it was done) because a different segment had another flush in progress. Once that happened, sync operations like MarshalManifest would block forever.
15946-collectionfs-deadlock @ c05caa378debd04205690c6cb96508e4e7fb6c8b -- https://ci.arvados.org/view/Developer/job/developer-run-tests/1700/
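For illustration, here is a minimal, self-contained sketch of the failure mode described above. It is not the actual collectionfs code: the names segment, fs, commit, waitFlushed and demo are hypothetical, and the "fixed" variant (check every segment before marking any) is just one possible way to avoid the stranded flag, not necessarily what the branch does. The waiter uses a timeout so the demo can report the hang instead of blocking forever.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// segment is a hypothetical stand-in for an in-memory file segment.
// flushing is non-nil while a flush is in progress and is closed when done.
type segment struct {
	flushing chan struct{}
}

// fs is a hypothetical stand-in for the filesystem holding the segments.
type fs struct {
	mu      sync.Mutex
	release chan struct{} // closed to let simulated backend writes finish
}

// commit marks segs as "flushing in progress" and starts a simulated write.
// With fixed=false it reproduces the bug pattern: segments are marked one at
// a time, and an early return leaves earlier ones marked with a channel that
// nobody will ever close.
func (f *fs) commit(segs []*segment, fixed bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if fixed {
		// Assumed fix: verify that no segment is busy before marking any.
		for _, s := range segs {
			if s.flushing != nil {
				return
			}
		}
	}
	done := make(chan struct{})
	for _, s := range segs {
		if s.flushing != nil {
			return // buggy path: earlier segments stay marked forever
		}
		s.flushing = done
	}
	go func() {
		<-f.release // simulated slow write
		f.mu.Lock()
		for _, s := range segs {
			s.flushing = nil
		}
		f.mu.Unlock()
		close(done)
	}()
}

// waitFlushed stands in for a sync operation: it waits until no segment has
// a flush in progress, and reports false if that takes longer than timeout.
func (f *fs) waitFlushed(segs []*segment, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for _, s := range segs {
		f.mu.Lock()
		ch := s.flushing
		f.mu.Unlock()
		if ch == nil {
			continue
		}
		select {
		case <-ch:
		case <-deadline:
			return false
		}
	}
	return true
}

// demo overlaps two commits and reports whether the sync wait completes.
func demo(fixed bool) bool {
	f := &fs{release: make(chan struct{})}
	a, c := &segment{}, &segment{}
	f.commit([]*segment{c}, fixed)    // flush of c starts and stays in progress
	f.commit([]*segment{a, c}, fixed) // overlaps with the flush above
	close(f.release)                  // let the in-progress write complete
	return f.waitFlushed([]*segment{a, c}, 200*time.Millisecond)
}

func main() {
	fmt.Println("buggy variant, sync completes:", demo(false)) // false
	fmt.Println("fixed variant, sync completes:", demo(true))  // true
}
```

In the buggy variant, segment a is marked as flushing by the second commit, which then returns early because c is busy, so the waiter on a never wakes up. In a real sync path there would be no timeout, which matches the "block forever" behavior reported here.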