Large number of collections ties up all connections?
User container is getting FUSE errors, the error message in arv-mount.txt is "Failed to connect to 172.17.0.1 port 36323: Connection refused"
In addition crunch-run.txt is reporting "error updating log collection: error recording logs: Could not write sufficient replicas ... dial tcp 172.17.0.1:36323 conne" (presumably connection refused but the message is truncated)
This is with a local compute node keepstore. The keepstore service had to be working initially because it was able to load the docker image and write the initial log collection snapshot. Subsequently it has not been able to update the log collection with the error above.
This suggests the keepstore service crashed. startLocalKeepstore uses the health check to determine when the service has started, but does not set up an ongoing watchdog to ensure the service continues to be available.
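The missing piece would be something like the following watchdog loop. This is a minimal sketch in Python for illustration only (the real startLocalKeepstore code is Go, and the function and callback names here are hypothetical): poll the same health check used at startup, and signal failure if the service ever stops responding.

```python
import threading

def watch_service(health_check, on_failure, interval=5.0, stop_event=None):
    """Poll health_check() until it fails, then call on_failure() once.

    health_check: callable returning True while the service is healthy.
    on_failure: callable invoked when the service stops responding.
    """
    stop_event = stop_event or threading.Event()
    while not stop_event.is_set():
        if not health_check():
            on_failure()
            return
        stop_event.wait(interval)

# Example: a fake service that passes two health checks, then "crashes".
state = {"checks_left": 2, "failed": False}

def fake_health_check():
    state["checks_left"] -= 1
    return state["checks_left"] >= 0

watch_service(fake_health_check, lambda: state.update(failed=True), interval=0.01)
print(state["failed"])  # -> True
```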
Is it possible some kind of connectivity issue could cause keepstore to quit?
Also, keepstore didn't log anything, which is mysterious, I seem to recall an issue a few months ago with the logging level being too quiet by default?
Ran the workflow again and looked into it.
There is a very large number of collections, several thousand
arv-mount is reporting "can't start new thread"
And then later on, it starts reporting "Connection refused"
I think this is what is happening:
- Each collection gets its own instance of BlockManager
- Each BlockManager has its own pool of "put" threads and "get" threads (prefetch)
- If the maximum number of threads is ~4096 and there are > 2000 collections, it will eventually run out of threads
- Meanwhile, all those threads are making connections to keepstore
- They should be using the same keep client but ???
- If all the "user agents" are tied up, it'll allocate a new one
- If all those threads and all those connections use HTTP keepalive, they eventually exhaust the ~4096 connections that keepstore can accept by default, resulting in "connection refused" errors.
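To make the arithmetic concrete, a rough model of the failure mode (the per-pool thread counts here are illustrative assumptions, not the SDK's actual defaults):

```python
# Rough accounting for per-collection thread pools.
# Numbers are illustrative; actual SDK pool sizes may differ.
PUT_THREADS_PER_BLOCKMANAGER = 2   # assumed per-collection "put" pool
GET_THREADS_PER_BLOCKMANAGER = 2   # assumed per-collection prefetch pool
THREAD_LIMIT = 4096                # approximate per-process thread cap

def threads_needed(num_collections):
    per_collection = PUT_THREADS_PER_BLOCKMANAGER + GET_THREADS_PER_BLOCKMANAGER
    return num_collections * per_collection

# With >2000 collections, per-collection pools blow past the limit,
# which surfaces in Python as "can't start new thread".
print(threads_needed(2000) > THREAD_LIMIT)  # -> True
```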
The keepclient needs to be shared (it should already but double check)
The "get" and "put" thread pools should be shared (new behavior, maybe the thread pool moves to the Keep client).
Need to identify whether there are any resource leaks from lingering connections; e.g. when FUSE evicts a collection from the cache, it should ensure the collection's BlockManager is shut down.
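The proposed direction could look something like this: a single shared object owns one bounded prefetch pool, and every collection funnels through it. This is a sketch with hypothetical names (the real SDK classes and methods differ):

```python
import concurrent.futures

class SharedKeepClient:
    """Hypothetical Keep client owning one process-wide prefetch pool,
    instead of one pool per BlockManager/Collection."""

    def __init__(self, max_prefetch_threads=4):
        self._pool = concurrent.futures.ThreadPoolExecutor(
            max_workers=max_prefetch_threads)

    def prefetch(self, locator, fetch):
        # All collections share the same bounded pool, so total thread
        # count no longer scales with the number of collections.
        return self._pool.submit(fetch, locator)

    def close(self):
        self._pool.shutdown(wait=True)

client = SharedKeepClient()
f = client.prefetch("abc+123", lambda loc: "data for " + loc)
print(f.result())  # -> data for abc+123
client.close()
```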
Updated by Peter Amstutz 6 months ago
20637-prefetch-threads @ 4e4c935d6fddb68997a50a382bff01c223dd00df
This avoids the problem of every Collection with its own BlockManager
creating its own prefetch thread pool, which becomes a resource leak
when reading files from thousands of separate Collection objects.
The 'put' thread pool remains with the BlockManager but it now stops
the put threads on 'BlockManager.commit_all'. This is because this
method always flushes pending blocks anyway, and is called before the
collection record is written to the API server -- so we can assume
we've just finished a batch of writes to that collection, and might
not need the put thread pool any more, and if we do, we can just make
a new one.
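The commit_all behavior described above can be sketched as follows. This is a simplified illustrative model, not the real SDK code: pending blocks are flushed, then the put workers are stopped, and the pool is lazily recreated if the collection is written to again.

```python
import queue
import threading

class BlockManagerSketch:
    """Illustrative model of a BlockManager that tears down its put
    thread pool in commit_all() and lazily recreates it on the next
    write (names and structure are simplified)."""

    def __init__(self, num_put_threads=2):
        self.num_put_threads = num_put_threads
        self._put_queue = None
        self._put_threads = None
        self.committed = []

    def _ensure_put_threads(self):
        if self._put_threads is None:
            self._put_queue = queue.Queue()
            self._put_threads = [
                threading.Thread(target=self._put_worker, daemon=True)
                for _ in range(self.num_put_threads)]
            for t in self._put_threads:
                t.start()

    def _put_worker(self):
        while True:
            block = self._put_queue.get()
            if block is None:            # sentinel: stop this worker
                return
            self.committed.append(block)
            self._put_queue.task_done()

    def put(self, block):
        self._ensure_put_threads()       # pool comes back if needed later
        self._put_queue.put(block)

    def commit_all(self):
        if self._put_threads is None:
            return
        self._put_queue.join()           # flush pending blocks first
        for _ in self._put_threads:      # then stop the workers, so idle
            self._put_queue.put(None)    # collections hold no threads
        for t in self._put_threads:
            t.join()
        self._put_queue = None
        self._put_threads = None

bm = BlockManagerSketch()
bm.put("block-1")
bm.put("block-2")
bm.commit_all()
print(sorted(bm.committed))  # -> ['block-1', 'block-2']
```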