Bug #14113
closedc-d-s SqueueChecker may violate sync.RWMutex semantics (potential goroutine deadlock?)
Description
The SqueueChecker.All() function in crunch-dispatch-slurm appears to violate the semantics of RWMutex by calling RLock twice from the same goroutine.
From https://golang.org/pkg/sync/#RWMutex:
If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking. This is to ensure that the lock eventually becomes available; a blocked Lock call excludes new readers from acquiring the lock.
Both `SetPriority()` and `check()` can call `sqc.lock.Lock()` and `check()` is called from the goroutine started in `start()` which is called at the beginning of `All()`:
sqc.startOnce.Do(sqc.start) sqc.lock.RLock() defer sqc.lock.RUnlock() sqc.notify.Wait()
`All()` then proceeds to call `sqc.lock.RLock()` twice - once directly and once indirectly via `sqc.notify.Wait()` which has been configured to use the `sqc.lock.RLocker()` in `start()`:
sqc.notify.L = sqc.lock.RLocker()
Therefore, if I am reading the RWMutex docs correctly, it is possible that we run the risk of deadlocking a goroutine when `check()` calls `Lock` at an inopportune moment (this is not specified in the docs, but it may be if the order goes All-RLock, check-Lock, All(notify)-RLock).
We should probably not be using a single sqc.Lock to protect several different data structure accesses.
Updated by Joshua Randall over 5 years ago
N.B. exact same potential issue with `HasUUID`
Updated by Joshua Randall over 5 years ago
I think this may be ok as I had the order of unlock/lock backwards when considering how the sync.Cond.Wait() works.
Updated by Joshua Randall over 5 years ago
- Status changed from New to Closed
After review of the sync.Cond source, I am pretty sure this is ok.