Project

General

Profile

Actions

Bug #14113

closed

c-d-s SqueueChecker may violate sync.RWMutex semantics (potential goroutine deadlock?)

Added by Joshua Randall over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
-
Category:
Crunch
Target version:
-
Story points:
-

Description

The SqueueChecker.All() function in crunch-dispatch-slurm appears to violate the semantics of RWMutex by calling RLock twice from the same goroutine.

From https://golang.org/pkg/sync/#RWMutex:

If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking. This is to ensure that the lock eventually becomes available; a blocked Lock call excludes new readers from acquiring the lock.

Both `SetPriority()` and `check()` can call `sqc.lock.Lock()` and `check()` is called from the goroutine started in `start()` which is called at the beginning of `All()`:

        sqc.startOnce.Do(sqc.start)
        sqc.lock.RLock()
        defer sqc.lock.RUnlock()
        sqc.notify.Wait()

`All()` then proceeds to call `sqc.lock.RLock()` twice - once directly and once indirectly via `sqc.notify.Wait()` which has been configured to use the `sqc.lock.RLocker()` in `start()`:

        sqc.notify.L = sqc.lock.RLocker()

Therefore, if I am reading the RWMutex docs correctly, it is possible that we run the risk of deadlocking a goroutine when `check()` calls `Lock` at an inopportune moment (this is not specified in the docs, but it may be if the order goes All-RLock, check-Lock, All(notify)-RLock).

We should probably not be using a single sqc.Lock to protect several different data structure accesses.

Actions #1

Updated by Joshua Randall over 5 years ago

N.B. exact same potential issue with `HasUUID`

Actions #2

Updated by Joshua Randall over 5 years ago

I think this may be ok as I had the order of unlock/lock backwards when considering how the sync.Cond.Wait() works.

Actions #3

Updated by Joshua Randall over 5 years ago

  • Status changed from New to Closed

After review of the sync.Cond source, I am pretty sure this is ok.

Actions

Also available in: Atom PDF