Project

General

Profile

Actions

Bug #13500

closed

crunch-dispatch-slurm PG::TRDeadlockDetected: ERROR: deadlock detected

Added by Joshua Randall over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
-
Release:
Release relationship:
Auto

Description

We have crunch-dispatch-slurm 1.1.4.20180511205416-1

root@arvados-master-eglyx:~# dpkg -l |grep crunch-dis
ii  crunch-dispatch-slurm            1.1.4.20180511205416-1                     amd64        Dispatch Crunch containers to a SLURM cluster

The logs are getting messages like this about 10 times per minute:

May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL:  Process 41291 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 46646.
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 46646 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 41291.
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT:  See server log for query details.
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:14 Done monitoring container eglyx-dz642-e6zsrf34kxvm0un
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:19 Error updating container eglyx-dz642-ed9d98nc35cy676 to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR:  deadlock detected
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL:  Process 41982 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 46644.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 46644 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 43414.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 43414 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 41982.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT:  See server log for query details.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:19 Done monitoring container eglyx-dz642-ed9d98nc35cy676
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:23 Error updating container eglyx-dz642-su55175dq46ce5n to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR:  deadlock detected
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL:  Process 46743 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 42203.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 42203 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 42197.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 42197 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 46743.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT:  See server log for query details.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:23 Done monitoring container eglyx-dz642-su55175dq46ce5n
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:34 Error updating container eglyx-dz642-h8w2hpksftit3fx to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR:  deadlock detected
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL:  Process 43623 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 41291.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 41291 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 29904.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 29904 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 43623.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT:  See server log for query details.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:34 Done monitoring container eglyx-dz642-h8w2hpksftit3fx
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:34 Error updating container eglyx-dz642-mm5lolu4as26bog to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR:  deadlock detected

This does not seem to be having an adverse effect on performance, but I just wanted to report it as it seems to be new as of the latest version of crunch-dispatch-slurm.


Related issues

Related to Arvados - Bug #13164: [API] dispatch sometimes tries to run cancelled containers at the expense of pending containersResolvedTom Clegg05/14/2018Actions
Related to Arvados - Bug #13594: PG::TRDeadlockDetected when running cwl tests in parallelResolvedTom CleggActions
Actions

Also available in: Atom PDF