Feature #18071

closed

Use postgresql advisory locks to prevent concurrent dispatcher / keep-balance processes

Added by Tom Clegg over 1 year ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 10/31/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -


Subtasks 1 (0 open, 1 closed)

Task #19507: Review 18071-dblock-keep-balance-and-dispatch (Resolved, Tom Clegg, 10/31/2022)

Actions #1

Updated by Tom Clegg over 1 year ago

  • Release deleted (20)
Actions #2

Updated by Tom Clegg over 1 year ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg 3 months ago

  • Target version set to 2022-09-28 sprint
  • Assigned To set to Tom Clegg
Actions #4

Updated by Peter Amstutz 3 months ago

This should be the kind of lock that allows a new process to elbow out the old process -- I'm thinking of the situation where we start a new "something" and want it to replace the old "something".

So we want to communicate:

  • To the new process that it is now allowed to take over
  • To the old process that it should release the lock and shut down
Actions #5

Updated by Peter Amstutz 3 months ago

On second thought, that might be a bad idea, because it could lead to two processes fighting over the lock instead of one getting it and the other failing.

Perhaps the one that fails to get the lock could stay up, with its health check reporting a "failed to get lock" state? Also, can the lock record some information about the node that does have the lock?

Actions #6

Updated by Peter Amstutz 3 months ago

When lock is acquired, record hostname + process id
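
To make the idea from the last two comments concrete, here is a rough Go sketch, not code from any Arvados branch. It makes one non-blocking attempt at a session-level advisory lock with pg_try_advisory_lock, and on success publishes hostname + pid as the session's application_name so other processes can read it from pg_stat_activity. Using application_name for this is an assumption of the sketch, as are the names holderID, tryLock, and lockStatus. A PostgreSQL driver is assumed to be registered elsewhere.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"os"
	"sync"
)

// holderID returns "hostname:pid" for the current process.
func holderID() string {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown-host"
	}
	return fmt.Sprintf("%s:%d", host, os.Getpid())
}

// tryLock makes one non-blocking attempt to take a session-level advisory
// lock. conn must be a dedicated connection that stays open for as long as
// the lock is needed, because PostgreSQL releases the lock when the session
// ends.
func tryLock(ctx context.Context, conn *sql.Conn, key int64) (bool, error) {
	var ok bool
	err := conn.QueryRowContext(ctx, `SELECT pg_try_advisory_lock($1)`, key).Scan(&ok)
	if err != nil || !ok {
		return false, err
	}
	// Assumption: record hostname + pid as the session's application_name.
	// If this step fails, the lock is still held and the error is returned.
	_, err = conn.ExecContext(ctx,
		`SELECT set_config('application_name', $1, false)`, holderID())
	return true, err
}

// lockStatus is the state a health check handler would consult.
type lockStatus struct {
	mu   sync.Mutex
	held bool
}

// set records the outcome of the most recent lock attempt.
func (s *lockStatus) set(held bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.held = held
}

// health reports whether this process holds the lock, plus a message
// suitable for a health check response.
func (s *lockStatus) health() (bool, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.held {
		return true, "holding lock as " + holderID()
	}
	return false, "failed to get lock"
}
```

A process that loses the race would call set(false), keep running, and retry tryLock periodically, so it never fights the current holder for the lock.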

Actions #7

Updated by Peter Amstutz 2 months ago

  • Target version changed from 2022-09-28 sprint to 2022-10-12 sprint
Actions #8

Updated by Peter Amstutz 2 months ago

  • Target version changed from 2022-10-12 sprint to 2022-10-26 sprint
Actions #9

Updated by Peter Amstutz about 2 months ago

  • Category set to API
Actions #10

Updated by Peter Amstutz about 2 months ago

  • Target version changed from 2022-10-26 sprint to 2022-11-09 sprint
Actions #11

Updated by Tom Clegg about 1 month ago

  • Status changed from New to In Progress
Actions #12

Updated by Tom Clegg about 1 month ago

18071-dblock-keep-balance-and-dispatch @ 15043a6825ecd62ccb2272025384474a235b30cc -- developer-run-tests: #3346

  • only one keep-balance service (sweep, sleep, repeat) runs at a time
  • only one keep-balance sweep runs at a time (if you run "keep-balance -once" in a terminal while a keep-balance server process is already running, they will take turns nicely)
  • only one dispatcher (crunch-dispatch-slurm, arvados-dispatch-lsf, arvados-dispatch-cloud) runs at a time
  • when waiting for a lock, logs indicate the host/port of the database connection of the client that currently has the lock
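
As an illustration of the last point, one way a waiting process could find the host/port of the current lock holder is to join pg_locks with pg_stat_activity. This is a sketch under assumptions, not the code from the branch: splitKey and lockHolder are hypothetical names, and a PostgreSQL driver is assumed to be registered elsewhere. It relies on the way pg_locks displays a bigint advisory lock key (high half in classid, low half in objid, objsubid equal to 1).

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// splitKey returns the two 32-bit halves under which pg_locks displays a
// bigint advisory lock key: high half in classid, low half in objid.
func splitKey(key int64) (classid, objid uint32) {
	return uint32(uint64(key) >> 32), uint32(uint64(key))
}

// lockHolder returns "addr:port" for the database client that currently
// holds the advisory lock in the current database, or "" if no session
// holds it. Sessions connected over a Unix socket have no client address.
func lockHolder(ctx context.Context, db *sql.DB, key int64) (string, error) {
	classid, objid := splitKey(key)
	var addr string
	var port int
	err := db.QueryRowContext(ctx, `
		SELECT COALESCE(host(a.client_addr), 'local socket'),
		       COALESCE(a.client_port, -1)
		  FROM pg_locks l
		  JOIN pg_stat_activity a ON a.pid = l.pid
		 WHERE l.locktype = 'advisory'
		   AND l.granted
		   AND l.classid = $1 AND l.objid = $2 AND l.objsubid = 1
		   AND l.database = (SELECT oid FROM pg_database
		                      WHERE datname = current_database())
		 LIMIT 1`,
		int64(classid), int64(objid)).Scan(&addr, &port)
	if err == sql.ErrNoRows {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%d", addr, port), nil
}
```

Note that pg_stat_activity only shows connection details for other sessions when the querying role is allowed to see them, for example when all processes connect as the same database role.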
Actions #13

Updated by Lucas Di Pentima about 1 month ago

Just one small suggestion:

  • File services/keep-balance/balance.go lines 73-75: I think the Run() usage comment is superfluous and also outdated so we can simply remove it?

The rest LGTM

Actions #14

Updated by Tom Clegg about 1 month ago

  • Status changed from In Progress to Resolved