Project

General

Profile

Actions

Bug #10705

open

[Crunch2] [API] return a more specific 422 error message when a client calls containers#unlock without having the lock

Added by Peter Amstutz over 7 years ago. Updated 27 days ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
Story points:
-
Release:
Release relationship:
Auto

Description

Currently we get "arvados API server error: #<ArvadosModel::InvalidStateTransitionError: ArvadosModel::InvalidStateTransitionError> (422: 422 Unprocessable Entity) returned by 9tee4.arvadosapi.com"

This should be something more suggestive of locking, perhaps something along the lines of YouDoNotHaveTheLockError.


Subtasks 1 (1 open0 closed)

Task #10740: Review 10705-state-transition-errorIn ProgressRadhika Chippada01/11/2017Actions

Related issues

Related to Arvados - Bug #10700: [Crunch2] crunch-dispatch-slurm pileupResolvedTom Clegg01/27/2017Actions
Actions #1

Updated by Tom Morris over 7 years ago

  • Target version set to Arvados Future Sprints
Actions #2

Updated by Peter Amstutz over 7 years ago

  • Target version changed from Arvados Future Sprints to 2017-01-04 sprint
Actions #3

Updated by Peter Amstutz over 7 years ago

  • Assigned To set to Peter Amstutz
Actions #4

Updated by Peter Amstutz about 7 years ago

  • Target version changed from 2017-01-04 sprint to 2017-01-18 sprint
Actions #5

Updated by Radhika Chippada about 7 years ago

  • Assigned To changed from Peter Amstutz to Radhika Chippada
Actions #6

Updated by Radhika Chippada about 7 years ago

@ 703b7800

Make container_request -> unlock method a no-op so that crunch-dispatcher doesn't keep retrying when an unlock request errors out.

Actions #7

Updated by Tom Clegg about 7 years ago

I don't think this is solving the right problem.

Even if unlocking an already-unlocked container is a no-op, crunch-dispatch-slurm still needs to correctly handle the case where unlock fails -- e.g., a different dispatcher has locked the container, or the container has been cancelled.

Once crunch-dispatch-slurm is fixed, "can't unlock an already-unlocked container" should be rare and harmless when things are working well, and helpful for turning making subtle bugs (like this one) easier to find.

So, instead of changing the error to a no-op, can we fix whatever prevented crunch-dispatch-slurm from noticing that the container has been taken away from it, and it should stop trying to mess with it?

Actions #8

Updated by Tom Clegg about 7 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Peter Amstutz about 7 years ago

Tom Clegg wrote:

I don't think this is solving the right problem.

Even if unlocking an already-unlocked container is a no-op, crunch-dispatch-slurm still needs to correctly handle the case where unlock fails -- e.g., a different dispatcher has locked the container, or the container has been cancelled.

Once crunch-dispatch-slurm is fixed, "can't unlock an already-unlocked container" should be rare and harmless when things are working well, and helpful for turning making subtle bugs (like this one) easier to find.

So, instead of changing the error to a no-op, can we fix whatever prevented crunch-dispatch-slurm from noticing that the container has been taken away from it, and it should stop trying to mess with it?

There is a whole series of bugs linked from #10700. I agree that the actual fix is to ensure that crunch-dispatch-slurm doesn't misbehave when the unlock operation fails. On the other hand, if the API server didn't give a misleading response, maybe the existing code would be sufficient to recover. So to reduce the chance of future problems we should fix it on both sides.

Actions #10

Updated by Tom Clegg about 7 years ago

Peter Amstutz wrote:

if the API server didn't give a misleading response, maybe the existing code would be sufficient to recover. So to reduce the chance of future problems we should fix it on both sides.

Hm. This reasoning sounds like "if the API call had succeeded, maybe we wouldn't need to handle the error case". But we do need to handle the error case to handle the slightly-different scenarios, like state==Cancelled and locked-by-someone-else -- and when we handle the error case, there is no need to change the API...?

I'm much more comfortable keeping the behavior as "either unlock, or return an error". If a client calls "unlock" when the container is not even locked, that's an opportunity for the client to notice that something is not working as expected. If double-unlock returns OK there is no way for the client to know anything is amiss. In other words, it's already easy for a client to ignore a harmless error, but it would be impossible for a client to detect a suppressed error.

Actions #11

Updated by Radhika Chippada about 7 years ago

  • Target version changed from 2017-01-18 sprint to 2017-02-01 sprint
Actions #12

Updated by Tom Clegg about 7 years ago

  • Subject changed from [Crunch2] API server returns 422 when providing container state field with same value as current state to [Crunch2] [API] return a more specific 422 error message when a client calls containers#unlock without having the lock
  • Description updated (diff)
Actions #13

Updated by Radhika Chippada about 7 years ago

  • Target version changed from 2017-02-01 sprint to Arvados Future Sprints
Actions #14

Updated by Tom Morris over 6 years ago

  • Status changed from In Progress to New
  • Assigned To deleted (Radhika Chippada)
Actions #15

Updated by Tom Morris almost 6 years ago

  • Release deleted (11)
Actions #16

Updated by Ward Vandewege over 2 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #17

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #18

Updated by Peter Amstutz 27 days ago

  • Target version set to Future
Actions

Also available in: Atom PDF