Bug #10705: [Crunch2] [API] return a more specific 422 error message when a client calls containers#unlock without having the lock - Arvados

Actions

Copy link

Bug #10705

open

[Crunch2] [API] return a more specific 422 error message when a client calls containers#unlock without having the lock

Added by Peter Amstutz over 8 years ago. Updated about 1 year ago.

Status:

New

Priority:

Normal

Assigned To:

Category:

Target version:

Future

Story points:

Release:

Postponed

Release relationship:

Auto

Description

Currently we get "arvados API server error: #<ArvadosModel::InvalidStateTransitionError: ArvadosModel::InvalidStateTransitionError> (422: 422 Unprocessable Entity) returned by 9tee4.arvadosapi.com"

This should be something more suggestive of locking, perhaps something along the lines of YouDoNotHaveTheLockError.

Subtasks 1 (1 open — 0 closed)

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Tom Morris over 8 years ago

Target version set to Arvados Future Sprints

Actions

Copy link

Updated by Peter Amstutz over 8 years ago

Target version changed from Arvados Future Sprints to 2017-01-04 sprint

Actions

Copy link

Updated by Peter Amstutz over 8 years ago

Assigned To set to Peter Amstutz

Actions

Copy link

Updated by Peter Amstutz over 8 years ago

Target version changed from 2017-01-04 sprint to 2017-01-18 sprint

Actions

Copy link

Updated by Radhika Chippada over 8 years ago

Assigned To changed from Peter Amstutz to Radhika Chippada

Actions

Copy link

Updated by Radhika Chippada over 8 years ago

@ 703b7800

Make container_request -> unlock method a no-op so that crunch-dispatcher doesn't keep retrying when an unlock request errors out.

Actions

Copy link

Updated by Tom Clegg over 8 years ago

I don't think this is solving the right problem.

Even if unlocking an already-unlocked container is a no-op, crunch-dispatch-slurm still needs to correctly handle the case where unlock fails -- e.g., a different dispatcher has locked the container, or the container has been cancelled.

Once crunch-dispatch-slurm is fixed, "can't unlock an already-unlocked container" should be rare and harmless when things are working well, and helpful for turning making subtle bugs (like this one) easier to find.

So, instead of changing the error to a no-op, can we fix whatever prevented crunch-dispatch-slurm from noticing that the container has been taken away from it, and it should stop trying to mess with it?

Actions

Copy link

Updated by Tom Clegg over 8 years ago

Status changed from New to In Progress

Actions

Copy link

Updated by Peter Amstutz over 8 years ago

Tom Clegg wrote:

I don't think this is solving the right problem.

Even if unlocking an already-unlocked container is a no-op, crunch-dispatch-slurm still needs to correctly handle the case where unlock fails -- e.g., a different dispatcher has locked the container, or the container has been cancelled.

Once crunch-dispatch-slurm is fixed, "can't unlock an already-unlocked container" should be rare and harmless when things are working well, and helpful for turning making subtle bugs (like this one) easier to find.

So, instead of changing the error to a no-op, can we fix whatever prevented crunch-dispatch-slurm from noticing that the container has been taken away from it, and it should stop trying to mess with it?

There is a whole series of bugs linked from #10700. I agree that the actual fix is to ensure that crunch-dispatch-slurm doesn't misbehave when the unlock operation fails. On the other hand, if the API server didn't give a misleading response, maybe the existing code would be sufficient to recover. So to reduce the chance of future problems we should fix it on both sides.

Actions

Copy link

#10

Updated by Tom Clegg over 8 years ago

Peter Amstutz wrote:

if the API server didn't give a misleading response, maybe the existing code would be sufficient to recover. So to reduce the chance of future problems we should fix it on both sides.

Hm. This reasoning sounds like "if the API call had succeeded, maybe we wouldn't need to handle the error case". But we do need to handle the error case to handle the slightly-different scenarios, like state==Cancelled and locked-by-someone-else -- and when we handle the error case, there is no need to change the API...?

I'm much more comfortable keeping the behavior as "either unlock, or return an error". If a client calls "unlock" when the container is not even locked, that's an opportunity for the client to notice that something is not working as expected. If double-unlock returns OK there is no way for the client to know anything is amiss. In other words, it's already easy for a client to ignore a harmless error, but it would be impossible for a client to detect a suppressed error.

Actions

Copy link

#11

Updated by Radhika Chippada about 8 years ago

Target version changed from 2017-01-18 sprint to 2017-02-01 sprint

Actions

Copy link

#12

Updated by Tom Clegg about 8 years ago

Subject changed from [Crunch2] API server returns 422 when providing container state field with same value as current state to [Crunch2] [API] return a more specific 422 error message when a client calls containers#unlock without having the lock
Description updated (diff)

Actions

Copy link

#13