Idea #9328

Updated by Tom Clegg almost 8 years ago

See [[Container dispatch]] for background. 

 Containers can go from "Locked" back to "Queued" if something fails before the container actually starts running, or if the dispatcher loses track of the container and believes it is no longer running. 

 The same dispatcher can then re-lock the Queued container and start a second crunch-run process. That process can race the first one and produce confusing results, because both crunch-run processes are able to update the container record. 

 The proposed solution is to introduce an additional API token, issued when the container is Locked. The crunch-run process will use this token to update the record. If the container is unlocked for any reason, the token will be revoked; the old crunch-run process will then be unable to modify the container record and will fail, and the new crunch-run process can safely take ownership of the container record with a new API token. 

 Proposed implementation: 

 # Add an @run_auth_uuid@ field to "containers" table on the API server 
 # When the container is Locked, set @run_auth_uuid@ to a new token belonging to a system user, scoped to read/write access on just the container record 
 # When the dispatcher queues or executes the container using @crunch-run@, set @ARVADOS_API_TOKEN@ to the @run_auth_uuid@ token 
 # If the container returns to "Queued", the @run_auth_uuid@ token is revoked/deleted and the field is cleared. 
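
 The steps above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not the API server's implementation: the @Container@ class, its @lock@/@unlock@/@update@ methods, and the @TokenRevoked@ exception are hypothetical names, and for brevity the sketch stores the token value itself in @run_auth_uuid@ rather than the UUID of a separate authorization record.

```python
import secrets


class TokenRevoked(Exception):
    """Raised when a revoked or unknown token is used to update a container."""


class Container:
    """Toy model of a container record (hypothetical, not the Arvados API)."""

    def __init__(self):
        self.state = "Queued"
        self.run_auth_uuid = None  # the field proposed in step 1
        self.log = []

    def lock(self):
        # Step 2: issue a fresh token each time the container becomes Locked.
        if self.state != "Queued":
            raise ValueError("only a Queued container can be locked")
        self.state = "Locked"
        self.run_auth_uuid = secrets.token_hex(16)
        return self.run_auth_uuid

    def unlock(self):
        # Step 4: returning to Queued revokes the token and clears the field.
        if self.state != "Locked":
            raise ValueError("only a Locked container can be unlocked")
        self.state = "Queued"
        self.run_auth_uuid = None

    def update(self, token, message):
        # Only the currently issued token may modify the record.
        if token is None or token != self.run_auth_uuid:
            raise TokenRevoked("token is not valid for this container")
        self.log.append(message)
```

 In this model, a crunch-run process that was started with the token from an earlier @lock()@ call gets @TokenRevoked@ on its next @update()@ once the container has been unlocked and re-locked, while the process holding the newest token can still write.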

 Rationale: 

 If crunch-run is started multiple times, the old crunch-run will be unable to update the container record because its token is revoked.    Only the new crunch-run with the new ARVADOS_API_TOKEN will be able to update the container record.
