Feature #11146

Updated by Tom Clegg over 7 years ago

h3. Background 

 From the user's perspective, it's hard to see what (if anything) is happening between the time a container is created/queued and the time it actually starts running. 

 In a SLURM setup, the container typically moves quickly from Queued to Locked state when crunch-dispatch-slurm submits it to the SLURM queue. It then stays in Locked state for some time, waiting for SLURM resources to become available.

h3. Proposed feature

 Soon after a container is submitted to the SLURM queue, Workbench should start indicating how close the resulting SLURM job is to the front of the queue. 

h3. Implementation

 When checking squeue, crunch-dispatch-slurm should note the SLURM queue position of each "Locked" container and propagate this information to the API server.
 * API: Add a new serialized Hash field @dispatch_info@ 
 * crunch-dispatch-slurm: store queue position as @dispatch_info["queue_position"]@ 
 * crunch-dispatch-slurm: only update containers for which this process has the lock 
 * crunch-dispatch-slurm: rate-limit queue position updates for any given container to at most one update per second, and skip redundant updates like "update queue position from 5 to 5"
 * crunch-dispatch-slurm: ensure no races between "update queue position" and "update container state" requests 
 * Workbench: show the queue position for containers whose @dispatch_info@ has a @queue_position@ key

 Workbench should display the latest queue position when available.
