Feature #14325

Updated by Tom Clegg over 5 years ago

This issue covers the smallest version that can be deployed on a dev cluster. 

 Requirements: 
 * One cloud vendor driver (Azure = #14324) 
 * Bring up nodes and run containers on them 
 * Ops mechanism for draining a node (e.g., curl command using a management token) 
 * HTTP status report with current set of containers (queued/running) and VMs (busy/idle) -- see [[Dispatching containers to cloud VMs#Operator view]] 
 * Structured logs for diagnostics+statistics: cloud API errors, node lifecycle, container lifecycle 
 * Resource consumption metrics (instances running/allocated, hourly cost) 
 * Shut down idle nodes automatically 
 * Handle cloud API quota/ratelimit errors 
 * Cancel containers that can't be scheduled 
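
 As a rough illustration of how the drain, busy/idle status, and idle-shutdown requirements above could fit together, here is a minimal Python sketch. It is not the dispatcher's implementation: the @Node@ record, its field names, the @idle_timeout@ parameter, and both function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical record for one cloud VM; all field names are assumptions."""
    name: str
    container: Optional[str]  # UUID of the running container, or None if idle
    idle_since: float         # timestamp (seconds) when the node last became idle
    draining: bool = False    # set by the operator so no new work lands here


def nodes_to_shut_down(nodes: List[Node], now: float, idle_timeout: float) -> List[str]:
    """Return names of nodes that are safe to shut down.

    A node qualifies only if it is not running a container and it is
    either draining or has been idle for at least idle_timeout seconds.
    """
    doomed = []
    for n in nodes:
        if n.container is not None:
            continue  # busy nodes are left alone, even when draining
        if n.draining or now - n.idle_since >= idle_timeout:
            doomed.append(n.name)
    return doomed


def status_report(nodes: List[Node]) -> Dict[str, int]:
    """Summarize VMs as busy/idle counts, as a status endpoint might."""
    busy = sum(1 for n in nodes if n.container is not None)
    return {"busy": busy, "idle": len(nodes) - busy}
```

 In this sketch a draining node that is still running a container is left untouched until the container finishes, which is one possible reading of "draining"; the issue itself does not specify that behavior.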

 Non-requirements: 
 * Multiple cloud drivers 
 * Test suite that uses a real cloud provider 
 * Performance metrics 
 * Periodic status reports in logs 
 * Optimize worker VM deployment (for now, we still expect the operator to provide an image with a suitable version of crunch-run) 
 * Configurable spending limits 

 Refs 
 * [[Dispatching containers to cloud VMs]] 
 * #13964 spike 
