Story #20662

Updated by Brett Smith 8 months ago

Basically everybody who uses Arvados ends up inventing a way to monitor container/requests, including ourselves in a-c-r. The Python SDK should provide an ergonomic way to do this. 

 *Proposal:* A class that lets you register interest in container/requests, automatically keeps tabs on those and their children, and calls dedicated methods when relevant container/requests change state or finish. You can implement your own reaction logic by subclassing this class and overriding the methods. The API looks something like:

<pre><code class="python">
class WorkMonitor:
    def __init__(self, ...) -> None:
        """Doesn't start monitoring, just sets parameters for doing so.
        The parameters might vary by implementation. See discussion below."""
    def watch(self, uuid: str) -> None:
        """Registers interest in a container/request. Raises ValueError if the
        uuid doesn't represent one of those objects, or if the object is not
        found."""
    def unwatch(self, uuid: str) -> None:
        """Unregisters interest in a container/request. Raises ValueError if the
        uuid wasn't previously watched."""
    def start(self) -> None:
        """Starts monitoring in a separate thread."""
    def stop(self) -> None:
        """Stops monitoring and the associated thread. Note that monitoring
        should be stopped automatically if all watched container/requests reach
        an end state."""

    # Methods called when a watched container request changes state. Types are
    # the typeddicts from arvados.api_resources. container corresponds to the
    # request's container_uuid, if set.
    def request_uncommitted(self, request: ContainerRequest, container: Container | None) -> None:
        pass
    def request_committed(self, request: ContainerRequest, container: Container | None) -> None:
        pass
    def request_final(self, request: ContainerRequest, container: Container | None) -> None:
        pass
    def request_deleted(self, request: ContainerRequest, container: Container | None) -> None:
        pass

    # Methods called when a watched container changes state. requests is a list
    # of container requests that the user expressly watched, or children of
    # those requests, whose container_uuid points to this container. It may be
    # empty.
    def container_queued(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_locked(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_running(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_cancelled(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_complete(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_deleted(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
</code></pre>
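
 To illustrate how a caller might use this, here is a minimal sketch of a subclass that only overrides @request_final@. The @WorkMonitor@ stub stands in for the proposed class so the example runs on its own, @ExitCodeCollector@ is a hypothetical name, plain dicts stand in for the typeddicts, and the callbacks are invoked by hand where the monitoring thread would normally call them.

```python
from __future__ import annotations

from typing import Any


# Stand-in for the proposed base class so this sketch is self-contained.
# The real class would also provide watch(), unwatch(), start(), and stop().
class WorkMonitor:
    def request_final(self, request: dict[str, Any], container: dict[str, Any] | None) -> None:
        pass

    def container_running(self, container: dict[str, Any], requests: list[dict[str, Any]]) -> None:
        pass


class ExitCodeCollector(WorkMonitor):
    """Hypothetical subclass: records the exit code of each finished request."""

    def __init__(self) -> None:
        self.exit_codes: dict[str, int | None] = {}

    def request_final(self, request, container):
        # container is None when the request has no container_uuid set.
        exit_code = None if container is None else container.get("exit_code")
        self.exit_codes[request["uuid"]] = exit_code


monitor = ExitCodeCollector()
# Simulate the calls the monitor would make as state changes arrive.
# container_running is not overridden, so it falls through to the no-op.
monitor.container_running({"uuid": "zzzzz-dz642-000000000000000", "state": "Running"}, [])
monitor.request_final(
    {"uuid": "zzzzz-xvhdp-000000000000000", "state": "Final"},
    {"uuid": "zzzzz-dz642-000000000000000", "state": "Complete", "exit_code": 0},
)
print(monitor.exit_codes)
```

 Because the base methods are no-ops, the subclass does not need to define anything for the state changes it ignores.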

 *Details to hash out* 

 * The implementation _could_ be based on Websockets and @arvados.events@, or it could just poll the API. Websockets should generally be more efficient and responsive. The main thing to consider there is whether we're willing to commit to the Websockets server being user-facing. 
 ** If we decide this can/should be based on Websockets, there are some filters that the Websockets server could add support for that could make this more efficient (especially @object_kind@), but they're not strictly required. 
 ** If we decide to poll the API, the implementation should probably ensure that no API requests are automatically retried, and instead just wait for the next poll interval if an API request fails.
 * I think the constructor should accept a list of attributes to hold onto. Those are the only attributes that should be saved internally, and the only attributes in dicts that are passed to the transition methods. The default set should probably be a limited set targeted for state monitoring: @uuid@, @name@, @state@, @container_uuid@, @exit_code@, @log@, and @output@. The implementation should ensure that attributes it requires for its own operation are included in the set. 
 ** If the implementation is API-based, appropriate attributes should be passed to @select@. 
 * It's intentional that the state change methods have empty implementations, rather than raising @NotImplementedError@. This lets subclasses implement only the state changes they care about and ignore the others.
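
 If the polling route is chosen, the points above about @select@ and skipping retries could be sketched roughly as follows. This is only an illustration under assumptions: @build_select@, @poll_once@, @REQUIRED_ATTRS@, and the @fetch@ callable are hypothetical names, and @fetch@ stands in for whatever API list call the real implementation would make with retries disabled.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable

# Default attribute set suggested in the discussion above.
DEFAULT_ATTRS = ("uuid", "name", "state", "container_uuid", "exit_code", "log", "output")
# Assumption: the attributes this sketch needs for its own bookkeeping.
REQUIRED_ATTRS = frozenset({"uuid", "state"})


def build_select(attrs: Iterable[str] = DEFAULT_ATTRS) -> list[str]:
    """Return the attributes to request, always including the required ones."""
    return sorted(set(attrs) | REQUIRED_ATTRS)


def poll_once(
    fetch: Callable[[list[str]], list[dict[str, Any]]],
    select: list[str],
    last_states: dict[str, str],
    on_change: Callable[[dict[str, Any]], None],
) -> bool:
    """Run one poll. Return False if the request failed.

    A failure is not retried here; the caller just waits for the next interval.
    """
    try:
        items = fetch(select)
    except Exception:
        return False
    for item in items:
        if last_states.get(item["uuid"]) != item["state"]:
            last_states[item["uuid"]] = item["state"]
            on_change(item)
    return True


# Simulated API responses: success, an outage, then success again.
responses = iter([
    [{"uuid": "zzzzz-dz642-000000000000000", "state": "Queued"}],
    ConnectionError("simulated outage"),
    [{"uuid": "zzzzz-dz642-000000000000000", "state": "Running"}],
])


def fake_fetch(select: list[str]) -> list[dict[str, Any]]:
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


seen: list[str] = []
states: dict[str, str] = {}
select = build_select(["name"])
results = [poll_once(fake_fetch, select, states, lambda item: seen.append(item["state"])) for _ in range(3)]
print(select, results, seen)
```

 In a real monitor thread, each @poll_once@ call would be separated by the poll interval, and @on_change@ would dispatch to the matching state method instead of appending to a list.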
