Idea #20662
Python API to monitor container/requests
Status: Open
Description
Almost everybody who uses Arvados ends up inventing their own way to monitor container/requests, including us in a-c-r (arvados-cwl-runner). The Python SDK should provide an ergonomic way to do this.
Proposal: A class that lets you register interest in container/requests, automatically keeps tabs on those and their children, and calls dedicated methods when relevant container/requests change state. You can implement your own reaction logic by subclassing this class and overriding the methods. The API looks something like:
```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # TypedDicts describing API records; only needed by type checkers.
    from arvados.api_resources import Container, ContainerRequest


class WorkMonitor:
    def __init__(self, *args, **kwargs) -> None:
        """Doesn't start monitoring, just sets parameters for doing so.
        The parameters might vary by implementation. See discussion below."""

    def watch(self, uuid: str) -> None:
        """Registers interest in a container/request. Raises ValueError if the
        uuid doesn't represent one of those objects, or if the object is not
        found."""

    def unwatch(self, uuid: str) -> None:
        """Unregisters interest in a container/request. Raises ValueError if the
        uuid wasn't previously watched."""

    def start(self) -> None:
        """Starts monitoring in a separate thread."""

    def stop(self) -> None:
        """Stops monitoring and the associated thread. Note that monitoring
        should be stopped automatically if all watched container/requests reach
        an end state."""

    # Methods called when a watched container request changes state. Types are
    # the TypedDicts from arvados.api_resources. container corresponds to the
    # request's container_uuid, if set.
    def request_uncommitted(self, request: ContainerRequest, container: Container | None) -> None:
        pass

    def request_committed(self, request: ContainerRequest, container: Container | None) -> None:
        pass

    def request_final(self, request: ContainerRequest, container: Container | None) -> None:
        pass

    def request_deleted(self, request: ContainerRequest, container: Container | None) -> None:
        pass

    # Methods called when a watched container changes state. requests is a list
    # of container requests that the user expressly watched, or children of
    # those requests, whose container_uuid points to this container. It may be
    # empty.
    def container_queued(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass

    def container_locked(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass

    def container_running(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass

    def container_cancelled(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass

    def container_complete(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass

    def container_deleted(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
```
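The proposal leaves open how the monitor decides which of these methods to call. One option, sketched minimally below, is for the monitor to remember the last state it saw for each UUID and map Arvados state names onto the method names above. `TransitionTracker` and `observe` are hypothetical names, not part of the SDK. The sketch tells requests from containers by the UUID type infix (`xvhdp` for container requests, `dz642` for containers). Deletion is not a state value, so the `*_deleted` methods would need separate detection that this sketch does not cover.

```python
from __future__ import annotations

# Map Arvados state values to the method names proposed above.
CONTAINER_HOOKS = {
    "Queued": "container_queued",
    "Locked": "container_locked",
    "Running": "container_running",
    "Cancelled": "container_cancelled",
    "Complete": "container_complete",
}
REQUEST_HOOKS = {
    "Uncommitted": "request_uncommitted",
    "Committed": "request_committed",
    "Final": "request_final",
}


class TransitionTracker:
    """Hypothetical helper: remembers the last state seen for each UUID."""

    def __init__(self) -> None:
        self._last_state: dict[str, str] = {}

    def observe(self, record: dict) -> str | None:
        """Return the method name to call if the state changed, else None."""
        uuid = record["uuid"]
        state = record["state"]
        if self._last_state.get(uuid) == state:
            return None  # No transition since the last observation.
        self._last_state[uuid] = state
        # Container request UUIDs carry the "xvhdp" infix; containers use "dz642".
        hooks = REQUEST_HOOKS if "-xvhdp-" in uuid else CONTAINER_HOOKS
        return hooks.get(state)
```

Because the tracker returns a name rather than calling anything, the monitor can dispatch with `getattr(self, name)(...)` and supply the related request or container records itself.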
Details to hash out
- The implementation could be based on Websockets and `arvados.events`, or it could just poll the API. Websockets should generally be more efficient and responsive. The main thing to consider there is whether we're willing to commit to the Websockets server being user-facing.
  - If we decide this can/should be based on Websockets, there are some filters that the Websockets server could add support for that could make this more efficient (especially `object_kind`), but they're not strictly required.
  - If we decide to poll the API, it should probably ensure that no API requests are automatically retried, and instead just wait for the next poll interval if an API request fails.
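For the polling option, the "no automatic retries, just wait for the next interval" rule could look like the minimal sketch below. It assumes a caller-supplied `fetch` callable that makes a single list request with retries disabled. With the Arvados SDK that would presumably mean passing `num_retries=0` to `execute()`, which should be checked against the SDK. `poll_forever`, `fetch`, and `handle` are hypothetical names.

```python
from __future__ import annotations

import logging
import threading
from typing import Callable

logger = logging.getLogger(__name__)


def poll_forever(
    fetch: Callable[[], list[dict]],
    handle: Callable[[dict], None],
    stop: threading.Event,
    interval: float = 10.0,
) -> None:
    """Call fetch() once per interval until stop is set.

    fetch is assumed to make one API request with no automatic retries.
    If it raises, the error is logged and the loop waits for the next
    interval instead of retrying immediately.
    """
    while not stop.is_set():
        try:
            records = fetch()
        except Exception:
            logger.warning("poll failed; will try again next interval", exc_info=True)
        else:
            for record in records:
                handle(record)
        # Event.wait returns early once stop is set, so shutdown is prompt.
        stop.wait(interval)
```

In the proposed class, this loop would run in the thread created by `start()`, and `stop()` would set the event.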
- I think the constructor should accept a list of attributes to hold onto. Those are the only attributes that should be saved internally, and the only attributes in dicts that are passed to the transition methods. The default set should probably be a limited set targeted for state monitoring: `uuid`, `name`, `state`, `container_uuid`, `exit_code`, `log`, and `output`. The implementation should ensure that attributes it requires for its own operation are included in the set.
  - If the implementation is API-based, appropriate attributes should be passed to `select`.
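The attribute-selection idea reduces to a set union plus a dict filter. The minimal sketch below assumes the monitor needs `uuid`, `state`, and `container_uuid` for its own bookkeeping. That is an assumption, since the real required set depends on the implementation. `DEFAULT_ATTRIBUTES`, `REQUIRED_ATTRIBUTES`, `effective_attributes`, and `trim` are hypothetical names. An API-based implementation could pass the list from `effective_attributes()` as `select`.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Default set proposed above, aimed at state monitoring.
DEFAULT_ATTRIBUTES = (
    "uuid", "name", "state", "container_uuid", "exit_code", "log", "output",
)
# Hypothetical: attributes the monitor itself needs to operate.
REQUIRED_ATTRIBUTES = ("uuid", "state", "container_uuid")


def effective_attributes(requested: Iterable[str] | None = None) -> list[str]:
    """Return the caller's attributes plus any the monitor requires, in stable order."""
    chosen = list(DEFAULT_ATTRIBUTES if requested is None else requested)
    for attr in REQUIRED_ATTRIBUTES:
        if attr not in chosen:
            chosen.append(attr)
    return chosen


def trim(record: Mapping[str, object], attributes: Iterable[str]) -> dict[str, object]:
    """Keep only the selected attributes that are present in the record."""
    return {key: record[key] for key in attributes if key in record}
```

Missing keys are skipped rather than raising, because containers and container requests do not share every attribute (for example, only requests have `container_uuid`).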
- It's intentional that the state change methods have empty implementations, rather than raising `NotImplementedError`. This lets subclasses implement only the state changes they care about and ignore the others.
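No-op defaults mean that a subclass overrides only the transitions it cares about. The sketch below uses a hypothetical cut-down stand-in for the proposed `WorkMonitor`, with only two methods and plain dicts instead of the `arvados.api_resources` TypedDicts, so it runs without the SDK. `MiniMonitor` and `ExitCodeCollector` are illustrative names.

```python
from __future__ import annotations


class MiniMonitor:
    """Hypothetical stand-in for WorkMonitor with two of the proposed methods."""

    def container_running(self, container: dict, requests: list[dict]) -> None:
        pass  # Default: ignore this transition.

    def container_complete(self, container: dict, requests: list[dict]) -> None:
        pass  # Default: ignore this transition.


class ExitCodeCollector(MiniMonitor):
    """Only cares about finished containers; inherits the no-op running method."""

    def __init__(self) -> None:
        self.exit_codes: dict[str, int] = {}

    def container_complete(self, container: dict, requests: list[dict]) -> None:
        self.exit_codes[container["uuid"]] = container["exit_code"]


collector = ExitCodeCollector()
container = {"uuid": "zzzzz-dz642-000000000000000", "exit_code": 0}
# The monitor may call every method; unimplemented ones do nothing instead of raising.
collector.container_running(container, [])
collector.container_complete(container, [])
print(collector.exit_codes)  # prints {'zzzzz-dz642-000000000000000': 0}
```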
Updated by Brett Smith about 1 year ago
> The main thing to consider there is whether we're willing to commit to the Websockets server being user-facing.
We have at least one user who has built their own notification system on top of the websockets server, so this ship is at least leaving port if it hasn't already sailed.
Updated by Peter Amstutz about 2 months ago
- Target version changed from Future to Development 2024-09-25 sprint
Updated by Peter Amstutz about 1 month ago
- Target version changed from Development 2024-09-25 sprint to Development 2024-10-09 sprint
Updated by Peter Amstutz 18 days ago
- Subject changed from API to monitor container/requests to Python API to monitor container/requests
Updated by Peter Amstutz 18 days ago
- Target version changed from Development 2024-10-09 sprint to Development 2024-10-23 sprint
Updated by Peter Amstutz 11 days ago
- Target version changed from Development 2024-10-23 sprint to Development 2024-11-06 sprint