Idea #20662

open

API to monitor container/requests

Added by Brett Smith 12 months ago. Updated 10 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: SDKs
Target version:
Start date:
Due date:
Story points: -

Description

Basically everybody who uses Arvados ends up inventing a way to monitor container/requests, including ourselves in arvados-cwl-runner (a-c-r). The Python SDK should provide an ergonomic way to do this.

Proposal: A class that lets you register interest in container/requests, automatically keeps tabs on those and their children, and calls dedicated methods when relevant container/requests change state. You can implement your own reaction logic by subclassing this class and overriding the methods. The API looks something like this:

from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # The TypedDicts describing Arvados API records.
    from arvados.api_resources import Container, ContainerRequest


class WorkMonitor:
    # Placeholder signature: the real constructor parameters are still open.
    def __init__(self, **options: Any) -> None:
        """Doesn't start monitoring, just sets parameters for doing so.
        The parameters might vary by implementation. See discussion below.
        """
    def watch(self, uuid: str) -> None:
        """Registers interest in a container/request. Raises ValueError if the
        uuid doesn't represent one of those objects, or if the object is not
        found.
        """
    def unwatch(self, uuid: str) -> None:
        """Unregisters interest in a container/request. Raises ValueError if the
        uuid wasn't previously watched.
        """
    def start(self) -> None:
        """Starts monitoring in a separate thread."""
    def stop(self) -> None:
        """Stops monitoring and the associated thread. Note that monitoring
        should be stopped automatically if all watched container/requests reach
        an end state.
        """

    # Methods called when a watched container request changes state. Types are
    # the TypedDicts from arvados.api_resources. container corresponds to the
    # request's container_uuid, if set.
    def request_uncommitted(self, request: ContainerRequest, container: Container | None) -> None:
        pass
    def request_committed(self, request: ContainerRequest, container: Container | None) -> None:
        pass
    def request_final(self, request: ContainerRequest, container: Container | None) -> None:
        pass
    def request_deleted(self, request: ContainerRequest, container: Container | None) -> None:
        pass

    # Methods called when a watched container changes state. requests is a list
    # of container requests that the user expressly watched, or children of
    # those requests, whose container_uuid points to this container. It may be
    # empty.
    def container_queued(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_locked(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_running(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_cancelled(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_complete(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
    def container_deleted(self, container: Container, requests: list[ContainerRequest]) -> None:
        pass
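
One possible way for an implementation to route state changes to these methods is to derive the method name from the record type and its new state value. This is only a sketch under assumptions: handler_name, dispatch, and FinishedOnly are hypothetical names, plain dicts stand in for the arvados.api_resources TypedDicts, and the state lists are the standard Arvados container request states (Uncommitted, Committed, Final) and container states (Queued, Locked, Running, Cancelled, Complete).

```python
from __future__ import annotations

from typing import Any

# Plain dicts stand in for the arvados.api_resources TypedDicts so the
# sketch runs without the Arvados SDK installed.
Record = dict[str, Any]

# Standard Arvados state values. Deletion is reported separately because
# it is not a value of the state attribute.
STATES = {
    "request": ("Uncommitted", "Committed", "Final"),
    "container": ("Queued", "Locked", "Running", "Cancelled", "Complete"),
}


def handler_name(kind: str, record: Record, deleted: bool = False) -> str:
    """Map a record's state to a method name such as 'container_running'."""
    if kind not in STATES:
        raise ValueError(f"unknown kind: {kind!r}")
    if deleted:
        return f"{kind}_deleted"
    state = record["state"]
    if state not in STATES[kind]:
        raise ValueError(f"unexpected {kind} state: {state!r}")
    return f"{kind}_{state.lower()}"


def dispatch(monitor: object, kind: str, record: Record, context: Any,
             deleted: bool = False) -> bool:
    """Call the matching handler on a WorkMonitor-like object.

    The proposal gives every handler an empty default; here a missing
    handler is treated the same way. Returns True if a handler ran.
    """
    handler = getattr(monitor, handler_name(kind, record, deleted), None)
    if handler is None:
        return False
    handler(record, context)
    return True


class FinishedOnly:
    """Hypothetical subclass-style monitor that only reacts to end states."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def request_final(self, request: Record, container: Record | None) -> None:
        self.seen.append(f"request {request['uuid']} final")

    def container_complete(self, container: Record, requests: list[Record]) -> None:
        self.seen.append(f"container {container['uuid']} exit {container['exit_code']}")


monitor = FinishedOnly()
dispatch(monitor, "container", {"uuid": "c1", "state": "Running"}, [])
dispatch(monitor, "container", {"uuid": "c1", "state": "Complete", "exit_code": 0}, [])
dispatch(monitor, "request", {"uuid": "cr1", "state": "Final"}, None)
print(monitor.seen)  # ['container c1 exit 0', 'request cr1 final']
```

Deriving names from state values keeps the dispatcher small, at the cost of tying method names to the API's state strings; an explicit state-to-method table would avoid that coupling if new states are ever added.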

Details to hash out

  • The implementation could be based on Websockets and arvados.events, or it could just poll the API. Websockets should generally be more efficient and responsive. The main thing to consider there is whether we're willing to commit to the Websockets server being user-facing.
    • If we decide this can/should be based on Websockets, the Websockets server could add support for some filters that would make this more efficient (especially object_kind), but they're not strictly required.
    • If we decide to poll the API, it should probably ensure that no API requests are automatically retried, and instead just wait for the next poll interval if an API request fails.
  • I think the constructor should accept a list of attributes to hold onto. Those are the only attributes that should be saved internally, and the only attributes included in the dicts passed to the transition methods. The default set should probably be a limited set targeted for state monitoring: uuid, name, state, container_uuid, exit_code, log, and output. The implementation should ensure that attributes it requires for its own operation are included in the set.
    • If the implementation is API-based, it should pass the appropriate attributes as the select parameter of its API requests.
  • It's intentional that the state change methods have empty implementations, rather than raising NotImplementedError. This lets subclasses implement only the state changes they care about and ignore the others.
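
If the monitor polls the API, the attribute-set and no-retry points above could combine into something like the following minimal sketch. Everything here is an assumption for illustration: Poller, attribute_set, and run_until_stopped are hypothetical names; fetch is a stand-in for an Arvados list call made with the given select list and with automatic retries disabled; and REQUIRED_ATTRS is a guess at what the monitor needs for its own bookkeeping.

```python
from __future__ import annotations

import threading
from typing import Any, Callable, Iterable

Record = dict[str, Any]

# Default attribute set suggested in the proposal.
DEFAULT_ATTRS = ("uuid", "name", "state", "container_uuid", "exit_code", "log", "output")
# Assumed minimum the monitor needs for its own bookkeeping.
REQUIRED_ATTRS = ("uuid", "state", "container_uuid")


def attribute_set(requested: Iterable[str] = DEFAULT_ATTRS) -> list[str]:
    """Merge the caller's attributes with the ones the monitor requires."""
    attrs = list(dict.fromkeys(requested))  # de-duplicate, keep order
    attrs += [a for a in REQUIRED_ATTRS if a not in attrs]
    return attrs


class Poller:
    """Hypothetical polling core: one list request per interval, no retries."""

    def __init__(
        self,
        fetch: Callable[[list[str], list[str]], list[Record]],
        on_change: Callable[[Record], None],
        attrs: Iterable[str] = DEFAULT_ATTRS,
    ) -> None:
        self.fetch = fetch  # (uuids, select) -> records; stand-in for the API
        self.on_change = on_change  # called once per observed state change
        self.attrs = attribute_set(attrs)
        self.states: dict[str, str | None] = {}

    def watch(self, uuid: str) -> None:
        self.states.setdefault(uuid, None)

    def poll_once(self) -> bool:
        """Run one poll. Returns False if the request failed."""
        try:
            records = self.fetch(list(self.states), self.attrs)
        except Exception:
            # No retry: the caller just waits for the next poll interval.
            return False
        for record in records:
            # Keep only the configured attributes, as the proposal suggests.
            trimmed = {key: record.get(key) for key in self.attrs}
            if self.states.get(trimmed["uuid"]) != trimmed["state"]:
                self.states[trimmed["uuid"]] = trimmed["state"]
                self.on_change(trimmed)
        return True


def run_until_stopped(poller: Poller, interval: float, stop: threading.Event) -> None:
    """Poll every `interval` seconds until `stop` is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop.wait(interval):
        poller.poll_once()
```

Treating a failed request as "nothing new this round" keeps the loop simple: the next successful poll compares against the last known states, so a transient failure delays a notification without losing it (unless a record passes through several states between polls, which any polling design has to accept).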

#1

Updated by Brett Smith 12 months ago

  • Description updated

#2

Updated by Brett Smith 12 months ago

  • Description updated

#4

Updated by Brett Smith 10 months ago

> The main thing to consider there is whether we're willing to commit to the Websockets server being user-facing.

We have at least one user who has built their own notification system on top of the Websockets server, so this ship is at least leaving port if it hasn't already sailed.
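
With the Websockets server treated as user-facing, a Websockets-based monitor would mostly need to turn incoming event messages into the same state changes. The sketch below assumes each message is a dict shaped like an Arvados log record (object_uuid, object_kind, event_type, and properties with old_attributes/new_attributes), and that object_kind values are "arvados#container" and "arvados#containerRequest"; check those field names and values against the Websockets server documentation before relying on them. event_to_change and Change are hypothetical names, the watched set is assumed to already include child requests and their containers, and the subscription itself (for example through arvados.events) is left out so the sketch runs offline.

```python
from __future__ import annotations

from typing import Any, NamedTuple

# Assumed object_kind values for the two record types.
KINDS = {
    "arvados#containerRequest": "request",
    "arvados#container": "container",
}


class Change(NamedTuple):
    kind: str           # "request" or "container"
    uuid: str
    state: str | None   # None when the object was deleted
    deleted: bool


def event_to_change(event: dict[str, Any], watched: set[str]) -> Change | None:
    """Translate one (assumed-shape) event message into a state change."""
    kind = KINDS.get(event.get("object_kind", ""))
    uuid = event.get("object_uuid", "")
    if kind is None or uuid not in watched:
        return None  # not a container/request this monitor cares about
    if event.get("event_type") == "delete":
        return Change(kind, uuid, None, True)
    props = event.get("properties") or {}
    old = (props.get("old_attributes") or {}).get("state")
    new = (props.get("new_attributes") or {}).get("state")
    if new is None or new == old:
        return None  # an update that did not touch the state
    return Change(kind, uuid, new, False)
```

Filtering on object_kind client-side like this works today; a server-side object_kind filter, as suggested in the details above, would only reduce how many messages reach this function.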
