Idea #21024
PySDK includes basic workflow reporting script
Description
The Python SDK should include a script that can run on the command line to generate a basic CSV report about workflow runs. The first goal of this script is to serve as documentation by example: the code should be written so that it's easy for people to read, learn from, and adapt to their own reporting needs. (This is why it needs to be Python and not Go.) It should demonstrate best practices for Arvados API clients, like using the filters and select parameters effectively to limit API overhead. But it should also be runnable by itself as a basic demonstration. It should accept a few arguments:
- --owner: The UUID to pass to groups/contents to find workflows to report. Defaults to the current user's UUID.
- --recursive / --no-recursive: Flags to set the value of the recursive argument to groups/contents. Default true.
- --created-at, --modified-at, --started-at, --finished-at: All of these limit reporting to containers with date_field >= argument. The argument can be an ISO 8601 date/time, or a sleep-style duration like "3days", which is subtracted from the current datetime to generate the filter timestamp.
- --name: If specified, limits reporting to containers with name LIKE argument. If the argument contains no SQL metacharacters, the script should probably coerce the argument to %argument%, maybe with a notice.
Output a CSV to stdout with the following fields:
- uuid
- owner_uuid
- state
- exit_code
- started_at
- finished_at
- cost
- subrequests_cost
- output
- log
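One way to produce this output, sketched with the standard csv module: csv.DictWriter takes its column order from the field list above, fills missing fields with empty cells, and ignores extra keys (extrasaction="ignore"), so records need no reshaping before they are written. The names write_report() and render_report() are hypothetical, and each record is assumed to be a plain dict already assembled from whatever API responses supply these fields.

```python
import csv
import io
import sys

# Column order copied from the field list in this issue.
REPORT_FIELDS = [
    "uuid", "owner_uuid", "state", "exit_code", "started_at",
    "finished_at", "cost", "subrequests_cost", "output", "log",
]


def write_report(records, out=None):
    """Write a header plus one row per record dict to out (default: stdout)."""
    out = sys.stdout if out is None else out
    # Missing keys become empty cells (restval=""), extra keys are dropped.
    writer = csv.DictWriter(out, fieldnames=REPORT_FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def render_report(records):
    """Return the report as a string, e.g. for previews or tests."""
    buf = io.StringIO()
    write_report(records, out=buf)
    return buf.getvalue()
```

Because None values are written as empty strings, unfinished runs (no finished_at or exit_code yet) simply show blank cells in a spreadsheet.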
You should be able to open this report in Calc/Excel/Sheets and generate basic reports on runtime, cost, etc.
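As an illustration of the arithmetic this report supports, the sketch below reads the CSV back and computes each finished run's runtime and combined cost. It assumes timestamps in ISO 8601 form with a trailing "Z" (fractional seconds of any length are dropped), and that subrequests_cost is not already included in cost; both, and the helper names parse_timestamp() and summarize(), are assumptions rather than documented behavior.

```python
import csv
import datetime
import re


def parse_timestamp(value):
    """Parse a timestamp such as 2024-01-01T10:00:00.123456789Z.

    Fractional seconds are dropped because datetime.fromisoformat() in
    older Pythons rejects more than six digits and a trailing "Z".
    """
    cleaned = re.sub(r"\.\d+", "", value).replace("Z", "+00:00")
    return datetime.datetime.fromisoformat(cleaned)


def summarize(rows):
    """Yield (uuid, runtime_seconds, total_cost) for rows that have finished."""
    for row in rows:
        if not (row["started_at"] and row["finished_at"]):
            continue  # never started, or still running
        runtime = parse_timestamp(row["finished_at"]) - parse_timestamp(row["started_at"])
        # Assumption: subrequests_cost is not already included in cost.
        total = float(row["cost"] or 0) + float(row["subrequests_cost"] or 0)
        yield row["uuid"], runtime.total_seconds(), total
```

For example, with open("report.csv", newline="") as f, list(summarize(csv.DictReader(f))) gives one tuple per finished run.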
This script should be available if you install the Python SDK through pip, but I'm ambivalent about whether we actually install a runnable script or just have it available somewhere you can run it with python3.
Related issues