PySDK includes basic workflow reporting script
The Python SDK should include a script that can run on the command line to generate a basic CSV report about run workflows. The first goal of this script is to serve as documentation by example: the code should be written in such a way that it's easy for people to read, learn from, and adapt to their own reporting needs. (This is why it needs to be Python and not Go.) It should demonstrate best practices for Arvados API clients, like using select effectively to limit API overhead. But it should be runnable by itself too as a basic demonstration. It should accept a few arguments:
--owner: The UUID to pass to groups/contents to find workflows to report. Defaults to the current user's UUID.
--no-recursive: Flag to set the value of the recursive parameter passed to groups/contents. Default true.
--finished-at: All of these limit reporting to containers with date_field >= argument. The argument can be an ISO 8601 date/time, or a sleep-style duration like "3days" which is subtracted from the current datetime to generate the filter timestamp.
--name: If specified, limits reporting to containers with name LIKE argument. If the argument contains no SQL metacharacters, the script should probably coerce the argument to %argument%, maybe with a notice.
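A rough sketch of how the argument handling described above might look, using only the standard library. The helper names (parse_time_arg, coerce_name_pattern, build_parser), the table of duration units, and the choice to treat naive timestamps as UTC are all assumptions made for illustration, not decisions from this ticket.

```python
import argparse
import re
import sys
from datetime import datetime, timedelta, timezone

# Hypothetical unit table for sleep-style durations like "3days".
_UNIT_SECONDS = {
    's': 1, 'sec': 1, 'second': 1,
    'm': 60, 'min': 60, 'minute': 60,
    'h': 3600, 'hr': 3600, 'hour': 3600,
    'd': 86400, 'day': 86400,
    'w': 604800, 'week': 604800,
}

def parse_time_arg(value, now=None):
    """Turn an ISO 8601 date/time or a duration like '3days' into an aware datetime."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r'(\d+)\s*([a-z]+)', value.strip().lower())
    if match:
        count, unit = int(match.group(1)), match.group(2)
        singular = unit[:-1] if unit.endswith('s') else unit
        seconds = _UNIT_SECONDS.get(unit, _UNIT_SECONDS.get(singular))
        if seconds is None:
            raise ValueError(f'unknown duration unit: {unit!r}')
        # Durations are subtracted from the current time.
        return now - timedelta(seconds=count * seconds)
    stamp = value.strip()
    if stamp.endswith(('Z', 'z')):
        # Older Pythons' fromisoformat() does not accept a trailing "Z".
        stamp = stamp[:-1] + '+00:00'
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def coerce_name_pattern(name):
    """Wrap the name in % wildcards unless it already has LIKE metacharacters."""
    if '%' in name or '_' in name:
        return name
    pattern = f'%{name}%'
    print(f'notice: matching names LIKE {pattern!r}', file=sys.stderr)
    return pattern

def build_parser():
    parser = argparse.ArgumentParser(description='Basic workflow report (sketch)')
    # None here means "look up the current user's UUID later"; that needs an API call.
    parser.add_argument('--owner', default=None)
    # store_false makes args.recursive default to True.
    parser.add_argument('--no-recursive', dest='recursive', action='store_false')
    parser.add_argument('--finished-at', type=parse_time_arg)
    parser.add_argument('--name', type=coerce_name_pattern)
    return parser
```

Because the conversions are passed as type= callables, argparse reports a bad timestamp or duration as a normal usage error instead of a traceback.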
Output to stdout a CSV with the following fields:
You should be able to open this report in Calc/Excel/Sheets and generate basic reporting about runtime, cost, etc.
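To make the select point and the CSV output concrete, here is a minimal sketch. The column list is a placeholder, since the ticket's real field list is not reproduced here; contents_query and write_report are hypothetical helper names. The query dict is meant to be passed to the SDK's groups().contents() call, and pagination is left out for brevity.

```python
import csv
import sys

# Placeholder columns: the actual report fields are not specified here.
REPORT_FIELDS = ['uuid', 'name', 'state', 'created_at']

def contents_query(owner_uuid, recursive=True, extra_filters=()):
    """Build keyword arguments for a groups/contents request."""
    return {
        'uuid': owner_uuid,
        'recursive': recursive,
        # Only ask for container requests, plus any date/name filters.
        'filters': [['uuid', 'is_a', 'arvados#containerRequest'], *extra_filters],
        # select limits the response to the columns the report needs.
        'select': list(REPORT_FIELDS),
    }

def write_report(items, out=None):
    """Write one CSV row per item to out (stdout by default)."""
    out = sys.stdout if out is None else out
    writer = csv.DictWriter(out, fieldnames=REPORT_FIELDS, extrasaction='ignore')
    writer.writeheader()
    for item in items:
        writer.writerow(item)

# Possible usage with a real client (not run here):
#   import arvados
#   client = arvados.api('v1')
#   page = client.groups().contents(**contents_query(owner_uuid)).execute()
#   write_report(page['items'])
```

Keeping select and the CSV header driven by the same list means a column added to the report is automatically requested from the API, and nothing extra is fetched.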
This script should be available if you install the Python SDK through pip, but I'm ambivalent about whether we actually install a runnable script, or just have it available somewhere you can run it with
Updated by Brett Smith 3 days ago
If we do #21017 first it would also be easy to report on a single workflow and all its child containers. Basically the --owner argument would be renamed (just make it a mandatory argument?) and we could report in this mode when the UUID identifies a container or request.
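One way the script could pick a mode from the UUID alone is to look at the type infix in the middle of the UUID. This is only a sketch: the report_mode function and the mode names 'owner' and 'workflow' are made up for illustration, and the infix table covers just the object types relevant here.

```python
import re

# Type infixes as they appear in the middle segment of Arvados UUIDs.
UUID_KINDS = {
    'tpzed': 'user',
    'j7d0g': 'group',
    'xvhdp': 'container_request',
    'dz642': 'container',
}

_UUID_RE = re.compile(r'[a-z0-9]{5}-([a-z0-9]{5})-[a-z0-9]{15}')

def report_mode(uuid):
    """Return 'owner' or 'workflow' (hypothetical mode names) for a UUID."""
    match = _UUID_RE.fullmatch(uuid)
    kind = UUID_KINDS.get(match.group(1)) if match else None
    if kind in ('user', 'group'):
        # Report on everything owned by this user or project.
        return 'owner'
    if kind in ('container_request', 'container'):
        # Report on one workflow and its child containers.
        return 'workflow'
    raise ValueError(f'cannot report on UUID {uuid!r}')
```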