Story #6473

[SDKs] arv-ws supports an option to show events starting from a given time

Added by Brett Smith over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
SDKs
Target version:
Start date:
07/08/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

arv-ws can live tail the logs for a given pipeline or job, but it won't go back in time. If you want to see logs already written for a running job, you have to use a command like:

arv log list -f '[["object_uuid", "=", "UUID"], ["event_type", "=", "stderr"]]' | jq -r '.items | map(.properties.text) | add'

This isn't very nice. The Python SDK Websockets client has support to start fetching events from a particular point in time. Add an option to arv-ws to start getting events from a specified datetime.


Subtasks

Task #6536: Review branch: 6473-fetch-events-starting-atResolvedRadhika Chippada

Associated revisions

Revision dd18605e
Added by Radhika Chippada over 6 years ago

closes #6473
Merge branch '6473-fetch-events-starting-at'

History

#1 Updated by Brett Smith over 6 years ago

  • Target version changed from Arvados Future Sprints to 2015-07-22 sprint

#2 Updated by Brett Smith over 6 years ago

  • Story points set to 0.5

#3 Updated by Radhika Chippada over 6 years ago

  • Assigned To set to Radhika Chippada

#4 Updated by Radhika Chippada over 6 years ago

  • Status changed from New to In Progress

#5 Updated by Radhika Chippada over 6 years ago

What changed:

  • Added additional start_time argument to arv-ws. Expected format is listed as "YYYY-MM-DD hh:mm:ss". Should it be listed as "%Y-%m-%d %H:%M:%S" instead?
  • Added test-arv-ws.py with the desire to test the arguments, at a minimum.
    • One test with an unknown argument "-x" fails as expected.
    • However, a test with valid arguments to the run_ws method results in subscribe and then starts to waits and blocks ('-s=2015-07-15'). Hence, I omitted such a test for the time being. Any suggestions as to how to create such a test?

#6 Updated by Tom Clegg over 6 years ago

I think the code, the flag description, and the feature we wanted are three different things.

Flag description: "Arvados query filter to apply to log events created after this time."
  • This sounds to me like "give me events matching filters A,B,C that happen before/at time T; plus events matching filters A,B,C,D,E that happen after time T."
Implemented at 22f68d3:
  • Convenience flag, shorthand for --filters '[["created_at",">","...."]]'
My understanding of the feature we want:
  • Give me events matching A,B,C that happen after time T -- regardless of whether they were logged before or after my websocket connection is established. In other words, give me (right away) all past log events (between T and now) matching filters A,B,C; and then feed me future events matching filters A,B,C as they happen.

Apparently (at least) one of us is misreading the story. Can we clarify?

#7 Updated by Brett Smith over 6 years ago

Tom Clegg wrote:

My understanding of the feature we want:
  • Give me events matching A,B,C that happen after time T -- regardless of whether they were logged before or after my websocket connection is established. In other words, give me (right away) all past log events (between T and now) matching filters A,B,C; and then feed me future events matching filters A,B,C as they happen.

This description matches the desired functionality, yes.

My comment about the SDK currently supporting this may be… uhhhh… wrong. I think I thought that's what PollClient's poll_time argument was for, but that's not right, and besides EventClient doesn't have a corresponding argument. One way to bridge the gap may be to write a small function that finds the first log with/after a given timestamp. EventClient could use that to seed last_log_id, and PollClient could poll from there.

But that's just another idea off the top of my head. The point is, the functional requirements are as you described.

#8 Updated by Radhika Chippada over 6 years ago

Branch 6473-fetch-events-starting-at at 1c6515a4445e9fd79961c9c8caf21528f16d9399

- The subscribe method now offers a "last_log_id" parameter

- Tests are updated to create an object before invoking the subscribe method and verify that that old object is also fetched when listening for objects in the past.

#9 Updated by Peter Amstutz over 6 years ago

- Changes to EventClient look good.

- You need to implement the same last_log_id feature in PollClient. Please extend the tests so they test both EventClient and PollClient.

- The test "start time today" seems like it would fail if it runs right when the clock ticks over midnight?

#10 Updated by Radhika Chippada over 6 years ago

17e8c59deca04137f66ece7837f96599ff82f4de

  • Enhanced PollClient to support last_log_id and added additional tests.
  • Updated the said test.

Thanks.

#11 Updated by Peter Amstutz over 6 years ago

I got an error:

$ arv-ws
2015-07-21 11:31:00 arvados.arv-ws[16745] ERROR: 'Namespace' object has no attribute 'last_log_id'

It is because of this code:

    if args.start_time:
        args.last_log_id = 1
        filters += [ ['created_at', '>=', args.start_time] ]

The problem is that args.last_log_id isn't declared on the "args" object. Either it shouldn't be attached to args (just use last_log_id as a local variable) or you should add another option to the parser:

parser.add_argument('--last-log-id', type=int, help="Log event counter to start at.")

Also

 parser.add_argument('-s', '--start_time', ...)

The '--start_time' should be '--start-time' to follow convention. To access the value in Python it gets transformed by argparse into args.start_time.

#12 Updated by Radhika Chippada over 6 years ago

  • last_log_id : corrected the issue and now using last_log_id
  • fixed the input argument handling

Thanks.

#13 Updated by Peter Amstutz over 6 years ago

Different error this time:

$ arv-ws
2015-07-21 13:35:14 arvados.arv-ws[2151] ERROR: local variable 'last_log_id' referenced before assignment

#14 Updated by Radhika Chippada over 6 years ago

Oops. Updated (added an else block to set last_log_id to None as needed). Also tested manually with and without -s. Thanks.

#15 Updated by Radhika Chippada over 6 years ago

Expanded start_time input argument description to say the time is server time in UTC, per Peter's suggestion.

#16 Updated by Radhika Chippada over 6 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:dd18605e7fe7c537dc84a3e3174d18ec0320c95a.

Also available in: Atom PDF