Bug #4759

[API] greater than filter is treated as greater than or equals for modification time.

Added by Misha Zatsman over 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Start date:
02/13/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

When I request collections with a modification time greater than x, I receive a collection with a modification time of x.

Example (to run from command line):

python -c "import arvados; import pprint;
pprint.pprint(arvados.api().collections().list(limit=1,order='modified_at',filters=[['modified_at','>','2013-07-08T19:37:57Z']]).execute())"

{u'etag': u'',
u'items': [{u'created_at': u'2013-07-08T19:37:57Z',
u'description': None,
u'etag': u'czqugdzh4kslkm8oq7yvzgb6f',
u'href': u'/collections/qr1hi-4zz18-hbwcw8tj4rfwf31',
u'kind': u'arvados#collection',
u'modified_at': u'2013-07-08T19:37:57Z',
u'modified_by_client_uuid': None,
u'modified_by_user_uuid': None,
u'name': None,
u'owner_uuid': u'qr1hi-tpzed-l7yhndlho2l8pi7',
u'portable_data_hash': u'5894dfae5d6d8edf135f0ea3dba849c2+62',
u'properties': None,
u'uuid': u'qr1hi-4zz18-hbwcw8tj4rfwf31'}],
u'items_available': 16053,
u'kind': u'arvados#collectionList',
u'limit': 1,
u'offset': 0,
u'self_link': u''}

One possible explanation is that the modification times are stored in a more precise format than we reveal to users, so this behavior is actually correct, because the modification time stored is a few milleseconds later than the reported modification time. In that case I'd argue that we should be revealing this higher precision timestamp to our users to let them do what they want.


Subtasks

Task #5213: Review branch: 4759-timestamp-precisionResolvedRadhika Chippada

Associated revisions

Revision 3598c300
Added by Tom Clegg about 5 years ago

Merge branch '4759-timestamp-precision-TC' closes #4759

Revision cbccf62a (diff)
Added by Brett Smith about 5 years ago

4759: Update Node Manager to parse new Arvados API timestamps.

Refs #4759.

History

#1 Updated by Tom Clegg over 5 years ago

  • Story points set to 0.5

#2 Updated by Tom Clegg over 5 years ago

  • Subject changed from greater than filter is treated as greater than or equals for modification time. to [API] greater than filter is treated as greater than or equals for modification time.

#3 Updated by Tom Clegg over 5 years ago

  • Target version set to Arvados Future Sprints

#4 Updated by Tom Clegg over 5 years ago

  • Target version changed from Arvados Future Sprints to 2015-01-28 Sprint

#5 Updated by Radhika Chippada over 5 years ago

  • Assigned To set to Radhika Chippada

#6 Updated by Tom Clegg about 5 years ago

  • Target version changed from 2015-01-28 Sprint to 2015-02-18 sprint

#7 Updated by Radhika Chippada about 5 years ago

  • We do have more precision in our tables than we are returning. Below is an example row from my db
    whose modified_at time is 2015-02-11 20:59:41.349813 and API returned value is 2015-02-11T20:59:41Z.
    Details below. Do we want to include milliseconds also to what we return in our API calls?
  • Row in my db at 2015-02-11 20:59:41.349813
arvados_dev=# select modified_at from collections where uuid='zzzzz-4zz18-0f5ltxrlt7ljkwr';
        modified_at         
----------------------------
 2015-02-11 20:59:41.349813
(1 row)
  • When I search with filters=[['modified_at','>','2015-02-11T20:59:41Z']], I get this collection and the
    timestamps returned are only up to the precision of seconds u'modified_at': u'2015-02-11T20:59:41Z'
python -c "import arvados; import pprint;  pprint.pprint(arvados.api().collections().list
(order='modified_at',filters=[['modified_at','>','2015-02-11T20:59:41Z']]).execute())" 

{u'etag': u'',
 u'items': [{u'created_at': u'2015-02-11T20:59:41Z',
             u'description': None,
             u'etag': u'2ca065kedhr7agmm2fg09o2l1',
             u'href': u'/collections/zzzzz-4zz18-0f5ltxrlt7ljkwr',
             u'kind': u'arvados#collection',
             u'modified_at': u'2015-02-11T20:59:41Z',
             u'modified_by_client_uuid': u'zzzzz-ozdt8-rhk40xi1j0ieseg',
             u'modified_by_user_uuid': u'zzzzz-tpzed-fosbryt7pvj8qqj',
             u'name': u'Collection created at 2015-02-11 15:59:41 -0500',
             u'owner_uuid': u'zzzzz-j7d0g-kk122lkquft16cn',
             u'portable_data_hash': u'4015d5536d0a89f45f0251dbaf84fcb5+141',
             u'properties': None,
             u'replication_desired': 2,
             u'uuid': u'zzzzz-4zz18-0f5ltxrlt7ljkwr'}],
 u'items_available': 1,
 u'kind': u'arvados#collectionList',
 u'limit': 100,
 u'offset': 0,
 u'self_link': u''}
  • If I search with filters=[['modified_at','>','2015-02-11T20:59:41.5Z']], I get 0 collections.
 
 python -c "import arvados; import pprint;  pprint.pprint(arvados.api().collections().list
(order='modified_at',filters=[['modified_at','>','2015-02-11T20:59:41.5Z']]).execute())" 

{u'etag': u'',
 u'items': [],
 u'items_available': 0,
 u'kind': u'arvados#collectionList',
 u'limit': 100,
 u'offset': 0,
 u'self_link': u''}

#8 Updated by Radhika Chippada about 5 years ago

  • Status changed from New to In Progress
  • Added a getter method for collection modified_at attribute with formatting to include milliseconds in api responses for collection objects.
    • I experimented with en.yml (i18n) and adding a time formats initializer. These did not help.
    • The getter method however helped in including ms in collection json returned.
  • I saw the milliseconds in api responses returned (using the above python example) as well as when workbench makes a collection request.
    • Observed no side effect in workbench. Workbench seems to wrap the json responses into model objects using discovery document and hence it only sees TimeWithZone (as before).
  • For now, I added the extended timestamp format for collection -> modified_at attributes only.
python -c "import arvados; import pprint;  pprint.pprint(arvados.api().collections().list
(order='modified_at',filters=[['modified_at','>','2015-02-13 15:18:41Z']]).execute())" 

{u'etag': u'',
 u'items': [{u'created_at': u'2015-02-13T15:18:41Z',
             u'description': None,
             u'etag': u'4bcxw5sjcfw2p1hugcvwexfyz',
             u'href': u'/collections/zzzzz-4zz18-3y0ohzd3mlwyjf3',
             u'kind': u'arvados#collection',
             u'modified_at': u'2015-02-13T15:18:41.108518000Z',
             u'modified_by_client_uuid': u'zzzzz-ozdt8-rhk40xi1j0ieseg',
             u'modified_by_user_uuid': u'zzzzz-tpzed-fosbryt7pvj8qqj',
             u'name': u'Collection created at 2015-02-13 10:18:40 -0500',
             u'owner_uuid': u'zzzzz-j7d0g-kk122lkquft16cn',
             u'portable_data_hash': u'e5ef2d7589ddd0c8405c6972c17b16fc+53',
             u'properties': None,
             u'replication_desired': 2,
             u'uuid': u'zzzzz-4zz18-3y0ohzd3mlwyjf3'}],
 u'items_available': 1,
 u'kind': u'arvados#collectionList',
 u'limit': 100,
 u'offset': 0,
 u'self_link': u''}

#9 Updated by Tom Clegg about 5 years ago

It doesn't seem quite right to make the models' attribute methods start returning strings instead of ActiveSupport::TimeWithZone. I'd like it better if we just changed the formatting without changing where/when times get converted to strings.

I think the ideal solution would change the default time formatter (and the one Oj uses) to our desired format. I haven't found the right hooks for that, but it might be possible (if repetitive) to tell acts_as_api which format we want to use in our API responses -- see 4b79e0e on 4759-timestamp-precision-TC. (It fails a lot of tests with ActiveModel::MissingAttributeError, though, I think because of the "select" stuff.)

How about something like that + the test case from 4759-timestamp-precision?

Suggestions for the test case:
  • Move to functional tests
  • Split "get" and "put" into separate tests (or just drop the "put" test -- it uses the same API response, right?)
  • In the "put" test, change something other than manifest_text, to avoid weighing down the test with manifest signing code
  • Likewise in "search": better to test using the cheapest/easiest possible path to the code being tested.
  • Add a trailing Z, and ^ and $ anchors, to the regexp
  • Write the regexp once as a constant and reuse, instead of copying & pasting
Unrelated aside, in existing code:
  • The second argument to search_using_filter search_filter, expected_items seems to be used as a sneaky boolean: if you pass it 3, false, or nil it asserts at least 1 result. Should probably treat it as a boolean where falsy means "zero results" and truthy means "non-zero results"...

#10 Updated by Tom Clegg about 5 years ago

Replaced 4759-timestamp-precision-TC with much nicer 5f2d5f9.

#11 Updated by Tom Clegg about 5 years ago

4759-timestamp-precision-TC now has

5517f02 4759: Add functional tests for timestamp precision.
6ee389b 4759: Add test for inequality filters.
4f8b2d7 4759: Ignore args to as_json.
5f2d5f9 4759: Use ISO 8601 timestamps with fractional seconds in API responses.

Behavior

I think 6ee389b is the best test of what we really want: the timestamps provided by the API server can be used in inequality filters with the expected results. If a list/get/whatever query returned a record R with "modified_at":"X", then:
  • Filter modified_at>="X" should include R;
  • Filter modified_at>"X" should not include R.

If the client truncates the precision (to 1 second, or 1 day, or whatever) we'll still interpret that as an earlier time rather than a range. I think this is fine. (Arguably it would be reasonable to interpret [["modified_at","=","2015-02-17"]] as [["modified_at",">=","2015-02-17T00:00:00.000000000Z"],["modified_at","<","2015-02-18T00:00:00.000000000Z"]] ...but that sounds more like a feature than a bugfix, the use case isn't obvious, and a client could accomplish the same effect easily enough (and more generally -- not just on zero boundaries) without our help, so it doesn't seem especially urgent.)

So the two examples in #4759#note-7 note-7 should (continue to) work as shown:
  • The first has a literal timestamp with no fractional second, which can be taken to mean ...:41.000000000, so a record with timestamp ...:41.349813 is in fact later.
  • The second asks for timestamps later than ...:41.5, which ...:41.349813 is not.

#12 Updated by Radhika Chippada about 5 years ago

Tom, yes, I agree that "2015-02-17" should mean "2015-02-17T00:00:00.000000000Z" and all the updates look good.

One minor comment about the tests. In the python test_timestamp_inequality_filter, why use specimen instead of any other object such as collection. We had talked about removing humans, specimens etc before and I think it would be nice to not use these object types in new tests.

#13 Updated by Tom Clegg about 5 years ago

Radhika Chippada wrote:

One minor comment about the tests. In the python test_timestamp_inequality_filter, why use specimen instead of any other object such as collection. We had talked about removing humans, specimens etc before and I think it would be nice to not use these object types in new tests.

I tend to use specimen to isolate tests from special behavior like Collection's manifest_text signatures and portable_data_hash: It's just a generic Arvados type with features like uuid and modified_at. (If we do decide to remove Specimen, will we replace it with some other type?)

#14 Updated by Radhika Chippada about 5 years ago

Tom, that is fine. I also used "humans" in several tests earlier for the same reason. We can deal with them when we delete these data models from our system.

LGTM

#15 Updated by Anonymous about 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:3598c3003a7987cca5c0536ba8206ec40c1c3649.

Also available in: Atom PDF