Bug #11960

[Websockets] trash events not delivered reliably

Added by Peter Amstutz 4 months ago. Updated 4 months ago.

Status:ResolvedStart date:07/12/2017
Priority:NormalDue date:
Assignee:Tom Clegg% Done:

100%

Category:API
Target version:2017-07-19 sprint
Story points-Remaining (hours)0.00 hour
Velocity based estimate-

Description

What I did:

  • Subscribe to "create", "update" and "delete" events.
  • On workbench, go to a subproject under home
  • Delete a collection
  • Expected: some kind of event usefully indicating that it was trashed
  • Actually happens one of: no event or possibly a blank event {}, or an "update" event where new_attributes and old_attributes are the same (is_trashed isn't included in the properties).

Subtasks

Task #11967: Review 11960-trash-eventsResolvedLucas Di Pentima

Associated revisions

Revision eb66b34d
Added by Tom Clegg 4 months ago

Merge branch '11960-trash-events'

closes #11960

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Peter Amstutz 4 months ago

  • Description updated (diff)

#2 Updated by Tom Clegg 4 months ago

Should add "is_trashed" to sendObjectAttributes in source:services/ws/session_v0.go

#3 Updated by Tom Clegg 4 months ago

I see this frame if I create and trash a collection in a different tab:

{
  "event_at": "2017-07-12T13:54:31.604427Z",
  "event_type": "update",
  "id": 44991169,
  "msgID": 29,
  "object_kind": "arvados#collection",
  "object_owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
  "object_uuid": "4xphq-4zz18-q1r66no80m6owgo",
  "properties": {
    "new_attributes": {
      "name": "New collection (2017-07-12T13:53:48.867Z)",
      "owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
      "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0" 
    },
    "old_attributes": {
      "name": "New collection (2017-07-12T13:53:48.867Z)",
      "owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
      "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0" 
    }
  },
  "uuid": "4xphq-57u5n-r8znuqy2qusqwu2" 
}
I think the reliability issue is a permission-checking bug.
  1. Collection gets trashed
  2. Websocket server sees the event and checks permission
  3. If permission is already cached because this client saw the "create" event (or some other event with the same object_uuid) then the event is sent
  4. If permission is not cached, the permission check fails because "get collection" returns 404 for a collection that has just been trashed.

For the collection-trashing case, the solution is to use the include_trash parameter when checking permission. For real "delete" events, that won't help. Checking permissions on object_owner_uuid instead of object_uuid will catch most cases. Clients still won't see "delete" events for objects in projects they can't see (i.e., where they have permission only by virtue of a permission link), though.

#4 Updated by Tom Clegg 4 months ago

#5 Updated by Tom Clegg 4 months ago

  • Category set to API
  • Status changed from New to In Progress
  • Assignee set to Tom Clegg
  • Target version set to 2017-07-19 sprint

#6 Updated by Lucas Di Pentima 4 months ago

  • Should this conditional delete event emission have a test?
  • Running services/ws tests many times produces 1 test to fail intermittently (around 50% of the times I tried), here's a piece of the log output:
    [ws] {"LogID":0,"error":"sql: no rows in result set","level":"error","msg":"QueryRow failed","time":"2017-07-12T14:41:56.737074348-04:00"}
    time="2017-07-12T14:41:56-04:00" level=warning msg="no database connection limit configured -- consider setting PostgresPool>0 in arvados-ws configuration file" 
    time="2017-07-12T14:41:56-04:00" level=error msg="QueryRow failed" LogID=0 error="sql: no rows in result set" 
    time="2017-07-12T14:41:58-04:00" level=error msg="lookup error" error="Get https://127.0.0.1:discard/arvados/v1/collections/zzzzz-4zz18-uukreo9rbgwsujr?include_trash=true&select=%5B%22uuid%22%5D: invalid URL port "discard"" token=3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi uuid=zzzzz-4zz18-uukreo9rbgwsujr 
    time="2017-07-12T14:41:58-04:00" level=error msg="lookup error" error="Get https://127.0.0.1:discard/arvados/v1/collections/zzzzz-4zz18-foonbarfilesdir?include_trash=true&select=%5B%22uuid%22%5D: invalid URL port "discard"" token=3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi uuid=zzzzz-4zz18-foonbarfilesdir 
    [...]
    

#7 Updated by Tom Clegg 4 months ago

I've added tests for the "trash collection" and "delete non-collection object" cases.

In doing so, I tripped on and fixed a few races that caused tests to fail sometimes. I ran tests 64 times in a row with this version and there were no failures.

Also updated the "send old events" ("last_log_id") implementation to fetch the list of ID numbers all at once instead of holding the query (and its connection) open until all events have been delivered. (I recently caught an arvados-ws with 17 open database connections, and the most recent query for most of them was the "select id from logs" query in sendOldEvents().)

11960-trash-events @ d3700e50bcaa9db177346199298913b54e37a2d5

#8 Updated by Lucas Di Pentima 4 months ago

This LGTM, please merge. Thanks!

#9 Updated by Tom Clegg 4 months ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:eb66b34d22cef6c98ad9ebdc228cf784b3136b72.

Also available in: Atom PDF