Bug #11960

[Websockets] trash events not delivered reliably

Added by Peter Amstutz 13 days ago. Updated 8 days ago.

Status:ResolvedStart date:07/12/2017
Priority:NormalDue date:
Assignee:Tom Clegg% Done:


Target version:2017-07-19 sprint
Story points-Remaining (hours)0.00 hour
Velocity based estimate-


What I did:

  • Subscribe to "create", "update" and "delete" events.
  • On workbench, go to a subproject under home
  • Delete a collection
  • Expected: some kind of event usefully indicating that it was trashed
  • Actually happens one of: no event or possibly a blank event {}, or an "update" event where new_attributes and old_attributes are the same (is_trashed isn't included in the properties).


Task #11967: Review 11960-trash-eventsResolvedLucas Di Pentima

Associated revisions

Revision eb66b34d
Added by Tom Clegg 10 days ago

Merge branch '11960-trash-events'

closes #11960

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>


#1 Updated by Peter Amstutz 13 days ago

  • Description updated (diff)

#2 Updated by Tom Clegg 13 days ago

Should add "is_trashed" to sendObjectAttributes in source:services/ws/session_v0.go

#3 Updated by Tom Clegg 13 days ago

I see this frame if I create and trash a collection in a different tab:

  "event_at": "2017-07-12T13:54:31.604427Z",
  "event_type": "update",
  "id": 44991169,
  "msgID": 29,
  "object_kind": "arvados#collection",
  "object_owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
  "object_uuid": "4xphq-4zz18-q1r66no80m6owgo",
  "properties": {
    "new_attributes": {
      "name": "New collection (2017-07-12T13:53:48.867Z)",
      "owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
      "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0" 
    "old_attributes": {
      "name": "New collection (2017-07-12T13:53:48.867Z)",
      "owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
      "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0" 
  "uuid": "4xphq-57u5n-r8znuqy2qusqwu2" 
I think the reliability issue is a permission-checking bug.
  1. Collection gets trashed
  2. Websocket server sees the event and checks permission
  3. If permission is already cached because this client saw the "create" event (or some other event with the same object_uuid) then the event is sent
  4. If permission is not cached, the permission check fails because "get collection" returns 404 for a collection that has just been trashed.

For the collection-trashing case, the solution is to use the include_trash parameter when checking permission. For real "delete" events, that won't help. Checking permissions on object_owner_uuid instead of object_uuid will catch most cases. Clients still won't see "delete" events for objects in projects they can't see (i.e., where they have permission only by virtue of a permission link), though.

#4 Updated by Tom Clegg 12 days ago

#5 Updated by Tom Clegg 12 days ago

  • Category set to API
  • Status changed from New to In Progress
  • Assignee set to Tom Clegg
  • Target version set to 2017-07-19 sprint

#6 Updated by Lucas Di Pentima 12 days ago

  • Should this conditional delete event emission have a test?
  • Running services/ws tests many times produces 1 test to fail intermittently (around 50% of the times I tried), here's a piece of the log output:
    [ws] {"LogID":0,"error":"sql: no rows in result set","level":"error","msg":"QueryRow failed","time":"2017-07-12T14:41:56.737074348-04:00"}
    time="2017-07-12T14:41:56-04:00" level=warning msg="no database connection limit configured -- consider setting PostgresPool>0 in arvados-ws configuration file" 
    time="2017-07-12T14:41:56-04:00" level=error msg="QueryRow failed" LogID=0 error="sql: no rows in result set" 
    time="2017-07-12T14:41:58-04:00" level=error msg="lookup error" error="Get invalid URL port "discard"" token=3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi uuid=zzzzz-4zz18-uukreo9rbgwsujr 
    time="2017-07-12T14:41:58-04:00" level=error msg="lookup error" error="Get invalid URL port "discard"" token=3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi uuid=zzzzz-4zz18-foonbarfilesdir 

#7 Updated by Tom Clegg 11 days ago

I've added tests for the "trash collection" and "delete non-collection object" cases.

In doing so, I tripped on and fixed a few races that caused tests to fail sometimes. I ran tests 64 times in a row with this version and there were no failures.

Also updated the "send old events" ("last_log_id") implementation to fetch the list of ID numbers all at once instead of holding the query (and its connection) open until all events have been delivered. (I recently caught an arvados-ws with 17 open database connections, and the most recent query for most of them was the "select id from logs" query in sendOldEvents().)

11960-trash-events @ d3700e50bcaa9db177346199298913b54e37a2d5

#8 Updated by Lucas Di Pentima 10 days ago

This LGTM, please merge. Thanks!

#9 Updated by Tom Clegg 10 days ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:eb66b34d22cef6c98ad9ebdc228cf784b3136b72.

Also available in: Atom PDF