Project

General

Profile

Actions

Bug #11960

closed

[Websockets] trash events not delivered reliably

Added by Peter Amstutz over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
-

Description

What I did:

  • Subscribe to "create", "update" and "delete" events.
  • On workbench, go to a subproject under home
  • Delete a collection
  • Expected: some kind of event usefully indicating that it was trashed
  • Actually happens one of: no event or possibly a blank event {}, or an "update" event where new_attributes and old_attributes are the same (is_trashed isn't included in the properties).

Subtasks 1 (0 open1 closed)

Task #11967: Review 11960-trash-eventsResolvedLucas Di Pentima07/12/2017Actions
Actions #1

Updated by Peter Amstutz over 7 years ago

  • Description updated (diff)
Actions #2

Updated by Tom Clegg over 7 years ago

Should add "is_trashed" to sendObjectAttributes in source:services/ws/session_v0.go

Actions #3

Updated by Tom Clegg over 7 years ago

I see this frame if I create and trash a collection in a different tab:

{
  "event_at": "2017-07-12T13:54:31.604427Z",
  "event_type": "update",
  "id": 44991169,
  "msgID": 29,
  "object_kind": "arvados#collection",
  "object_owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
  "object_uuid": "4xphq-4zz18-q1r66no80m6owgo",
  "properties": {
    "new_attributes": {
      "name": "New collection (2017-07-12T13:53:48.867Z)",
      "owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
      "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0" 
    },
    "old_attributes": {
      "name": "New collection (2017-07-12T13:53:48.867Z)",
      "owner_uuid": "4xphq-j7d0g-j7vfmt308hc1a9l",
      "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0" 
    }
  },
  "uuid": "4xphq-57u5n-r8znuqy2qusqwu2" 
}
I think the reliability issue is a permission-checking bug.
  1. Collection gets trashed
  2. Websocket server sees the event and checks permission
  3. If permission is already cached because this client saw the "create" event (or some other event with the same object_uuid) then the event is sent
  4. If permission is not cached, the permission check fails because "get collection" returns 404 for a collection that has just been trashed.

For the collection-trashing case, the solution is to use the include_trash parameter when checking permission. For real "delete" events, that won't help. Checking permissions on object_owner_uuid instead of object_uuid will catch most cases. Clients still won't see "delete" events for objects in projects they can't see (i.e., where they have permission only by virtue of a permission link), though.

Actions #4

Updated by Tom Clegg over 7 years ago

Actions #5

Updated by Tom Clegg over 7 years ago

  • Category set to API
  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
  • Target version set to 2017-07-19 sprint
Actions #6

Updated by Lucas Di Pentima over 7 years ago

  • Should this conditional delete event emission have a test?
  • Running services/ws tests many times produces 1 test to fail intermittently (around 50% of the times I tried), here's a piece of the log output:
    [ws] {"LogID":0,"error":"sql: no rows in result set","level":"error","msg":"QueryRow failed","time":"2017-07-12T14:41:56.737074348-04:00"}
    time="2017-07-12T14:41:56-04:00" level=warning msg="no database connection limit configured -- consider setting PostgresPool>0 in arvados-ws configuration file" 
    time="2017-07-12T14:41:56-04:00" level=error msg="QueryRow failed" LogID=0 error="sql: no rows in result set" 
    time="2017-07-12T14:41:58-04:00" level=error msg="lookup error" error="Get https://127.0.0.1:discard/arvados/v1/collections/zzzzz-4zz18-uukreo9rbgwsujr?include_trash=true&select=%5B%22uuid%22%5D: invalid URL port "discard"" token=3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi uuid=zzzzz-4zz18-uukreo9rbgwsujr 
    time="2017-07-12T14:41:58-04:00" level=error msg="lookup error" error="Get https://127.0.0.1:discard/arvados/v1/collections/zzzzz-4zz18-foonbarfilesdir?include_trash=true&select=%5B%22uuid%22%5D: invalid URL port "discard"" token=3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi uuid=zzzzz-4zz18-foonbarfilesdir 
    [...]
    
Actions #7

Updated by Tom Clegg over 7 years ago

I've added tests for the "trash collection" and "delete non-collection object" cases.

In doing so, I tripped on and fixed a few races that caused tests to fail sometimes. I ran tests 64 times in a row with this version and there were no failures.

Also updated the "send old events" ("last_log_id") implementation to fetch the list of ID numbers all at once instead of holding the query (and its connection) open until all events have been delivered. (I recently caught an arvados-ws with 17 open database connections, and the most recent query for most of them was the "select id from logs" query in sendOldEvents().)

11960-trash-events @ d3700e50bcaa9db177346199298913b54e37a2d5

Actions #8

Updated by Lucas Di Pentima over 7 years ago

This LGTM, please merge. Thanks!

Actions #9

Updated by Tom Clegg over 7 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:eb66b34d22cef6c98ad9ebdc228cf784b3136b72.

Actions

Also available in: Atom PDF