Project

General

Profile

Actions

Bug #21314

closed

a-d-c should cancel a container if it can't be loaded

Added by Tom Clegg 11 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Story points:
-
Release:
Release relationship:
Auto

Description

If a container's "mounts" field is invalid, a-d-c logs this, and keeps trying.

{"ClusterID":"irdev","ContainerUUID":"xxxxx-dz642-xxxxxxxxxxxxxxx","PID":2037423,"error":"json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount","level":"warning","msg":"error getting mounts","time":"2023-12-13T20:34:41.064140517Z"}

In this situation, the offending container should be cancelled.

It should put a message in the "error" runtime status to indicate something is wrong.


Subtasks 1 (0 open1 closed)

Task #22167: Review 21314-cancel-invalid-mountsResolvedTom Clegg10/03/2024Actions

Related issues

Related to Arvados - Bug #21302: Container created with corrupted mountsIn ProgressBrett SmithActions
Actions #1

Updated by Tom Clegg 11 months ago

  • Description updated (diff)
Actions #2

Updated by Tom Clegg 11 months ago

  • Related to Bug #21302: Container created with corrupted mounts added
Actions #3

Updated by Peter Amstutz 9 months ago

  • Target version changed from To be scheduled to Future
Actions #4

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Future to Development 2024-10-09 sprint
Actions #5

Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)
Actions #6

Updated by Tom Clegg about 2 months ago

  • Status changed from New to In Progress
Actions #7

Updated by Tom Clegg about 2 months ago

21314-cancel-invalid-mounts @ dc8508c26abe5db7758bff013ac163ed6c219b39 -- developer-run-tests: #4479

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • runtime_status["error"] will look like: "error getting mounts from container record: json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount"
    • the new "cancel" behavior is triggered by the substring "json: cannot unmarshal" appearing in the error message -- since this is essentially working around an internal system problem, I figure we should be fairly conservative
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • the underlying bug is #21302 already linked
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • ✅ test case mangles the database content directly, and confirms the error message
  • New or changed UX/UX and has gotten feedback from stakeholders.
    • N/A
  • Documentation has been updated.
    • N/A
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • N/A
  • Follows our coding standards and GUI style guidelines.
Actions #8

Updated by Lucas Di Pentima about 2 months ago

Code LGTM.

Just a thought: wouldn't this unmarshaling error also happen on the controller side? Or is it blindly forwarding what it gets from rails or the database?

Actions #9

Updated by Tom Clegg about 2 months ago

Just a thought: wouldn't this unmarshaling error also happen on the controller side? Or is it blindly forwarding what it gets from rails or the database?

Yes, currently controller is still just propagating the literal response from railsapi, not diverting to the "new code path" that decodes and rewrites (see source:lib/controller/handler.go). When we migrate that, this test will fail and (if we haven't fixed the underlying problem) we'll have to do something on the controller side as well.

Actions #10

Updated by Tom Clegg about 2 months ago

  • Status changed from In Progress to Resolved
Actions #11

Updated by Peter Amstutz about 2 months ago

  • Release set to 70
Actions

Also available in: Atom PDF