Bug #21314
closeda-d-c should cancel a container if it can't be loaded
Description
If a container's "mounts" field is invalid, a-d-c logs this, and keeps trying.
{"ClusterID":"irdev","ContainerUUID":"xxxxx-dz642-xxxxxxxxxxxxxxx","PID":2037423,"error":"json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount","level":"warning","msg":"error getting mounts","time":"2023-12-13T20:34:41.064140517Z"}
In this situation, the offending container should be cancelled.
It should put a message in the "error" runtime status to indicate something is wrong.
Related issues
Updated by Tom Clegg 11 months ago
- Related to Bug #21302: Container created with corrupted mounts added
Updated by Peter Amstutz 9 months ago
- Target version changed from To be scheduled to Future
Updated by Peter Amstutz about 2 months ago
- Target version changed from Future to Development 2024-10-09 sprint
Updated by Tom Clegg about 2 months ago
21314-cancel-invalid-mounts @ dc8508c26abe5db7758bff013ac163ed6c219b39 -- developer-run-tests: #4479
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅
- runtime_status["error"] will look like: "error getting mounts from container record: json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount"
- the new "cancel" behavior is triggered by the substring "json: cannot unmarshal" appearing in the error message -- since this is essentially working around an internal system problem, I figure we should be fairly conservative
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- the underlying bug is #21302 already linked
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- ✅ test case mangles the database content directly, and confirms the error message
- New or changed UX/UX and has gotten feedback from stakeholders.
- N/A
- Documentation has been updated.
- N/A
- Behaves appropriately at the intended scale (describe intended scale).
- N/A
- Considered backwards and forwards compatibility issues between client and server.
- N/A
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Lucas Di Pentima about 2 months ago
Code LGTM.
Just a thought: wouldn't this unmarshaling error also happen on the controller side? Or is it blindly forwarding what it gets from rails or the database?
Updated by Tom Clegg about 2 months ago
Just a thought: wouldn't this unmarshaling error also happen on the controller side? Or is it blindly forwarding what it gets from rails or the database?
Yes, currently controller is still just propagating the literal response from railsapi, not diverting to the "new code path" that decodes and rewrites (see source:lib/controller/handler.go). When we migrate that, this test will fail and (if we haven't fixed the underlying problem) we'll have to do something on the controller side as well.
Updated by Tom Clegg about 2 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|10a256819f6b2144a2353e198e72571e8f0922a8.