Bug #21314
closed
a-d-c should cancel a container if it can't be loaded
Added by Tom Clegg 11 months ago.
Updated about 2 months ago.
Release relationship:
Auto
Description
If a container's "mounts" field is invalid, a-d-c logs this, and keeps trying.
{"ClusterID":"irdev","ContainerUUID":"xxxxx-dz642-xxxxxxxxxxxxxxx","PID":2037423,"error":"json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount","level":"warning","msg":"error getting mounts","time":"2023-12-13T20:34:41.064140517Z"}
In this situation, the offending container should be cancelled.
It should put a message in the "error" runtime status to indicate something is wrong.
- Description updated (diff)
- Related to Bug #21302: Container created with corrupted mounts added
- Target version changed from To be scheduled to Future
- Target version changed from Future to Development 2024-10-09 sprint
- Description updated (diff)
- Status changed from New to In Progress
21314-cancel-invalid-mounts @ dc8508c26abe5db7758bff013ac163ed6c219b39 -- developer-run-tests: #4479
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅
- runtime_status["error"] will look like: "error getting mounts from container record: json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount"
- the new "cancel" behavior is triggered by the substring "json: cannot unmarshal" appearing in the error message -- since this is essentially working around an internal system problem, I figure we should be fairly conservative
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- the underlying bug is #21302 already linked
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- ✅ test case mangles the database content directly, and confirms the error message
- New or changed UX/UX and has gotten feedback from stakeholders.
- Documentation has been updated.
- Behaves appropriately at the intended scale (describe intended scale).
- Considered backwards and forwards compatibility issues between client and server.
- Follows our coding standards and GUI style guidelines.
Code LGTM.
Just a thought: wouldn't this unmarshaling error also happen on the controller side? Or is it blindly forwarding what it gets from rails or the database?
Just a thought: wouldn't this unmarshaling error also happen on the controller side? Or is it blindly forwarding what it gets from rails or the database?
Yes, currently controller is still just propagating the literal response from railsapi, not diverting to the "new code path" that decodes and rewrites (see source:lib/controller/handler.go). When we migrate that, this test will fail and (if we haven't fixed the underlying problem) we'll have to do something on the controller side as well.
- Status changed from In Progress to Resolved
Also available in: Atom
PDF