Feature #19974
open
For both this and #19975, I feel like we need to make the big decision about whether we want retry logic to be implemented server-side or client-side. I would much rather pick one and implement it than have a weird mix where some kinds of retries happen in one place and others happen in another. Personally I'm leaning towards client-side, because it's easier to implement a wider variety of retry strategies there, but I don't feel too strongly about it.
If we decide to go client-side, then the stories look more like:
- API container records have more information about a container's end state
- Crunch records that information in the API server
- arvados-cwl-runner recognizes various CWL extensions to provide different retry strategies. (This is multiple stories, one per strategy, and they're probably not all equally urgent.)
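To make the client-side option concrete, here is a rough sketch of what a retry loop driven by the recorded end state could look like. Everything named here is an assumption for illustration: the `end_reason` field, the `"spot_reclaimed"` value, the `RetryPolicy` class, and the `submit` callable are hypothetical and are not existing Arvados or arvados-cwl-runner APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RetryPolicy:
    """Hypothetical per-step policy, e.g. parsed from a CWL extension hint."""
    # Maps an end-state reason to the maximum number of resubmissions allowed.
    max_retries: Dict[str, int] = field(default_factory=dict)


def run_with_retries(submit: Callable[[], dict], policy: RetryPolicy) -> dict:
    """Submit a container, resubmitting while the policy allows it.

    `submit` is assumed to block until the container finishes and to return
    the final container record as a dict.
    """
    attempts: Dict[str, int] = {}
    while True:
        record = submit()
        # Hypothetical field: why the container ended, or None if it simply
        # ran to completion.
        reason = record.get("end_reason")
        if reason is None:
            return record
        used = attempts.get(reason, 0)
        # Reasons missing from the policy get a budget of zero (no retry).
        if used >= policy.max_retries.get(reason, 0):
            return record
        attempts[reason] = used + 1
```

With a policy such as `RetryPolicy(max_retries={"spot_reclaimed": 2})`, a container that ends because of spot reclamation would be resubmitted up to twice, while any other end reason is returned to the caller unchanged. Each retry strategy in the last story could then be one more entry (or one more policy type) on the client, without touching the API server.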
Putting this kind of retry logic in the client is my preference as well, among other things because deploying a new arvados-cwl-runner is much lighter weight than deploying a new API server.
- Related to Feature #19982: Ability to know when a container died because of spot instance reclamation and option to resubmit added