<p><strong>Arvados - Idea #3795: [Crunch/SDKs] Tasks need more retry support</strong> (https://dev.arvados.org/issues/3795)</p>
<p>Ward Vandewege (ward@curii.com), 2014-09-04T13:48:06Z, journal entry https://dev.arvados.org/issues/3795?journal_id=14597</p>
<ul><li><strong>Target version</strong> set to <i>Arvados Future Sprints</i></li></ul>
<p>Joshua Randall (jr17@sanger.ac.uk), 2015-12-01T10:48:50Z, journal entry https://dev.arvados.org/issues/3795?journal_id=32900</p>
<p>I had 14 of 400 tasks fail today because Keep was overloaded ("Connection time-out" / "Operation too slow") in the middle of a run. The keepstore log was printing many messages along the lines of "too many open files; retrying in…". I have since restarted that keepstore and everything seems OK, except that I have 14 failed tasks that I'd like to retry.</p>
<p>This seems to fall into a general category of unhandled system problems that can cause a temporary job failure, so this story might be able to address it in the long run (although I'm not sure by what mechanism the problem would actually be fixed, since it required restarting a backend keepstore).</p>
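<p>The automatic retry being asked for here can be sketched as follows. This is a minimal, hypothetical illustration, not the Arvados SDK or Crunch API: it assumes a task can be wrapped as a Python callable and that transient failures (such as the Keep time-outs above) can be recognized as a distinct exception type. <code>TransientError</code> and <code>run_with_retries</code> are illustrative names only. Non-transient errors, and a transient error on the last attempt, still propagate so the task is reported as failed.</p>

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a Keep
    "Connection time-out" or "Operation too slow" error."""


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: call task() until it succeeds, retrying only on
    TransientError.

    Between attempts it waits base_delay * 2**(attempt - 1) seconds plus up to
    10% random jitter. Any other exception, or a TransientError raised on the
    final attempt, propagates to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the task fail as before
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_task():
        # Simulates a task that hits two transient Keep errors, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("Operation too slow")
        return "ok"

    print(run_with_retries(flaky_task, max_attempts=3, base_delay=0.01))  # ok
```

<p>Bounding the number of attempts and backing off exponentially gives an overloaded service time to recover without retrying forever; a problem that needs admin intervention, like restarting a keepstore, would still exhaust the attempts and leave the task failed.</p>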
<p>I guess there should also be a manual way for me to tell crunch that some tasks should be retried because, as an admin, I have corrected the (system) problem that caused them to fail?</p>
<p>Tom Clegg (tom@curii.com), 2017-03-07T21:59:56Z, journal entry https://dev.arvados.org/issues/3795?journal_id=49127</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li></ul>
<p>Tom Clegg (tom@curii.com), 2017-03-09T21:43:18Z, journal entry https://dev.arvados.org/issues/3795?journal_id=49244</p>
<ul><li><strong>Target version</strong> deleted (<del><i>Arvados Future Sprints</i></del>)</li></ul>