Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=49023 | 2017-03-07T15:39:33Z | Tom Clegg (tom@curii.com)
<ul></ul><pre>
2017-03-01_17:27:35.24495 2017/03/01 17:27:35 Submitting container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a> to slurm
2017-03-01_17:27:35.24517 2017/03/01 17:27:35 exec sbatch ["sbatch" "--share" "--workdir=/tmp" "--job-name=tb05z-dz642-eie1eal1059y9bb" "--mem-per-cpu=6250" "--cpus-per-task=8"]
2017-03-01_17:27:35.35069 2017/03/01 17:27:35 sbatch succeeded: "Submitted batch job 2948"
2017-03-01_17:27:35.35071 2017/03/01 17:27:35 Start monitoring container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:29:37.15184 2017/03/01 17:29:37 debug: runner is handling updates slowly, discarded previous update for <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:29:42.32428 2017/03/01 17:29:42 debug: runner is handling updates slowly, discarded previous update for <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:29:46.97205 2017/03/01 17:29:46 debug: runner is handling updates slowly, discarded previous update for <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:29:51.83317 2017/03/01 17:29:51 debug: runner is handling updates slowly, discarded previous update for <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:29:56.42094 2017/03/01 17:29:56 debug: runner is handling updates slowly, discarded previous update for <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:29:57.89127 2017/03/01 17:29:57 Done monitoring container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:30:01.25862 2017/03/01 17:30:01 Submitting container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a> to slurm
2017-03-01_17:30:01.25865 2017/03/01 17:30:01 exec sbatch ["sbatch" "--share" "--workdir=/tmp" "--job-name=tb05z-dz642-eie1eal1059y9bb" "--mem-per-cpu=6250" "--cpus-per-task=8"]
2017-03-01_17:30:01.32075 2017/03/01 17:30:01 sbatch succeeded: "Submitted batch job 2949"
2017-03-01_17:30:01.32077 2017/03/01 17:30:01 Start monitoring container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:30:06.85462 2017/03/01 17:30:06 Dispatcher says container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a> is done: cancel slurm job
2017-03-01_17:30:07.23672 2017/03/01 17:30:07 container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a> is still in squeue after scancel
2017-03-01_17:30:13.53918 2017/03/01 17:30:13 Done monitoring container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:31:02.73009 2017/03/01 17:31:02 Submitting container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a> to slurm
2017-03-01_17:31:02.73013 2017/03/01 17:31:02 exec sbatch ["sbatch" "--share" "--workdir=/tmp" "--job-name=tb05z-dz642-eie1eal1059y9bb" "--mem-per-cpu=6250" "--cpus-per-task=8"]
2017-03-01_17:31:02.76251 2017/03/01 17:31:02 sbatch succeeded: "Submitted batch job 2950"
2017-03-01_17:31:02.76253 2017/03/01 17:31:02 Start monitoring container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
2017-03-01_17:32:35.91008 2017/03/01 17:32:35 Done monitoring container <a href="https://arvadosapi.com/tb05z-dz642-eie1eal1059y9bb">tb05z-dz642-eie1eal1059y9bb</a>
</pre>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=49168 | 2017-03-08T19:24:44Z | Tom Morris (tfmorris@veritasgenetics.com)
<ul><li><strong>Target version</strong> set to <i>2017-03-29 sprint</i></li></ul>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=49591 | 2017-03-15T19:44:00Z | Tom Clegg (tom@curii.com)
<ul><li><strong>Category</strong> set to <i>Crunch</i></li><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=50156 | 2017-03-29T19:09:47Z | Tom Clegg (tom@curii.com)
<ul><li><strong>Target version</strong> changed from <i>2017-03-29 sprint</i> to <i>2017-04-12 sprint</i></li></ul>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=50248 | 2017-03-30T17:18:35Z | Peter Amstutz (peter.amstutz@curii.com)
<ul></ul><p>I wonder if we should move the state transition to "Running" as soon as crunch-run has started doing anything substantive. E.g., if it fails to load the Docker image, that shouldn't put it back into Locked state; it should go Running->Cancelled.</p>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=50686 | 2017-04-12T19:05:53Z | Tom Clegg (tom@curii.com)
<ul><li><strong>Target version</strong> changed from <i>2017-04-12 sprint</i> to <i>2017-04-26 sprint</i></li></ul>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=51125 | 2017-04-26T19:02:07Z | Tom Clegg (tom@curii.com)
<ul></ul><p>Allowing multiple dispatch attempts is a deliberate feature: when the dispatch/startup infrastructure fails early enough that it's absolutely certain the container has never been started, we don't count an "attempt" against a container request.</p>
<p>Currently there is no limit on the number of lock-attempt-unlock cycles, though. We should have a site-configurable limit. This counter doesn't have to be visible to anyone except the api server, although it would be useful to expose it to admin clients for troubleshooting purposes.</p>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=51153 | 2017-04-26T19:14:57Z | Tom Morris (tfmorris@veritasgenetics.com)
<ul><li><strong>Target version</strong> changed from <i>2017-04-26 sprint</i> to <i>2017-05-10 sprint</i></li></ul>
Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen | https://dev.arvados.org/issues/11190?journal_id=51160 | 2017-04-26T19:17:02Z | Tom Clegg (tom@curii.com)
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul>
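<p>For illustration only, the site-configurable limit on lock-attempt-unlock cycles suggested in the 2017-04-26 comment might look roughly like the sketch below. It is not Arvados code: the <code>Container</code> class, the <code>lock_count</code> field, the <code>MAX_LOCK_ATTEMPTS</code> setting and the choice to cancel once the limit is reached are all assumptions made for the example.</p>

```python
from dataclasses import dataclass

# Hypothetical site setting; the name and default value are assumptions.
MAX_LOCK_ATTEMPTS = 3

QUEUED, LOCKED, CANCELLED = "Queued", "Locked", "Cancelled"


@dataclass
class Container:
    uuid: str
    state: str = QUEUED
    # Internal counter: only the API server needs it, though it could be
    # exposed to admin clients for troubleshooting.
    lock_count: int = 0


def lock(c: Container) -> None:
    """A dispatcher claims a queued container; each claim is counted."""
    if c.state != QUEUED:
        raise ValueError(f"cannot lock container in state {c.state}")
    c.state = LOCKED
    c.lock_count += 1


def unlock(c: Container, max_attempts: int = MAX_LOCK_ATTEMPTS) -> None:
    """A dispatcher gives up before the container has started.

    The container is requeued while attempts remain. Once the limit is
    reached it is cancelled instead (an assumed policy for this sketch).
    """
    if c.state != LOCKED:
        raise ValueError(f"cannot unlock container in state {c.state}")
    c.state = QUEUED if c.lock_count < max_attempts else CANCELLED
```

<p>With the default of three attempts, a container that is locked and unlocked three times ends up cancelled rather than being submitted to slurm a fourth time. A real implementation would also need to decide how this interacts with the retry accounting on the container request.</p>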