Arvados - Bug #4661: [SDKs] Python Keep client's retry/rescue should not make an OOM exception look like a Keep problem (https://dev.arvados.org/issues/4661)
Tom Clegg (tom@curii.com), 2014-11-22T07:41:23Z:
<ul><li><strong>Category</strong> set to <i>SDKs</i></li></ul><p>Example: <a href="http://curover.se/9tee4-8i9sb-n2nvt7slia8m0im" class="external">9tee4-8i9sb-n2nvt7slia8m0im</a></p>
<p>The real problem was that Python ran out of memory while writing to Keep (the same job failed several times with max_tasks=20 and succeeded with max_tasks=5), but the log makes it look like a Keep problem. If the "wanted 2 but wrote 1" message included the error that caused the second write to fail, this sort of problem would be much easier to diagnose.</p>
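<p>A minimal sketch of that idea, assuming each copy is written by its own thread (as the <code>thread_limiter.done()</code> call in the log suggests). <code>KeepWriteError</code> here is a hypothetical stand-in, not the real <code>arvados.errors.KeepWriteError</code>, and <code>put_copies</code>, the service names, and the writer callables are illustrative only. Each worker records whatever exception its write raised (including <code>MemoryError</code>, which is an <code>Exception</code> subclass), and the final error message lists those causes next to the copy count. Written for Python 3.</p>

```python
import threading


class KeepWriteError(Exception):
    """Hypothetical stand-in for arvados.errors.KeepWriteError that also
    keeps the exception raised by each failed write."""

    def __init__(self, message, errors):
        # errors: {service name: exception raised by that service's write}
        details = "; ".join(
            "{}: {!r}".format(service, exc)
            for service, exc in sorted(errors.items()))
        super().__init__("{} ({})".format(message, details) if details else message)
        self.errors = dict(errors)


def put_copies(data_hash, writers, wanted):
    """Run each writer in its own thread and return the number of copies
    written; raise KeepWriteError, carrying the underlying exceptions,
    if fewer than `wanted` writes succeed.

    `writers` maps a hypothetical service name to a zero-argument callable
    that stores the block and raises on failure.
    """
    lock = threading.Lock()
    written = []
    errors = {}

    def run(service, write):
        try:
            write()
        except Exception as exc:  # MemoryError is an Exception subclass
            with lock:
                errors[service] = exc
        else:
            with lock:
                written.append(service)

    threads = [threading.Thread(target=run, args=(service, write))
               for service, write in writers.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    if len(written) < wanted:
        raise KeepWriteError(
            "Write fail for {}: wanted {} but wrote {}".format(
                data_hash, wanted, len(written)),
            errors)
    return len(written)


if __name__ == "__main__":
    def healthy():
        pass

    def out_of_memory():
        raise MemoryError()

    try:
        put_copies("4ff205e7317925f2f92ee4a7c8bb8980",
                   {"keep0": healthy, "keep1": out_of_memory}, wanted=2)
    except KeepWriteError as err:
        # prints: Write fail for 4ff205e7317925f2f92ee4a7c8bb8980: wanted 2 but wrote 1 (keep1: MemoryError())
        print(err)
```

<p>With the cause attached, the job log would show <code>MemoryError()</code> next to the copy count instead of only the count.</p>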
<pre>
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr Traceback (most recent call last):
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr   File "/tmp/crunch-src/crunch_scripts/addRefMemEff.py", line 143, in <module>
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr     output_id = out.finish()
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 538, in finish
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr     return self._my_keep().put(self.manifest_text())
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 157, in num_retries_setter
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr     return orig_func(self, *args, **kwargs)
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 719, in put
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr     (data_hash, copies, thread_limiter.done()))
2014-11-21_23:02:43 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr arvados.errors.KeepWriteError: Write fail for 4ff205e7317925f2f92ee4a7c8bb8980: wanted 2 but wrote 1
2014-11-21_23:02:44 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr 2014/11/21 23:05:24 Error response from daemon: Cannot destroy container 4740a8f66ce2a16550e98b480b3bbb3da997ae3391c2c39d40e873190fa0c898: Driver aufs failed to remove root filesystem 4740a8f66ce2a16550e98b480b3bbb3da997ae3391c2c39d40e873190fa0c898: rename /tmp/docker/aufs/diff/4740a8f66ce2a16550e98b480b3bbb3da997ae3391c2c39d40e873190fa0c898 /tmp/docker/aufs/diff/4740a8f66ce2a16550e98b480b3bbb3da997ae3391c2c39d40e873190fa0c898-removing: device or resource busy
2014-11-21_23:02:45 9tee4-8i9sb-n2nvt7slia8m0im 2705 13 stderr srun: error: compute0: task 0: Exited with exit code 1
</pre>
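<p>The title's point can be sketched the same way: a retry/rescue wrapper should retry only errors that plausibly mean a Keep service problem, and let everything else, notably <code>MemoryError</code>, escape unchanged so the traceback names the real cause. This is a hypothetical helper (<code>with_retries</code> and <code>TRANSIENT_ERRORS</code> are illustrative names), not the SDK's <code>arvados.retry</code> module, and it assumes Python 3, where <code>ConnectionError</code> and <code>TimeoutError</code> are built-in <code>OSError</code> subclasses.</p>

```python
import time

# Hypothetical allowlist: only these are treated as transient Keep/network
# problems worth retrying.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def with_retries(func, num_retries=3, delay=0.0):
    """Call func() and return its result, retrying up to num_retries times
    on TRANSIENT_ERRORS. Any other exception, including MemoryError,
    propagates immediately and unchanged."""
    for attempt in range(num_retries + 1):
        try:
            return func()
        except TRANSIENT_ERRORS:
            if attempt == num_retries:
                raise  # out of retries: re-raise the last transient error
            time.sleep(delay)


if __name__ == "__main__":
    attempts = []

    def flaky_put():
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("connection reset")
        return "stored"

    print(with_retries(flaky_put))  # succeeds on the third attempt

    def oom_put():
        raise MemoryError()

    try:
        with_retries(oom_put)
    except MemoryError:
        print("MemoryError surfaced on the first attempt")
```

<p>Listing the retryable exceptions explicitly, rather than catching <code>Exception</code>, is what keeps an out-of-memory failure from being retried and then reported as a write failure.</p>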
<p>As an aside: keepstore logged a 500 response, probably because it didn't receive the entire data block. Unfortunately, it currently logs only the HTTP status code and the number of bytes in the response, not an error message. It would be nice to fix that too.</p>
Tom Clegg (tom@curii.com), 2014-12-09T18:44:15Z:
<ul><li><strong>Target version</strong> deleted (<del><i>Bug Triage</i></del>)</li></ul>
Tom Clegg (tom@curii.com), 2014-12-09T18:49:04Z:
<ul><li><strong>Story points</strong> deleted (<del><i>0.5</i></del>)</li></ul>
Brett Smith (brett.smith@curii.com), 2015-01-15T17:05:12Z:
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul><p>Applied in arvados commit 952bfa87465a27f83dca7feca7d369fda4200eb5.</p>