<h1>Arvados - Feature #7159: [Keep] Implement an Azure blob storage volume in keepstore</h1>
<p><a class="external" href="https://dev.arvados.org/issues/7159">https://dev.arvados.org/issues/7159</a></p>
<p><strong>Brett Smith</strong> (brett.smith@curii.com), 2015-08-28T20:32:36Z</p>
<p>Libraries that might help with this:</p>
<ul>
<li><a class="external" href="https://github.com/Azure/azure-sdk-for-go">https://github.com/Azure/azure-sdk-for-go</a> (Apache 2)</li>
<li><a class="external" href="https://github.com/loldesign/azure">https://github.com/loldesign/azure</a> (MIT)</li>
<li><a class="external" href="https://launchpad.net/gwacl">https://launchpad.net/gwacl</a> (LGPL v3)</li>
</ul>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-08-29T02:41:13Z</p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/29553/diff?detail_id=28941">diff</a>)</li></ul>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-09-01T18:59:17Z</p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/29622/diff?detail_id=29000">diff</a>)</li></ul>
<p><strong>Brett Smith</strong> (brett.smith@curii.com), 2015-09-02T19:43:48Z</p>
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2015-09-30 sprint</i></li></ul>
<p><strong>Peter Amstutz</strong> (peter.amstutz@curii.com), 2015-09-04T17:41:33Z</p>
<p>Azure storage computes a content MD5 and provides it base64-encoded (but I don't think you can query by it).</p>
<p><strong>Brett Smith</strong> (brett.smith@curii.com), 2015-09-08T20:25:18Z</p>
<ul><li><strong>Target version</strong> deleted (<del><i>2015-09-30 sprint</i></del>)</li></ul>
<p><strong>Brett Smith</strong> (brett.smith@curii.com), 2015-09-29T15:33:56Z</p>
<ul><li><strong>Target version</strong> set to <i>Arvados Future Sprints</i></li></ul>
<p><strong>Brett Smith</strong> (brett.smith@curii.com), 2015-09-30T18:15:56Z</p>
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2015-10-14 sprint</i></li></ul>
<p><strong>Brett Smith</strong> (brett.smith@curii.com), 2015-09-30T19:32:29Z</p>
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li><li><strong>Story points</strong> set to <i>0.5</i></li></ul><p>The actual implementation was effectively done in the prototype story <a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="Idea: [Keep] Prototype Azure blob storage (Resolved)" href="https://dev.arvados.org/issues/7241">#7241</a>, because that went very well. For this sprint, please go over the bullet points, make sure everything is addressed, and make notes as appropriate as to how those issues were addressed. It will likely come in handy when we support additional object stores in the future.</p>
<p><strong>Peter Amstutz</strong> (peter.amstutz@curii.com), 2015-10-05T20:39:11Z</p>
<ul>
<li>How will we store "time of most recent PUT" timestamps? "setBlobProperties" seems relevant, but is "index" going to be unusably slow if we have to call getBlobProperties once per blob?
<ul>
<li>The Azure index returns last modified time in the blob properties, so additional API calls are not needed</li>
</ul>
</li>
<li>How will we resolve race conditions like "data manager deletes an old unreferenced block at the same time a client PUTs a new copy of it"? Currently we rely on flock(). "Lease" seems to be the relevant Azure feature.
<ul>
<li>Conditional delete using the etag of the most recent version: if the block is written concurrently, either the write updates the etag first (so the conditional delete fails), or the delete happens first (and the block is then rewritten).</li>
</ul>
</li>
<li>Is "write a blob" guaranteed to be atomic (and never write a partial file) or do we still need the "write and rename into place" approach we use in UnixVolume?
<ul>
<li>The "commit block list" operation is atomic; we're using a single PUT of up to 64 MiB, which presumably uses commit block list underneath.</li>
<li>However, I'm not 100% confident that there isn't a race condition in which a zero-length blob is visible for a tiny slice of time.</li>
</ul></li>
</ul>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-08T18:13:16Z</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-08T18:27:57Z</p>
<blockquote>
<p>Is "write a blob" guaranteed to be atomic (and never write a partial file) or do we still need the "write and rename into place" approach we use in UnixVolume?</p>
</blockquote>
<p>Testing with 0.1.20151002220939.f81f84e confirms <code>CreateBlob()</code> is not atomic: while goroutine A is waiting for <code>CreateBlob("foo", "bar")</code> to return, if goroutine B calls <code>GetBlob("foo")</code>, it might get:</p>
<ul>
<li>Error: block does not exist</li>
<li><code>""</code> (oops, CreateBlob is not atomic)</li>
<li><code>"bar"</code></li>
</ul>
<p>So far, no evidence that Commit has a race: i.e., we don't see partial data like <code>"b"</code> or <code>"ba"</code>.</p>
<p>Workaround in 7159-empty-blob-race: when getting a blob that turns out to be empty, pause 1s and retry; return the empty result only once the blob is at least 15s old.</p>
<p>Note: 1 second is a long time if the block is small, but this may be fine. The obvious reason why this race would be common is that clients launch two writer threads at once to write the same block, and a single container might be shared by multiple keepstore servers. With this workaround, one thread will run fast and the other will wait 1 second. The one that runs fast will tell the client it has written 3 copies, and the client will abandon the thread that's waiting 1 second.</p>
<p><strong>Peter Amstutz</strong> (peter.amstutz@curii.com), 2015-10-09T20:14:01Z</p>
<p>Reviewing 7159-empty-blob-race:</p>
<ul>
<li>Because the current configuration limits the entire transfer time to 20 seconds, a 15 second window to wait for a block to show up will only leave 5 seconds to transfer the entire block. That seems like it could be a problem.</li>
<li>Do you have a sense for how long the race window is? Does the empty blob appear at the beginning of the transfer?</li>
</ul>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-09T20:38:07Z</p>
<p>Peter Amstutz wrote:</p>
<blockquote>
<p>Reviewing 7159-empty-blob-race:</p>
<ul>
<li>Because the current configuration limits the entire transfer time to 20 seconds, a 15 second window to wait for a block to show up will only leave 5 seconds to transfer the entire block. That seems like it could be a problem.</li>
</ul>
</blockquote>
<p>True. I don't think there's any way around this, other than "keep services should come with recommended timeout". Until then, fwiw 15 seconds is the <em>maximum</em> time we're willing to wait, but the actual time we wait is (at most) the actual race window plus 1 second granularity.</p>
<blockquote>
<ul>
<li>Do you have a sense for how long the race window is? Does the empty blob appear at the beginning of the transfer?</li>
</ul>
</blockquote>
<p>Not really. Probably a good idea to add logging here so we can find out...</p>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-13T20:26:45Z</p>
<a name="Resolved-issues"></a>
<h2 >Resolved issues<a href="#Resolved-issues" class="wiki-anchor">¶</a></h2>
<blockquote>
<p>Are there performance characteristics like "container gets slow if you don't use some sort of namespacing", like ext4? I.e., should we name blobs "acb/acbd1234..." like we do in UnixVolume, or just "acbd1234..."?</p>
</blockquote>
<p>Blocks are named just <code>"acbd18db4cc2f85cedef654fccc4a4d8"</code>, with no prefix like <code>"acb/acbd18..."</code>. Judging by the API and the docs, there is nothing special about the <code>"/"</code> character so we're assuming there's no performance benefit to <code>"acb/acbd18..."</code> like there is in POSIX filesystems.</p>
<p>The Azure API has a "list blobs with given prefix" which allows us to implement our own "index with prefix" efficiently without a directory-like structure.</p>
<blockquote>
<p>number of blobs that can be stored in a <del>bucket</del> container</p>
</blockquote>
<p>Docs say: "An account can contain an unlimited number of containers. A container can store an unlimited number of blobs."</p>
<blockquote>
<p>is "index" going to be unusably slow if we have to call getBlobProperties once per blob?</p>
</blockquote>
<p>The ListBlobs API response already includes the Properties for each blob, so no, we don't need an extra API call per listed blob.</p>
<blockquote>
<p>Is "write a blob" guaranteed to be atomic (and never write a partial file) or do we still need the "write and rename into place" approach we use in UnixVolume?</p>
</blockquote>
<p>There is an atomic commit operation, but we haven't found an atomic create+commit. The create+commit API seems to consist of an atomic create followed by an atomic commit.</p>
<ul>
<li>It's common to see an empty blob while a create+commit API call is in progress.</li>
<li>If create+commit fails, the blob gets deleted.</li>
</ul>
<p>The Azure documentation claims the API obeys <code>"If-Unmodified-Since"</code> request headers, but the API doesn't appear to do so; see the "if false" part of this test case: <a class="external" href="https://github.com/curoverse/azure-sdk-for-go/blob/886dc43df1112ee01f61f75dc4211fb898c04339/storage/blob_test.go#L324-L335">https://github.com/curoverse/azure-sdk-for-go/blob/886dc43df1112ee01f61f75dc4211fb898c04339/storage/blob_test.go#L324-L335</a>. Our workaround uses "If-Match" (which Azure does honor) during Delete, relying on Touch to change the Etag by storing the current time in blob metadata.</p>
<a name="Outstanding-issues"></a>
<h2 >Outstanding issues<a href="#Outstanding-issues" class="wiki-anchor">¶</a></h2>
<a name="Race"></a>
<h3 >Race<a href="#Race" class="wiki-anchor">¶</a></h3>
<p>Confirm that data cannot be lost in the following race:</p>
<ul>
<li>A sees no existing data, starts a create+commit</li>
<li>B sees no existing data, starts a create+commit</li>
<li>A's create+commit succeeds, A returns success to caller</li>
<li>B's create+commit fails between create and commit (e.g., network error in transit)</li>
</ul>
<p>In this kind of race, if A has already had a chance to return success to caller, B <strong>must not</strong> truncate or delete the blob. Currently we rely on Azure not to hit "delete" to clean up B's transaction (as it normally would if upload fails) if someone else has committed data to the blob in the meantime -- regardless of whether it was A or B that succeeded in the "create" call.</p>
<a name="Timeouts"></a>
<h3 >Timeouts<a href="#Timeouts" class="wiki-anchor">¶</a></h3>
<p>If two writers send the same data block to a container (through the same keepstore services or not) one of them will win the race and just do a "create+commit" operation. The loser will wait for the winner to finish committing, then read back the block to make sure the block data (not just the hash) is equal. This means the losing client waits for two transfers (write + read), and is therefore more likely to reach its (default 20 second) timeout.</p>
<a name="Conditional-delete"></a>
<h3 >Conditional delete<a href="#Conditional-delete" class="wiki-anchor">¶</a></h3>
<p>We use a fork of the official Azure Go SDK because the official version doesn't support conditional delete using the "If-Match" header. There is an <a href="https://github.com/Azure/azure-sdk-for-go/issues/209" class="external">open issue</a> tagged "help wanted". Once someone (us?) adds support to the official SDK and we upgrade to Go 1.5, we should consider using Go 1.5's vendor experiment instead of importing from <a href="https://github.com/curoverse/azure-sdk-for-go" class="external">our own github fork</a>.</p>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-14T16:15:06Z</p>
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li></ul><p>Here are some performance numbers obtained with keepexercise @ <a class="changeset" title="7410: Add keepexercise" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/c8b37d59c47da700a983569c7689b178da210918">c8b37d5</a>. Note: testing was not especially rigorous.</p>
<table>
<tr>
<th>Platform</th>
<th>Volume</th>
<th>Runtime</th>
<th>Reading threads</th>
<th>Writing threads</th>
<th>#different blocks</th>
<th>Net read MiB/s</th>
<th>Net write MiB/s</th>
</tr>
<tr>
<td>AWS </td>
<td>xfs </td>
<td>1m30s </td>
<td>16 </td>
<td>4 </td>
<td>1 </td>
<td>20.6 </td>
<td>6.4 </td>
</tr>
<tr>
<td>AWS </td>
<td>xfs </td>
<td>1m30s </td>
<td>16 </td>
<td>4 </td>
<td>4 </td>
<td>18.5 </td>
<td>7.8 </td>
</tr>
<tr>
<td>AWS </td>
<td>xfs </td>
<td>2m </td>
<td>4 </td>
<td>16 </td>
<td>16 </td>
<td>6.9 </td>
<td>9.6 </td>
</tr>
<tr>
<td>Azure </td>
<td>xfs </td>
<td>10m </td>
<td>16 </td>
<td>4 </td>
<td>1 </td>
<td>92.2 </td>
<td>46.5 </td>
</tr>
<tr>
<td>Azure </td>
<td>xfs </td>
<td>2m </td>
<td>16 </td>
<td>4 </td>
<td>4 </td>
<td>92.5 </td>
<td>52.1 </td>
</tr>
<tr>
<td>Azure </td>
<td>xfs </td>
<td>5m </td>
<td>4 </td>
<td>16 </td>
<td>16 </td>
<td>43.9 </td>
<td>55.5 </td>
</tr>
<tr>
<td>Azure </td>
<td>Blob </td>
<td>7m </td>
<td>16 </td>
<td>4 </td>
<td>1 </td>
<td>68.1 </td>
<td>15.8 </td>
</tr>
<tr>
<td>Azure </td>
<td>Blob </td>
<td>3m </td>
<td>16 </td>
<td>4 </td>
<td>4 </td>
<td>78.9 </td>
<td>23.1 </td>
</tr>
<tr>
<td>Azure </td>
<td>Blob </td>
<td>3m </td>
<td>4 </td>
<td>16 </td>
<td>16 </td>
<td>22.8 </td>
<td>51.2 </td>
</tr>
</table>
<p><strong>Peter Amstutz</strong> (peter.amstutz@curii.com), 2015-10-14T17:47:47Z</p>
<p>7159-clean-index LGTM</p>
<p><strong>Peter Amstutz</strong> (peter.amstutz@curii.com), 2015-10-14T17:54:27Z</p>
<p>Tom Clegg wrote:</p>
<blockquote>
<p>Here are some performance numbers obtained with keepexercise @ <a class="changeset" title="7410: Add keepexercise" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/c8b37d59c47da700a983569c7689b178da210918">c8b37d5</a>. Note: testing was not especially rigorous.</p>
</blockquote>
<p>I'm confused. How is it that AWS has the shortest Runtime but also the lowest transfer rate?</p>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-14T18:50:07Z</p>
<p>Peter Amstutz wrote:</p>
<blockquote>
<p>I'm confused. How is it that AWS has the shortest Runtime but also the lowest transfer rate?</p>
</blockquote>
<p>The test stopped when I happened to notice the transfer rate was more or less stable (or just decided the test wasn't getting any more interesting) and pressed ^C -- there wasn't a "total bytes transferred" target or anything like that.</p>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-15T14:10:32Z</p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/31357/diff?detail_id=30783">diff</a>)</li></ul>
<p><strong>Tom Clegg</strong> (tom@curii.com), 2015-10-15T14:29:46Z</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>