Feature #7159

closed

[Keep] Implement an Azure blob storage volume in keepstore

Added by Brett Smith over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: 0.5

Description

Functional requirements

  • You can run an Arvados cluster where all Keep blocks are stored as Azure blobs.
  • Keepstore accepts PUT requests and saves the block as an Azure blob. The response includes an X-Keep-Replicas-Stored header that returns the redundancy level of the blob.
    • Ideally this would be introspected from the storage account. If that's too difficult, it's okay to let the administrator set the redundancy level themselves. If that's too difficult, it's okay to hardcode 3 as the value, since that's the lowest redundancy level of any Azure blob.
  • Keepstore accepts and serves GET requests for blocks that are stored as Azure blobs.
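
The PUT requirement above can be sketched roughly as follows. This is a minimal illustration, not keepstore's actual handler: putHandler, replicationLevel, fallbackReplication and the store callback are hypothetical names, the blob is named by the MD5 hex digest of its content (an assumption based on the "checksum as name" idea in this ticket), and the response body format is made up. It only shows the fallback chain: use an administrator-configured redundancy level if one is given, otherwise report 3.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// fallbackReplication is the hardcoded value the ticket allows when the
// redundancy level cannot be introspected or configured.
const fallbackReplication = 3

// replicationLevel returns the administrator-configured level if it is
// positive, and the fallback otherwise. (Hypothetical helper.)
func replicationLevel(configured int) int {
	if configured > 0 {
		return configured
	}
	return fallbackReplication
}

// putHandler saves the request body through store (a stand-in for
// "write an Azure blob") and reports the redundancy level in the
// X-Keep-Replicas-Stored response header.
func putHandler(configured int, store func(name string, data []byte) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		data, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		name := fmt.Sprintf("%x", md5.Sum(data))
		if err := store(name, data); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("X-Keep-Replicas-Stored", strconv.Itoa(replicationLevel(configured)))
		fmt.Fprintln(w, name)
	}
}

func main() {
	// An in-memory map stands in for the Azure container.
	saved := map[string][]byte{}
	h := putHandler(0, func(name string, data []byte) error {
		saved[name] = data
		return nil
	})
	req := httptest.NewRequest(http.MethodPut, "/", strings.NewReader("foo"))
	rec := httptest.NewRecorder()
	h(rec, req)
	fmt.Println("replicas stored:", rec.Result().Header.Get("X-Keep-Replicas-Stored"))
	fmt.Println("blobs saved:", len(saved))
}
```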

Implementation

Write an Azure blob storage volume in keepstore.
  • Add an AzureBlobVolume type in azure_blob_volume.go.
  • Extend (*volumeSet).Set() to accept an argument like "azure-blob:XYZ" where XYZ is a container name.
  • Add an -azure-storage-connection-string flag that accepts a string argument and works like flagReadonly: i.e., it applies to all subsequent -volume azure-blob:XYZ arguments. If the argument starts with "/" or ".", use the first line of the given file, otherwise use the literal argument.
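
The "file or literal" rule for the connection string flag could look something like the sketch below. resolveConnectionString is a hypothetical helper name, and the example value in main is a placeholder rather than a real connection string.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// resolveConnectionString is a hypothetical helper, not taken from keepstore.
// If arg starts with "/" or ".", it is treated as a file path and the first
// line of that file is returned. Otherwise arg is returned unchanged.
func resolveConnectionString(arg string) (string, error) {
	if !strings.HasPrefix(arg, "/") && !strings.HasPrefix(arg, ".") {
		return arg, nil
	}
	f, err := os.Open(arg)
	if err != nil {
		return "", err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	if !scanner.Scan() {
		// Either a read error occurred or the file has no content.
		if err := scanner.Err(); err != nil {
			return "", err
		}
		return "", fmt.Errorf("%s: file is empty", arg)
	}
	return scanner.Text(), nil
}

func main() {
	// Placeholder literal value; a path such as "./azure.conn" would be
	// read from disk instead.
	s, err := resolveConnectionString("AccountName=example;AccountKey=placeholder")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Reading the value from a file keeps the account key out of the process's command line, which is presumably why the file form is offered.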

It should be possible to run keepstore with both azure and local storage devices enabled. (This might only be useful when one or the other is configured as read-only.)
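
As a rough sketch of how one -volume flag could accept both kinds of argument, the following uses the standard flag.Value interface. The volumeSpec and volumeSpecs types are hypothetical stand-ins for keepstore's volumeSet, and the sketch only classifies arguments; it does not open any storage.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// volumeSpec is a hypothetical description of one -volume argument.
type volumeSpec struct {
	Kind     string // "azure-blob" or "unix"
	Location string // container name or local directory
}

// volumeSpecs collects repeated -volume arguments. It implements flag.Value.
type volumeSpecs []volumeSpec

func (vs *volumeSpecs) String() string {
	if vs == nil {
		return ""
	}
	return fmt.Sprint(*vs)
}

// Set treats "azure-blob:XYZ" as an Azure container named XYZ and anything
// else as a local directory.
func (vs *volumeSpecs) Set(value string) error {
	const azurePrefix = "azure-blob:"
	if strings.HasPrefix(value, azurePrefix) {
		container := strings.TrimPrefix(value, azurePrefix)
		if container == "" {
			return errors.New("azure-blob: container name is empty")
		}
		*vs = append(*vs, volumeSpec{Kind: "azure-blob", Location: container})
		return nil
	}
	if value == "" {
		return errors.New("volume path is empty")
	}
	*vs = append(*vs, volumeSpec{Kind: "unix", Location: value})
	return nil
}

func main() {
	var vols volumeSpecs
	fs := flag.NewFlagSet("keepstore-sketch", flag.ContinueOnError)
	fs.Var(&vols, "volume", "local directory or azure-blob:CONTAINER")
	// Example arguments mixing a local directory and an Azure container.
	err := fs.Parse([]string{"-volume", "/mnt/keep0", "-volume", "azure-blob:examplecontainer"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, v := range vols {
		fmt.Printf("%s volume at %q\n", v.Kind, v.Location)
	}
}
```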

Outstanding issues to investigate

  • We're assuming we can save and retrieve blobs using their checksum as their name. Are there any obstacles to this?
    • Seems fine according to MS docs. "acb/acbd1234..." is also an option.
  • Are there any limits on the number of blobs that can be stored in a container? If so, keepstore needs to be able to find blocks across multiple containers, and may need to be able to create containers if the limit is low enough or we can't find a good predetermined division of containers.
    • Seems fine. "An account can contain an unlimited number of containers. A container can store an unlimited number of blobs."
  • Are there performance characteristics like "container gets slow if you don't use some sort of namespacing", like ext4? I.e., should we name blobs "acb/acbd1234..." like we do in UnixVolume, or just "acbd1234..."?
    • listBlobsSegmentedWithPrefix seems to do exactly what IndexTo needs, which is handy.
  • How will we store "time of most recent PUT" timestamps? "setBlobProperties" seems relevant, but is "index" going to be unusably slow if we have to call getBlobProperties once per blob?
  • How will we resolve race conditions like "data manager deletes an old unreferenced block at the same time a client PUTs a new copy of it"? Currently we rely on flock(). "Lease" seems to be the relevant Azure feature.
  • Is "write a blob" guaranteed to be atomic (and never write a partial file) or do we still need the "write and rename into place" approach we use in UnixVolume?
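
To make the two naming options discussed above concrete, here is a small sketch. blobName, hashOf and listWithPrefix are hypothetical helpers; the checksum is assumed to be an MD5 hex digest, and listWithPrefix filters an in-memory slice as a stand-in for a prefix-based listing call, not a real Azure API.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// hashOf returns the hex MD5 digest of data (assumed checksum format).
func hashOf(data []byte) string {
	return fmt.Sprintf("%x", md5.Sum(data))
}

// blobName returns either the flat name ("acbd...") or the nested name
// ("acb/acbd...") that mirrors the directory layout mentioned for UnixVolume.
func blobName(hash string, nested bool) string {
	if nested && len(hash) >= 3 {
		return hash[:3] + "/" + hash
	}
	return hash
}

// listWithPrefix returns the names that start with prefix, sorted.
// It is an in-memory stand-in for listing blobs by prefix.
func listWithPrefix(names []string, prefix string) []string {
	var out []string
	for _, n := range names {
		if strings.HasPrefix(n, prefix) {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	h := hashOf([]byte("foo"))
	fmt.Println(blobName(h, false))
	fmt.Println(blobName(h, true))

	// With flat names, a hash prefix still selects the matching blobs.
	names := []string{blobName(h, false), hashOf([]byte("bar"))}
	fmt.Println(listWithPrefix(names, h[:3]))
}
```

With flat names, a prefix query on the first few hex digits selects the same blobs that a per-prefix directory would hold, so the nested form is not needed just to support prefix-based indexing.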

Subtasks 6 (0 open, 6 closed)

  • Task #7484: Deal with race in CreateBlob (Resolved, Tom Clegg, 08/28/2015)
  • Task #7501: Adjust Azure mtimes to local time (Closed, Tom Clegg, 08/28/2015)
  • Task #7500: Address golint complaints (Resolved, Tom Clegg, 10/09/2015)
  • Task #7416: Review 7159-empty-blob-race (Resolved, Peter Amstutz, 08/28/2015)
  • Task #7417: Document how all the issues raised on 7159 were addressed (Resolved, Tom Clegg, 08/28/2015)
  • Task #7536: Review 7159-clean-index (also fixes #7168) (Resolved, Peter Amstutz, 08/28/2015)

Related issues

  • Related to Arvados - Bug #7161: [SDKs] PySDK supports any Keep service type, using proxy replication logic for non-disk types (Resolved, Radhika Chippada, 09/23/2015)
  • Related to Arvados - Bug #7162: [SDKs] GoSDK supports any Keep service type, using proxy replication logic for non-disk types (Resolved, Radhika Chippada, 08/28/2015)
  • Related to Arvados - Bug #7167: [Deployment] Write an efficient Keep migration script (Resolved, Radhika Chippada, 09/30/2015)
  • Related to Arvados - Idea #7179: [Keep] One set of keepstore volume tests should test all volume types (Resolved, Radhika Chippada, 09/02/2015)
  • Blocks Arvados - Bug #7160: [Documentation] Document how to deploy Keep backed by Azure blob storage (Resolved, Tom Clegg, 08/28/2015)
  • Blocked by Arvados - Idea #7241: [Keep] Prototype Azure blob storage (Resolved, Tom Clegg, 09/23/2015)