Keep manifest format » History » Version 7

Version 6 (Tom Clegg, 06/13/2015 05:06 AM) → Version 7/8 (Tom Clegg, 06/01/2016 02:19 PM)


h1. Keep manifest format

h2. Manifest v1

A manifest is utf-8 encoded text, consisting of zero or more newline-terminated streams.

Each stream consists of three or more space-delimited tokens:
* The first token is a stream name, consisting of one or more path components, delimited by @"/"@.
** The first path component is always @"."@.
** No path component is empty.
** No path component is "." or "..".
** The stream name never begins or ends with @"/"@.
* The second token is a data blob locator, consisting of one or more tokens, delimited by @"+"@, the first of which is an MD5 hexdigest.
** If a subsequent token ("hint") in the
locator (see [[Keep locator format]]). is numeric, it indicates the size of the data blob, in bytes.
** If a hint starts with @"A"@, it is an authorization token (used by the Keep server to confirm that the block is readable by a specific API auth token).

* ...possibly followed by more data blob locators...
* The first token that is not a block locator, and all subsequent tokens, are file tokens.
** A file token has three parts, delimited by @":"@: position, size, filename.
** Position and size are given in decimal, and are counted from the beginning of the first data blob.
** Filename may contain @"/"@ characters, but must not start or end with @"/"@, and must not contain @"//"@.
** Filename components (delimited by @"/"@) must not be @"."@ or @".."@.

A manifest contains no TAB characters, nor other ASCII whitespace characters other than the spaces or newline delimiters specified above.

A manifest always ends with a newline -- except the empty (zero-length) string, which is a valid manifest.

h2. Normalized manifest v1

A normalized manifest has the following additional restrictions.
* Streams are in alphanumeric order.
* Each stream name is unique within the manifest.
* Files within a stream are in alphanumeric order.
* -Concatenation @stream_name/filename@ is unique within the manifest.- (This can be impossible to accomplish without rewriting blobs.)
* Filename must not contain @"/"@.

An API call -exists- will exist to normalize a manifest.

* @POST /arvados/v1/collections/{hash}/normalize@
* request body: @{"collection":{"manifest_text":"...."}}@

* @{"uuid":"...","manifest_text":"..."}@

* POST despite no side effects.
* Returns object with uuid even though no object was stored.

h2. Manifest v2

(Early design stages)

Should probably include:
* Structured format (JSON?)
* More than one level of indirection (e.g., manifest references block X, which references data blocks A,B,C)
* Specify hash algorithm with block hashes