Keep manifest format » History » Version 5

« Previous - Version 5/8 (diff) - Next » - Current version
Tom Clegg, 06/13/2015 05:03 AM

Keep manifest format

Manifest v1

A manifest is utf-8 encoded text, consisting of one or more newline-terminated streams.

Each stream consists of three or more space-delimited tokens:
  • The first token is a stream name, consisting of one or more path components, delimited by "/".
    • The first path component is always ".".
    • No path component is empty.
    • No path component is "." or "..".
    • The stream name never begins or ends with "/".
  • The second token is a data blob locator, consisting of one or more tokens, delimited by "+", the first of which is an MD5 hexdigest.
    • If a subsequent token ("hint") in the locator is numeric, it indicates the size of the data blob, in bytes.
    • If a hint starts with "A", it is an authorization token (used by the Keep server to confirm that the block is readable by a specific API auth token).
  • ...possibly followed by more data blob locators...
  • The first token that is not a block locator, and all subsequent tokens, are file tokens.
    • A file token has three parts, delimited by ":": position, size, filename.
    • Position and size are given in decimal, and are counted from the beginning of the first data blob.
    • Filename may contain "/" characters, but must not start or end with "/", and must not contain "//".
    • Filename components (delimited by "/") must not be "." or "..".

A manifest contains no TAB characters, nor other ASCII whitespace characters other than the spaces or newline delimiters specified above.

Normalized manifest v1

A normalized manifest has the following additional restrictions.
  • Streams are in alphanumeric order.
  • Each stream name is unique within the manifest.
  • Files within a stream are in alphanumeric order.
  • Concatenation stream_name/filename is unique within the manifest. (This can be impossible to accomplish without rewriting blobs.)
  • Filename must not contain "/".

An API call exists will exist to normalize a manifest.

  • POST /arvados/v1/collections/{hash}/normalize
  • request body: {"collection":{"manifest_text":"...."}}
  • {"uuid":"...","manifest_text":"..."}
  • POST despite no side effects.
  • Returns object with uuid even though no object was stored.

Manifest v2

(Early design stages)

Should probably include:
  • Structured format (JSON?)
  • More than one level of indirection (e.g., manifest references block X, which references data blocks A,B,C)
  • Specify hash algorithm with block hashes