Keep manifest format » History » Revision 4
Revision 3 (Tom Clegg, 03/02/2015 10:05 PM) → Revision 4/8 (Tom Clegg, 06/11/2015 06:09 PM)
{{toc}} h1. Keep manifest format h2. Manifest v1 A manifest is utf-8 encoded text, consisting of one or more newline-terminated streams. Each stream consists of three or more space-delimited tokens: * The first token is a stream name, consisting of one or more tokens, delimited by @"/"@, the first of which is always @"."@ and none of which are empty. The stream name never begins or ends with @"/"@. @"."@. * The second token is a data blob locator, consisting of one or more tokens, delimited by @"+"@, the first of which is an MD5 hexdigest. ** If a subsequent token ("hint") in the locator is numeric, it indicates the size of the data blob, in bytes. ** If a hint starts with @"A"@, it is an authorization token (used by the Keep server to confirm that the block is readable by a specific API auth token). * ...possibly followed by more data blob locators... * The first token that is not a block locator, and all subsequent tokens, are file tokens. ** A file token has three parts, delimited by @":"@: position, size, filename. ** Position and size are given in decimal, and are counted from the beginning of the first data blob. ** Filename may contain @"/"@ characters, but must not start or end with @"/"@, and must not contain @"//"@. h2. Normalized manifest v1 A normalized manifest has the following additional restrictions. * Streams are in alphanumeric order. * Each stream name is unique within the manifest. * Files within a stream are in alphanumeric order. * -Concatenation @stream_name/filename@ is unique within the manifest.- (This can be impossible to accomplish without rewriting blobs.) * Filename must not contain @"/"@. An API call -exists- will exist to normalize a manifest. Request: * @POST /arvados/v1/collections/{hash}/normalize@ * request body: @{"collection":{"manifest_text":"...."}}@ Response: * @{"uuid":"...","manifest_text":"..."}@ Notes: * POST despite no side effects. * Returns object with uuid even though no object was stored. h2. Manifest v2 (Early design stages) Should probably include: * Structured format (JSON?) * More than one level of indirection (e.g., manifest references block X, which references data blocks A,B,C) * Specify hash algorithm with block hashes