Revision 5 (Tom Clegg, 06/13/2015 05:03 AM) → Revision 6/8 (Tom Clegg, 06/13/2015 05:06 AM)

{{toc}} 

 h1. Keep manifest format 

 h2. Manifest v1 

 A manifest is UTF-8 encoded text, consisting of zero or more newline-terminated streams. 

 Each stream consists of three or more space-delimited tokens: 
 * The first token is a stream name, consisting of one or more path components, delimited by @"/"@. 
 ** The first path component is always @"."@. 
 ** No path component is empty. 
 ** No path component after the first is @"."@ or @".."@. 
 ** The stream name never begins or ends with @"/"@. 
 * The second token is a data blob locator, consisting of one or more tokens, delimited by @"+"@, the first of which is an MD5 hexdigest. 
 ** If a subsequent token ("hint") in the locator is numeric, it indicates the size of the data blob, in bytes. 
 ** If a hint starts with @"A"@, it is an authorization token (used by the Keep server to confirm that the block is readable by a specific API auth token). 
 * ...possibly followed by more data blob locators... 
 * The first token that is not a data blob locator, and all subsequent tokens, are file tokens. 
 ** A file token has three parts, delimited by @":"@: position, size, filename. 
 ** Position and size are decimal byte counts. Position is an offset counted from the beginning of the stream's first data blob. 
 ** Filename may contain @"/"@ characters, but must not start or end with @"/"@, and must not contain @"//"@. 
 ** Filename components (delimited by @"/"@) must not be @"."@ or @".."@. 
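
 The token rules above can be sketched as a small parser. This is a minimal illustration, not the reference implementation: the names @parse_locator@ and @parse_stream@ are hypothetical, and the regular expressions assume that the MD5 hexdigest is 32 lowercase hex digits and that hints contain no @"+"@ or whitespace. It does not enforce every path rule listed above.

```python
import re

# Assumption: a locator is a 32-digit lowercase MD5 hexdigest followed by
# zero or more "+"-delimited hints.
LOCATOR_RE = re.compile(r"^[0-9a-f]{32}(\+[^+\s]+)*$")
# A file token is position:size:filename; the filename may itself contain ":".
FILE_TOKEN_RE = re.compile(r"^([0-9]+):([0-9]+):(.+)$")


def parse_locator(token):
    """Split a locator into its digest, optional size, and remaining hints."""
    digest, *hints = token.split("+")
    size = None
    other = []
    for hint in hints:
        if size is None and re.fullmatch(r"[0-9]+", hint):
            size = int(hint)      # a numeric hint is the blob size in bytes
        else:
            other.append(hint)    # e.g. an "A..." authorization hint
    return {"digest": digest, "size": size, "hints": other}


def parse_stream(line):
    """Parse one stream (without its trailing newline) into its parts."""
    tokens = line.split(" ")
    if len(tokens) < 3:
        raise ValueError("a stream needs at least three tokens")
    name, rest = tokens[0], tokens[1:]
    if name.split("/")[0] != ".":
        raise ValueError("stream name must start with '.'")

    # Leading tokens that look like locators are the data blobs.
    locators = []
    i = 0
    while i < len(rest) and LOCATOR_RE.match(rest[i]):
        locators.append(parse_locator(rest[i]))
        i += 1
    if not locators:
        raise ValueError("a stream needs at least one locator")

    # Everything from the first non-locator token onward is a file token.
    files = []
    for token in rest[i:]:
        m = FILE_TOKEN_RE.match(token)
        if m is None:
            raise ValueError("bad file token: %r" % token)
        files.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    if not files:
        raise ValueError("a stream needs at least one file token")
    return name, locators, files
```

 For example, @parse_stream(". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo")@ yields the stream name @"."@, one locator with size 3, and one file token @(0, 3, "foo")@.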

 A manifest contains no TAB characters and no other ASCII whitespace, apart from the space and newline delimiters specified above. 

 A manifest always ends with a newline -- except the empty (zero-length) string, which is a valid manifest. 
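
 The whitespace and termination rules can be checked before any tokenizing. The following is a rough sketch, assuming the manifest has already been decoded from UTF-8 into a string; the function name @check_manifest_text@ is hypothetical, and the set of forbidden characters is one reading of "ASCII whitespace".

```python
def check_manifest_text(manifest):
    """Return a list of whitespace/termination problems (empty if none)."""
    problems = []
    # The empty string is a valid manifest with zero streams.
    if manifest == "":
        return problems
    # Every stream, including the last one, is newline-terminated.
    if not manifest.endswith("\n"):
        problems.append("non-empty manifest must end with a newline")
    # Assumption: TAB, CR, VT and FF are the ASCII whitespace characters
    # other than space and newline.
    for ch in "\t\r\v\f":
        if ch in manifest:
            problems.append("forbidden whitespace character %r" % ch)
    return problems
```

 An empty result means the text passes these two rules only; the individual streams still need to be parsed and validated.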

 h2. Normalized manifest v1 

 A normalized manifest has the following additional restrictions. 
 * Streams are in alphanumeric order. 
 * Each stream name is unique within the manifest. 
 * Files within a stream are in alphanumeric order. 
 * -Concatenation @stream_name/filename@ is unique within the manifest.- (This can be impossible to accomplish without rewriting blobs.) 
 * Filename must not contain @"/"@. 
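
 As an illustration, the restrictions that are still in force can be checked on an already-parsed manifest. This sketch assumes the manifest is represented as a list of @(stream_name, filenames)@ pairs and that "alphanumeric order" means Python's default string ordering; both are assumptions, and @normalization_problems@ is a hypothetical name.

```python
def normalization_problems(streams):
    """Check a list of (stream_name, [filename, ...]) pairs.

    Returns a list of human-readable problems; an empty list means the
    restrictions checked here are satisfied.
    """
    problems = []
    names = [name for name, _ in streams]
    # Assumption: "alphanumeric order" is plain string comparison.
    if names != sorted(names):
        problems.append("streams are not in order")
    if len(set(names)) != len(names):
        problems.append("stream names are not unique")
    for name, filenames in streams:
        if list(filenames) != sorted(filenames):
            problems.append("files in stream %r are not in order" % name)
        for filename in filenames:
            if "/" in filename:
                problems.append("filename %r in stream %r contains '/'"
                                % (filename, name))
    return problems
```

 Under these assumptions, @normalization_problems([(".", ["a", "b"]), ("./sub", ["c"])])@ returns an empty list, while moving a file named @"sub/c"@ into the @"."@ stream would be reported.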

 An API call -exists- will exist to normalize a manifest. 

 Request: 
 * @POST /arvados/v1/collections/{hash}/normalize@ 
 * request body: @{"collection":{"manifest_text":"...."}}@ 

 Response: 
 * @{"uuid":"...","manifest_text":"..."}@ 

 Notes: 
 * POST despite no side effects. 
 * Returns object with uuid even though no object was stored. 
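
 A client could assemble the request described above roughly as follows. This is only a sketch of the request shape given in this section: the helper name @build_normalize_request@ is hypothetical, the API host and authentication are not specified here and are left out, and nothing is sent over the network.

```python
import json


def build_normalize_request(collection_hash, manifest_text):
    """Return (method, path, body) for the normalize call described above."""
    # The path and body layout follow the Request bullets in this section.
    path = "/arvados/v1/collections/%s/normalize" % collection_hash
    body = json.dumps({"collection": {"manifest_text": manifest_text}})
    return "POST", path, body
```

 The response body would then be decoded with @json.loads@ and its @manifest_text@ field read as the normalized manifest.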

 h2. Manifest v2 

 (Early design stages) 

 Should probably include: 
 * Structured format (JSON?) 
 * More than one level of indirection (e.g., manifest references block X, which references data blocks A,B,C) 
 * Specify hash algorithm with block hashes