Tom Clegg, 11/13/2018 09:35 PM

{{toc}}

h1. Keep manifest format

h2. Manifest v1

A manifest is UTF-8 encoded text, consisting of zero or more newline-terminated streams.

Each stream consists of three or more space-delimited tokens:
* The first token is a stream name, consisting of one or more path components, delimited by @"/"@.
** The first path component is always @"."@.
** No path component is empty.
** No path component is @"."@ or @".."@ (except the leading @"."@).
** The stream name never begins or ends with @"/"@.
* The second token is a data blob locator (see [[Keep locator format]]).
* ...possibly followed by more data blob locators...
* The first token that is not a data blob locator, and all subsequent tokens, are file tokens.
** A file token has three parts, delimited by @":"@: position, size, filename.
** Position and size are given in decimal; position is counted from the beginning of the first data blob in the stream.
** Filename may contain @"/"@ characters, but must not start or end with @"/"@, and must not contain @"//"@.
** Filename components (delimited by @"/"@) must not be @"."@ or @".."@.
** Except: Filename may be @"."@ if size is 0. This does not represent a real file; it is a placeholder used to ensure there is at least one file token in a stream that contains no files.
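As a rough illustration of the token rules above, the following Python sketch splits one stream into its name, locators, and file tokens. The @LOCATOR@ pattern (32 hex digits plus optional @+@-separated hints) is an assumption made for this example; the authoritative rule is in [[Keep locator format]]. The names @parse_stream@, @Stream@, and @FileToken@ are hypothetical, and names are left in their escaped form.

```python
import re
from typing import List, NamedTuple

# Assumption: a locator is a 32-digit hex hash, optionally followed by
# "+"-separated hints. See "Keep locator format" for the real definition.
LOCATOR = re.compile(r"[0-9a-f]{32}(\+[^+\s]+)*")
FILE_TOKEN = re.compile(r"([0-9]+):([0-9]+):(.+)")


class FileToken(NamedTuple):
    position: int
    size: int
    filename: str


class Stream(NamedTuple):
    name: str
    locators: List[str]
    files: List[FileToken]


def parse_stream(line: str) -> Stream:
    """Parse one stream (without its trailing newline) into its parts."""
    tokens = line.split(" ")
    if len(tokens) < 3:
        raise ValueError("a stream needs at least three tokens")

    # Stream name: first component is ".", no empty, "." or ".." components.
    name = tokens[0]
    parts = name.split("/")
    if parts[0] != "." or any(p in ("", ".", "..") for p in parts[1:]):
        raise ValueError(f"invalid stream name: {name!r}")

    # Locators run from the second token up to the first non-locator token.
    i = 1
    while i < len(tokens) and LOCATOR.fullmatch(tokens[i]):
        i += 1
    locators, rest = tokens[1:i], tokens[i:]
    if not locators or not rest:
        raise ValueError("expected at least one locator and one file token")

    files = []
    for tok in rest:
        m = FILE_TOKEN.fullmatch(tok)
        if m is None:
            raise ValueError(f"invalid file token: {tok!r}")
        position, size, filename = int(m.group(1)), int(m.group(2)), m.group(3)
        # "." is allowed only as the zero-size placeholder.
        placeholder = filename == "." and size == 0
        if not placeholder and any(
            c in ("", ".", "..") for c in filename.split("/")
        ):
            raise ValueError(f"invalid filename: {filename!r}")
        files.append(FileToken(position, size, filename))
    return Stream(name, locators, files)
```

For instance, @parse_stream(". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt")@ yields one locator and a single file token for @foo.txt@ (the locator here is only illustrative). Checking for empty components covers the leading-slash, trailing-slash, and @"//"@ rules in one test.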

A manifest contains no TAB characters or other ASCII whitespace characters, apart from the space and newline delimiters specified above.

Whitespace, backslashes, and special characters appearing in paths and filenames are encoded as @\nnn@ where @nnn@ is a three-digit octal byte code.
* A backslash character is encoded as @\134@.
* A space is encoded as @\040@.
* It is permitted to escape printable characters: @"fo\157\057bar"@ and @"foo/bar"@ are equivalent.
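The escaping rule can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the choice of which bytes count as "special" (here: control characters, space, backslash, and DEL) is an assumption, since the text above does not list them.

```python
import re

# Matches "\nnn" where nnn is a three-digit octal value that fits in a byte.
_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape(token: str) -> str:
    """Replace each \\nnn escape with the byte it encodes."""
    raw = token.encode("utf-8")
    raw = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")


def escape(name: str) -> str:
    """Encode whitespace, backslashes and control bytes as \\nnn."""
    out = bytearray()
    for b in name.encode("utf-8"):
        # Assumption: escape control bytes, space, backslash and DEL only.
        if b <= 0x20 or b == 0x5C or b == 0x7F:
            out += ("\\%03o" % b).encode("ascii")
        else:
            out.append(b)
    return out.decode("utf-8")
```

Working on UTF-8 bytes rather than characters keeps the sketch consistent with the "octal byte code" wording: multi-byte characters pass through untouched unless one of their bytes is explicitly escaped.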

A manifest always ends with a newline -- except the empty (zero-length) string, which is a valid manifest.
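The newline and whitespace rules suggest a simple way to split a manifest into streams. The following Python sketch shows one possible approach; @split_streams@ is a hypothetical helper, and it checks only the manifest-level rules, not the contents of each stream.

```python
from typing import List


def split_streams(manifest: str) -> List[str]:
    """Split manifest text into stream lines, checking the framing rules."""
    if manifest == "":
        return []  # the empty string is a valid manifest with no streams
    if not manifest.endswith("\n"):
        raise ValueError("a non-empty manifest must end with a newline")
    streams = manifest[:-1].split("\n")
    for stream in streams:
        if stream == "":
            raise ValueError("empty stream")
        # Only space and newline are permitted as ASCII whitespace.
        if any(ch in "\t\r\v\f" for ch in stream):
            raise ValueError("unexpected whitespace character in stream")
    return streams
```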

h2. Normalized manifest v1

A normalized manifest has the following additional restrictions.
* Streams are in alphanumeric order.
* Each stream name is unique within the manifest.
* Files within a stream are in alphanumeric order.
* -Concatenation @stream_name/filename@ is unique within the manifest.- (This can be impossible to accomplish without rewriting blobs.)
* Filename must not contain @"/"@.
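The restrictions above (excluding the struck-out one) could be checked along these lines. This Python sketch is illustrative only: @normalization_problems@ is a hypothetical name, the input is assumed to be already parsed into @(stream_name, filenames)@ pairs, and "alphanumeric order" is assumed to mean Python's default string ordering, which the text does not confirm.

```python
from typing import List, Sequence, Tuple


def normalization_problems(
    streams: Sequence[Tuple[str, Sequence[str]]]
) -> List[str]:
    """Return a list of violated normalization rules (empty if none)."""
    problems = []
    names = [name for name, _ in streams]
    # Assumption: "alphanumeric order" is plain string comparison.
    if names != sorted(names):
        problems.append("streams are not in order")
    if len(set(names)) != len(names):
        problems.append("duplicate stream name")
    for name, filenames in streams:
        if list(filenames) != sorted(filenames):
            problems.append(f"{name}: files are not in order")
        if any("/" in f for f in filenames):
            problems.append(f"{name}: filename contains '/'")
    return problems
```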

An API call -exists- will exist to normalize a manifest.

Request:
* @POST /arvados/v1/collections/{hash}/normalize@
* request body: @{"collection":{"manifest_text":"...."}}@

Response:
* @{"uuid":"...","manifest_text":"..."}@

Notes:
* POST despite no side effects.
* Returns object with uuid even though no object was stored.
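Since this call is only proposed, the following Python sketch merely shows how a client might assemble the request described above. The function name, the @base_url@ and @collection_hash@ parameters, and the @Content-Type@ header are assumptions for illustration; authentication and the actual network call are deliberately left out.

```python
import json
import urllib.request


def build_normalize_request(
    base_url: str, collection_hash: str, manifest_text: str
) -> urllib.request.Request:
    """Build (but do not send) the proposed normalize request."""
    url = f"{base_url}/arvados/v1/collections/{collection_hash}/normalize"
    body = json.dumps({"collection": {"manifest_text": manifest_text}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        # Assumption: the server accepts a JSON request body.
        headers={"Content-Type": "application/json"},
    )
```

A caller would pass the returned object to @urllib.request.urlopen@ and read @manifest_text@ from the JSON response, assuming the endpoint behaves as outlined above.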

h2. Manifest v2

(Early design stages)

Should probably include:
* Structured format (JSON?)
* More than one level of indirection (e.g., manifest references block X, which references data blocks A,B,C)
* Specify hash algorithm with block hashes