[SDKs] [arv-mount] Use "." placeholder to persist empty directories
When saving a collection, encode an empty directory like this:
./dir/emptysubdir d41d8cd98f00b204e9800998ecf8427e+0 0:0:\056
#8 Updated by Lucas Di Pentima about 1 year ago
Updates at 56959ea8492ec4f08aa90c89a8f41e5c278e3c41 - branch
Test run: https://ci.curoverse.com/job/developer-run-tests/1013/
- Persist empty subdirectories by adding an empty file named
\056(".") to the manifest.
- Don't allow explicitly use that name on file or directory names.
- Test fixes & additions.
#9 Updated by Lucas Di Pentima about 1 year ago
Update 41e5e2b86 fixes
fuse tests: https://ci.curoverse.com/job/developer-run-tests/1015/
#10 Updated by Tom Clegg about 1 year ago
The "save empty dir" part LGTM but I would like the test to include an empty directory whose name needs escaping, like "/foo bar/baz waz".
This part doesn't seem right. (Matching a specific encoding like \056 is a clue encoding/decoding is not being done properly.)
diff --git a/sdk/python/arvados/collection.py b/sdk/python/arvados/collection.py index 627f0346d..c2517c618 100644 --- a/sdk/python/arvados/collection.py +++ b/sdk/python/arvados/collection.py @@ -600,6 +600,9 @@ class RichCollectionBase(CollectionBase): pathcomponents = path.split("/", 1) if pathcomponents: + # Don't allow naming files/dirs \\056 + if pathcomponents == "\\056": + raise IOError(errno.EINVAL, "Invalid name", pathcomponents) item = self._items.get(pathcomponents) if len(pathcomponents) == 1: if item is None:
Creating files and directories named '\056' should work fine. The manifest would look like this:
./\134056 d41d8cd98f00b204e9800998ecf8427e+0 0:0:\134056
The specific case of '\056' might be rare enough not to care about, but this seems to be revealing that something isn't escaping/unescaping backslashes properly. Indeed, a file named \040 seems to be written as \040 and then read back as " ", and this causes arv-mount to see its own updates as conflicts:
email@example.com:~/keepw/home/test$ touch '\040' firstname.lastname@example.org:~/keepw/home/test$ ls -l total 1 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 \040 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163603~conflict~ email@example.com:~/keepw/home/test$ touch '\040' firstname.lastname@example.org:~/keepw/home/test$ touch '\040' email@example.com:~/keepw/home/test$ ls -l total 2 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 \040 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163603~conflict~ -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163644~conflict~ -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163656~conflict~
We should fix that instead of disallowing files/dirs named \056.
#11 Updated by Lucas Di Pentima about 1 year ago
Ok, so after lots of testing and hair pulling, I think I got it to work. Squashed the many test commits by using
- Moved the escaping to
_normalize_stream(), doing it on
find_or_create()was not right.
- Removed code prohibiting to use '\056' as a file/dir name.
- Updated tests.
#12 Updated by Lucas Di Pentima about 1 year ago
-remainer tests re-run: https://ci.curoverse.com/job/developer-run-tests-remainder/1052/
#14 Updated by Tom Clegg about 1 year ago
A couple of follow-ups, though:
This comment in FuseRmTest is now obsolete:
# Can't have empty directories :-( so manifest will be empty.
Along with space and backslash, are there other chars that are legal in filenames but would be misinterpreted in a manifest? I suspect newline needs to be escaped, and we should probably do tab too (even if tab works unescaped, it could be distracting while troubleshooting).
#15 Updated by Lucas Di Pentima about 1 year ago
Update at 735143cbf
Test run: https://ci.curoverse.com/job/developer-run-tests/1032/
Added escaping for
\t on file and stream names. Just in case I tested the use of
: on file names and it seemed to me that it doesn't need escaping, no issues observed at the PySDK level.
#16 Updated by Tom Clegg about 1 year ago
escape() should always escape backslash as \134, even when it isn't part of a valid escape sequence. This implementation encodes filename r'\400' as r'0:0:\400', which contradicts Keep manifest format.
Although the Python SDK can read unescaped colons in filenames, escaping them seems like a good idea. It is explicitly allowed by Keep manifest format.
By the same logic, rather than blacklisting chars, we should probably escape ":", space, and all non-printable chars (something like
[:\000- ]). That would take care of \r, nul, etc.
#18 Updated by Lucas Di Pentima about 1 year ago
Updates at 8b2eb7d1f
Test run: https://ci.curoverse.com/job/developer-run-tests/1033/
Fixes literal backslash escaping, enhances other special chars escaping & updates test.
#20 Updated by Lucas Di Pentima about 1 year ago
Updates at a974fc22e
Test run: https://ci.curoverse.com/job/developer-run-tests/1034/
Sorry Tom, I've now updated
escape() to cover all the special chars between \000 and \040, and also ':'. Updated test too.
#21 Updated by Lucas Di Pentima about 1 year ago
Removed useless comments and simplified the regex on c5e7cbef5
Test run: https://ci.curoverse.com/job/developer-run-tests/1035/
#25 Updated by Peter Amstutz 12 months ago
crunch-job (crunchv1) was broken by these changes because manifests that previously looked like
Now have ":" quoted as \072
As a result, it wouldn't match crunch-jobs's regular expression for picking out the image id.
Here's a quick and dirty crunch-job fix:
14539-fix-crunch-job @ ade82492825d6b78e2da35822e80f79a03e6ea67