[SDKs] [arv-mount] Use "." placeholder to persist empty directories
When saving a collection, encode an empty directory like this:
./dir/emptysubdir d41d8cd98f00b204e9800998ecf8427e+0 0:0:\056
#8 Updated by Lucas Di Pentima 8 months ago
Updates at 56959ea8492ec4f08aa90c89a8f41e5c278e3c41 - branch
Test run: https://ci.curoverse.com/job/developer-run-tests/1013/
- Persist empty subdirectories by adding an empty file named
\056(".") to the manifest.
- Don't allow explicitly use that name on file or directory names.
- Test fixes & additions.
#9 Updated by Lucas Di Pentima 8 months ago
Update 41e5e2b86 fixes
fuse tests: https://ci.curoverse.com/job/developer-run-tests/1015/
The "save empty dir" part LGTM but I would like the test to include an empty directory whose name needs escaping, like "/foo bar/baz waz".
This part doesn't seem right. (Matching a specific encoding like \056 is a clue encoding/decoding is not being done properly.)
diff --git a/sdk/python/arvados/collection.py b/sdk/python/arvados/collection.py index 627f0346d..c2517c618 100644 --- a/sdk/python/arvados/collection.py +++ b/sdk/python/arvados/collection.py @@ -600,6 +600,9 @@ class RichCollectionBase(CollectionBase): pathcomponents = path.split("/", 1) if pathcomponents: + # Don't allow naming files/dirs \\056 + if pathcomponents == "\\056": + raise IOError(errno.EINVAL, "Invalid name", pathcomponents) item = self._items.get(pathcomponents) if len(pathcomponents) == 1: if item is None:
Creating files and directories named '\056' should work fine. The manifest would look like this:
./\134056 d41d8cd98f00b204e9800998ecf8427e+0 0:0:\134056
The specific case of '\056' might be rare enough not to care about, but this seems to be revealing that something isn't escaping/unescaping backslashes properly. Indeed, a file named \040 seems to be written as \040 and then read back as " ", and this causes arv-mount to see its own updates as conflicts:
firstname.lastname@example.org:~/keepw/home/test$ touch '\040' email@example.com:~/keepw/home/test$ ls -l total 1 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 \040 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163603~conflict~ firstname.lastname@example.org:~/keepw/home/test$ touch '\040' email@example.com:~/keepw/home/test$ touch '\040' firstname.lastname@example.org:~/keepw/home/test$ ls -l total 2 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 \040 -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163603~conflict~ -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163644~conflict~ -rwxrwxrwx 1 tomclegg tomclegg 0 Dec 18 16:35 ~20181218-163656~conflict~
We should fix that instead of disallowing files/dirs named \056.
#11 Updated by Lucas Di Pentima 8 months ago
Ok, so after lots of testing and hair pulling, I think I got it to work. Squashed the many test commits by using
- Moved the escaping to
_normalize_stream(), doing it on
find_or_create()was not right.
- Removed code prohibiting to use '\056' as a file/dir name.
- Updated tests.
#12 Updated by Lucas Di Pentima 8 months ago
-remainer tests re-run: https://ci.curoverse.com/job/developer-run-tests-remainder/1052/
A couple of follow-ups, though:
This comment in FuseRmTest is now obsolete:
# Can't have empty directories :-( so manifest will be empty.
Along with space and backslash, are there other chars that are legal in filenames but would be misinterpreted in a manifest? I suspect newline needs to be escaped, and we should probably do tab too (even if tab works unescaped, it could be distracting while troubleshooting).
#15 Updated by Lucas Di Pentima 7 months ago
Update at 735143cbf
Test run: https://ci.curoverse.com/job/developer-run-tests/1032/
Added escaping for
\t on file and stream names. Just in case I tested the use of
: on file names and it seemed to me that it doesn't need escaping, no issues observed at the PySDK level.
escape() should always escape backslash as \134, even when it isn't part of a valid escape sequence. This implementation encodes filename r'\400' as r'0:0:\400', which contradicts Keep manifest format.
Although the Python SDK can read unescaped colons in filenames, escaping them seems like a good idea. It is explicitly allowed by Keep manifest format.
By the same logic, rather than blacklisting chars, we should probably escape ":", space, and all non-printable chars (something like
[:\000- ]). That would take care of \r, nul, etc.
#18 Updated by Lucas Di Pentima 7 months ago
Updates at 8b2eb7d1f
Test run: https://ci.curoverse.com/job/developer-run-tests/1033/
Fixes literal backslash escaping, enhances other special chars escaping & updates test.
#20 Updated by Lucas Di Pentima 7 months ago
Updates at a974fc22e
Test run: https://ci.curoverse.com/job/developer-run-tests/1034/
Sorry Tom, I've now updated
escape() to cover all the special chars between \000 and \040, and also ':'. Updated test too.
#21 Updated by Lucas Di Pentima 7 months ago
Removed useless comments and simplified the regex on c5e7cbef5
Test run: https://ci.curoverse.com/job/developer-run-tests/1035/
#25 Updated by Peter Amstutz 7 months ago
crunch-job (crunchv1) was broken by these changes because manifests that previously looked like
Now have ":" quoted as \072
As a result, it wouldn't match crunch-jobs's regular expression for picking out the image id.
Here's a quick and dirty crunch-job fix:
14539-fix-crunch-job @ ade82492825d6b78e2da35822e80f79a03e6ea67