Bug #14884
Closed: arv-put doesn't handle non-ASCII filenames correctly
Description
Attempting to upload a directory containing a file with non-ASCII characters in its name causes arv-put to fail:
$ touch "foö.txt"
$ ls
foö.txt
$ arv-put --version
/usr/bin/arv-put 1.2.1.20181130020805
$ arv-put .
2019-02-25 16:15:05 arvados.arv_put[99848] INFO: Calculating upload size, this could take some time...
2019-02-25 16:15:05 arvados.arv_put[99848] INFO: Creating new cache file at /home/tfmorris/.cache/arvados/arv-put/1772ebabb1791856f3faef5c84d335e2
0
Traceback (most recent call last):
  File "/usr/bin/arv-put", line 7, in <module>
    main()
  File "/usr/lib/python2.7/dist-packages/arvados/commands/put.py", line 1158, in main
    writer.start(save_collection=not(args.stream or args.raw))
  File "/usr/lib/python2.7/dist-packages/arvados/commands/put.py", line 601, in start
    self._local_collection.manifest_text()
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 992, in manifest_text
    only_committed=only_committed)
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 1038, in _get_manifest_text
    buf.append(self[dirname].manifest_text(stream_name=os.path.join(stream_name, dirname), strip=strip, normalize=True, only_committed=only_committed))
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 992, in manifest_text
    only_committed=only_committed)
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 1036, in _get_manifest_text
    buf.append(" ".join(normalize_stream(stream_name, stream)) + "\n")
  File "/usr/lib/python2.7/dist-packages/arvados/_normalize_stream.py", line 58, in normalize_stream
    stream_tokens.append("0:0:{0}".format(fout))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 2: ordinal not in range(128)
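The last frame is where it breaks: under Python 2, calling `.format()` on a byte-string template such as `"0:0:{0}"` with a `unicode` argument makes Python convert the formatted value back to a byte string using the default ASCII codec, which fails on `ö` (position 2 of `foö.txt`). That implicit conversion does not exist in Python 3, so the sketch below mimics it with an explicit `encode("ascii")` and contrasts it with keeping the token as text and encoding it explicitly as UTF-8. `make_empty_file_token` and `to_ascii_naively` are hypothetical helpers, not Arvados SDK functions, and the sketch deliberately ignores the escaping of whitespace and other special characters that real manifest file tokens need.

```python
# Minimal Python 3 sketch of the failure mode in the traceback above and of
# the usual remedy. Both helpers are hypothetical, not part of the Arvados SDK.


def make_empty_file_token(name: str) -> str:
    """Return a "position:size:name" token for a zero-length file (no escaping)."""
    # Template and argument are both text, so no implicit ASCII conversion occurs.
    return "0:0:{0}".format(name)


def to_ascii_naively(text: str) -> bytes:
    """Mimic Python 2's implicit unicode-to-str coercion, which uses ASCII."""
    return text.encode("ascii")  # raises UnicodeEncodeError for 'ö'


token = make_empty_file_token("foö.txt")

try:
    to_ascii_naively(token)
except UnicodeEncodeError as err:
    # Same class of error as the traceback: '\xf6' is outside range(128).
    print("ascii failed:", err.reason)

# Encoding explicitly as UTF-8 at the output boundary works for any name.
print(token.encode("utf-8"))
```

In Python 2 terms, the equivalent remedy is to make the template a text literal (`u"0:0:{0}"`) so that formatting stays in `unicode` until the manifest is deliberately encoded.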
Updated by Tom Morris over 5 years ago
- Target version changed from Arvados Future Sprints to 2019-03-13 Sprint
Updated by Peter Amstutz over 5 years ago
- Related to Idea #4551: [Workbench] [API] Support UTF-8 filenames and stream names in all manifest-handling code. added
Updated by Tom Morris over 5 years ago
- Assigned To set to Tom Morris
- Target version changed from 2019-03-13 Sprint to 2019-02-27 Sprint
Updated by Tom Morris over 5 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|f87ca89cd9599b8429216f65782106b67aa107be.
Updated by Tom Morris over 5 years ago
- Status changed from Resolved to In Progress
Updated by Tom Morris over 5 years ago
- Target version changed from 2019-02-27 Sprint to 2019-03-13 Sprint
Updated by Lucas Di Pentima over 5 years ago
Manual testing worked correctly. I've also added an arv-put integration test at ff690e65d.
The following lines may also need updating:
File sdk/python/arvados/commands/put.py, lines 505, 745 & 760.
With that, it LGTM.
Updated by Tom Morris over 5 years ago
Thanks.
> The following lines may also need updating:
> File sdk/python/arvados/commands/put.py, lines 505, 745 & 760.
I fixed those format statements, as well as a few others that referenced potentially problematic strings such as collection names or the cache directory path (which includes the user's home directory and so could contain non-ASCII characters).
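The format-statement fixes described above follow a general pattern: keep both the message template and the interpolated values as text, so user-controlled strings such as collection names or a cache path under a non-ASCII home directory render without error. The Python 3 sketch below illustrates that pattern with the standard `logging` module writing to an in-memory stream; the logger name, cache path, collection name and the "Collection saved as" message are all hypothetical and are not taken from arv-put.

```python
# Sketch: log messages that embed user-controlled text. The logger name,
# paths and collection name below are hypothetical examples.
import io
import logging


def make_logger(stream: io.StringIO) -> logging.Logger:
    """Build a logger that writes "LEVEL: message" lines to the given stream."""
    logger = logging.getLogger("example.arv_put_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.handlers = [handler]  # replace any handlers from earlier runs
    logger.propagate = False     # keep output out of the root logger
    return logger


buf = io.StringIO()
log = make_logger(buf)

cache_path = "/home/jörg/.cache/arvados/arv-put/1772ebab"  # non-ASCII home dir
collection_name = "Résultats 2019"

# Values are passed as arguments; logging interpolates them into the text
# template when the record is formatted.
log.info("Creating new cache file at %s", cache_path)
log.info("Collection saved as '%s'", collection_name)

print(buf.getvalue())
```

Passing the values as logging arguments, rather than pre-building the message, also means nothing is formatted for records that are filtered out by level.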
I rebased and squashed that with the other non-manifest changes and also removed the problematic "Fixed #14884" from one of my previous commit messages.
I pushed a new branch, 14884-unicode-name-support, in case you want to take a quick look before I merge. The tests pass.
Updated by Tom Morris over 5 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|023858a629bedfccdb5d17602643aaaa0a223e1a.