Bug #14884

arv-put doesn't handle non-ASCII filenames correctly

Added by Tom Morris 10 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 02/27/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Attempting to upload a directory containing a file with non-ASCII characters in its name causes arv-put to fail:

$ touch "foö.txt" 
$ ls
foö.txt
$ arv-put --version
/usr/bin/arv-put 1.2.1.20181130020805
$ arv-put .
2019-02-25 16:15:05 arvados.arv_put[99848] INFO: Calculating upload size, this could take some time...
2019-02-25 16:15:05 arvados.arv_put[99848] INFO: Creating new cache file at /home/tfmorris/.cache/arvados/arv-put/1772ebabb1791856f3faef5c84d335e2
0 Traceback (most recent call last):
  File "/usr/bin/arv-put", line 7, in <module>
    main()
  File "/usr/lib/python2.7/dist-packages/arvados/commands/put.py", line 1158, in main
    writer.start(save_collection=not(args.stream or args.raw))
  File "/usr/lib/python2.7/dist-packages/arvados/commands/put.py", line 601, in start
    self._local_collection.manifest_text()
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 992, in manifest_text
    only_committed=only_committed)
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 1038, in _get_manifest_text
    buf.append(self[dirname].manifest_text(stream_name=os.path.join(stream_name, dirname), strip=strip, normalize=True, only_committed=only_committed))
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 992, in manifest_text
    only_committed=only_committed)
  File "/usr/lib/python2.7/dist-packages/arvados/arvfile.py", line 273, in synchronized_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/arvados/collection.py", line 1036, in _get_manifest_text
    buf.append(" ".join(normalize_stream(stream_name, stream)) + "\n")
  File "/usr/lib/python2.7/dist-packages/arvados/_normalize_stream.py", line 58, in normalize_stream
    stream_tokens.append("0:0:{0}".format(fout))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 2: ordinal not in range(128)
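The traceback ends in `normalize_stream`, where `"0:0:{0}".format(fout)` combines a byte-string format with a `unicode` file name. Under Python 2 the formatted result is coerced back to `str` with the ASCII codec, which produces the error above. The sketch below is Python 3 and is not the code from the actual fix. It shows the general approach: keep names as text while building manifest tokens, then encode the finished line as UTF-8 once. `escape_name`, `empty_file_token` and `manifest_line` are hypothetical helpers, and the escaping rule (only backslash and space, as `\134` and `\040`) is a simplifying assumption; the real SDK escapes more characters.

```python
# Minimal Python 3 sketch. All helper names are hypothetical, and the escaping
# rule below is a simplified assumption, not the full Arvados SDK behaviour.

def escape_name(name):
    """Escape a file name for a manifest token (backslash and space only)."""
    # Escape backslashes first so the backslashes we introduce are not re-escaped.
    return name.replace("\\", "\\134").replace(" ", "\\040")


def empty_file_token(name):
    """Build the '0:0:<name>' token for an empty file, keeping it as text."""
    # In Python 2, "0:0:{0}".format(u"fo\xf6.txt") coerces the unicode result
    # back to an ASCII str, which raises the UnicodeEncodeError shown above.
    # Here everything stays str (text) until the final encode.
    return "0:0:{0}".format(escape_name(name))


def manifest_line(stream_name, locator, names):
    """Join one stream line as text, then encode it as UTF-8 at the boundary."""
    tokens = [stream_name, locator] + [empty_file_token(n) for n in names]
    return (" ".join(tokens) + "\n").encode("utf-8")


if __name__ == "__main__":
    # Encoding the name with the ASCII codec reproduces the reported message.
    try:
        "foö.txt".encode("ascii")
    except UnicodeEncodeError as err:
        print(err)
    # d41d8cd98f00b204e9800998ecf8427e+0 is the locator of the empty block.
    print(manifest_line(".", "d41d8cd98f00b204e9800998ecf8427e+0", ["foö.txt"]))
```

Encoding once at the output boundary keeps the rest of the code free of implicit codec choices, which is exactly what the Python 2 `str.format` call lacked.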


Subtasks

Task #14890: Review 14884-unicode-filenames-in-manifest (In Progress, Tom Morris)


Related issues

Related to Arvados - Story #4551: [SDKs] [Workbench] [API] Support UTF-8 filenames and stream names in all manifest-handling code. (New)

Associated revisions

Revision f87ca89c (diff)
Added by Tom Morris 10 months ago

14884: Allow non-ASCII filenames in manifests. Fixes #14884

Arvados-DCO-1.1-Signed-off-by: Tom Morris <>

Revision 023858a6
Added by Tom Morris 10 months ago

Merge branch '14884-unicode-name-support'

Fixes #14884.

Arvados-DCO-1.1-Signed-off-by: Tom Morris <>

History

#1 Updated by Tom Morris 10 months ago

  • Target version changed from Arvados Future Sprints to 2019-03-13 Sprint

#3 Updated by Peter Amstutz 10 months ago

  • Related to Story #4551: [SDKs] [Workbench] [API] Support UTF-8 filenames and stream names in all manifest-handling code. added

#4 Updated by Tom Morris 10 months ago

  • Assigned To set to Tom Morris
  • Target version changed from 2019-03-13 Sprint to 2019-02-27 Sprint

#5 Updated by Tom Morris 10 months ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

#6 Updated by Tom Morris 10 months ago

  • Status changed from Resolved to In Progress

#7 Updated by Tom Morris 10 months ago

  • Target version changed from 2019-02-27 Sprint to 2019-03-13 Sprint

#8 Updated by Lucas Di Pentima 10 months ago

Manual testing worked correctly. I've also added an arv-put integration test at ff690e65d.

The following lines may also need updating:

File sdk/python/arvados/commands/put.py lines 505, 745 & 760.

With that, it LGTM.

#9 Updated by Tom Morris 10 months ago

Thanks.

> The following lines may also need updating:
>
> File sdk/python/arvados/commands/put.py lines 505, 745 & 760.

I fixed those format statements, as well as a few others that referenced potentially problematic strings such as collection names or the cache directory (whose path includes the user's home directory, which could contain non-ASCII characters).
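The same Python 2 pitfall applies to log and error messages built with `format` or `%` from collection names or paths. A minimal Python 3 sketch, assuming a hypothetical logger name, message wording and example path (none of them taken from arv-put): pass the values as logging arguments so they stay text, and choose the output encoding explicitly on the stream (UTF-8 here). In Python 2 the usual remedy is to make the format strings themselves `unicode` (for example `u"..."`) so `unicode` arguments are never coerced to ASCII.

```python
import io
import logging


def make_logger(stream):
    """Return a logger that writes 'LEVEL: message' lines to the given text stream."""
    # Hypothetical logger name, for illustration only.
    logger = logging.getLogger("arv_put_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the sketch's output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.handlers = [handler]  # replace any handler from an earlier call
    return logger


def log_cache_location(logger, cache_path, collection_name):
    # Values are passed as arguments and interpolated as text by logging,
    # so non-ASCII home directories and collection names are not coerced to ASCII.
    logger.info("Using cache file at %s for collection '%s'", cache_path, collection_name)


if __name__ == "__main__":
    raw = io.BytesIO()
    # The byte encoding is decided once, at the stream boundary.
    out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    log_cache_location(make_logger(out),
                       "/home/jöhn/.cache/arvados/arv-put/abc",  # hypothetical path
                       "Résumé files")                           # hypothetical name
    print(raw.getvalue().decode("utf-8"), end="")
```

Letting `logging` do the interpolation also defers formatting until a record is actually emitted, so messages below the logger's level cost nothing.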

I rebased and squashed that with the other non-manifest changes and also removed the problematic "Fixed #14884" from one of my previous commit messages.

I pushed a new branch at 14884-unicode-name-support if you want to take a quick look before I merge. The tests pass.

#10 Updated by Lucas Di Pentima 10 months ago

This LGTM, thanks.

#11 Updated by Tom Morris 10 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#12 Updated by Tom Morris 10 months ago

  • Release set to 15
