Bug #17522

[arv-put] should use binary mode when reading stdin

Added by Tom Clegg 8 months ago. Updated 23 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Start date: 04/14/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

When the input filename is "-" or "/dev/stdin", arv-put appears to read/transcode the input as utf-8:

tom@shell:~$ head -c1000000 /dev/urandom | arv-put /dev/stdin
2021-04-09 16:54:34 arvados.arv_put[29927] INFO: No cache usage requested for this run.
Traceback (most recent call last):
  File "/usr/bin/arv-put", line 7, in <module>
    main()
  File "/usr/share/python3/dist/python3-arvados-python-client/lib/python3.7/site-packages/arvados/commands/put.py", line 1270, in main
    trash_at=trash_at)
  File "/usr/share/python3/dist/python3-arvados-python-client/lib/python3.7/site-packages/arvados/commands/put.py", line 508, in __init__
    self._build_upload_list()
  File "/usr/share/python3/dist/python3-arvados-python-client/lib/python3.7/site-packages/arvados/commands/put.py", line 526, in _build_upload_list
    self._write_stdin(self.filename or 'stdin')
  File "/usr/share/python3/dist/python3-arvados-python-client/lib/python3.7/site-packages/arvados/commands/put.py", line 746, in _write_stdin
    self._write(sys.stdin, output)
  File "/usr/share/python3/dist/python3-arvados-python-client/lib/python3.7/site-packages/arvados/commands/put.py", line 839, in _write
    data = source_fd.read(arvados.config.KEEP_BLOCK_SIZE)
  File "/usr/share/python3/dist/python3-arvados-python-client/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 1: invalid start byte

As a workaround you can make a symlink so arv-put doesn't realize it's reading from stdin.

tom@shell:~$ ln -s /dev/stdin stdin
tom@shell:~$ head -c1000000 /dev/urandom | arv-put ./stdin
2021-04-09 16:56:11 arvados.arv_put[31332] INFO: Creating new cache file at /home/tom/.cache/arvados/arv-put/30973df233a1b57881df8fc58ff569bd
1000000 2021-04-09 16:56:11 arvados.arv_put[31332] INFO: 

2021-04-09 16:56:11 arvados.arv_put[31332] INFO: Collection saved as 'Saved at 2021-04-09 16:56:11 UTC by tom@shell.2xpu4.arvadosapi.com'
2xpu4-4zz18-fovltoc3t3r3bu3
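
For illustration, the failure in the traceback can be reproduced without arv-put. This is only a minimal sketch: it assumes the default sys.stdin behaves like a UTF-8 io.TextIOWrapper, and the names raw, text_stdin and binary_data are made up for the example. It is not arv-put's code.

```python
import io

# Bytes that are not valid UTF-8: 0xa1 cannot start a UTF-8 sequence.
raw = b"\x00\xa1\xff binary payload"

# Stand-in for a text-mode stdin (assumption: UTF-8, strict error handling).
text_stdin = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")

try:
    text_stdin.read(1024)          # text-mode read decodes the bytes
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True           # same class of error as in the traceback

# Reading from the underlying binary buffer returns the bytes unchanged.
fresh_stdin = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
binary_data = fresh_stdin.buffer.read(1024)
```

The text-mode read raises UnicodeDecodeError, while the read through .buffer hands back the original bytes.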

Subtasks

Task #17539: Review 17522-arvput-stdin-transcode-fix (Resolved, Lucas Di Pentima)


Related issues

Has duplicate Arvados - Bug #17765: [arv-put] assumes text only input when reading from stdin (Duplicate)

Associated revisions

Revision 64c90ad4
Added by Lucas Di Pentima 8 months ago

Merge branch '17522-arvput-stdin-transcode-fix'
Closes #17522

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Tom Clegg 8 months ago

  • Description updated (diff)

#2 Updated by Lucas Di Pentima 8 months ago

  • Assigned To set to Lucas Di Pentima

#3 Updated by Lucas Di Pentima 8 months ago

  • Status changed from New to In Progress

#4 Updated by Lucas Di Pentima 8 months ago

Updates at e6a8d36f7 - branch 17522-arvput-stdin-transcode-fix
Test run: https://ci.arvados.org/job/developer-run-tests/2411/

  • Adds test exposing the bug and fixes it.

#5 Updated by Nico César 8 months ago

review @ e6a8d36f7bec7be8e89106d1281e0f863cf7529e

The fix and the test look good to me. One thing to be aware of is that sys.stdin.buffer is Python 3 only, as far as I know. I don't know if arv-put is still being used in Python 2 environments, so I think we're safe.

Ready to merge.
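
A rough sketch of the sys.stdin.buffer point, for reference. The helper names binary_stream and copy_chunks and the CHUNK_SIZE value are hypothetical and are not taken from the branch; the getattr fallback is just one common way to cope with streams that have no .buffer attribute.

```python
import io
import sys

# Hypothetical chunk size for the example. Per the traceback, arv-put itself
# reads arvados.config.KEEP_BLOCK_SIZE bytes at a time.
CHUNK_SIZE = 1 << 16

def binary_stream(stream):
    """Return stream.buffer if it exists, otherwise the stream itself."""
    return getattr(stream, "buffer", stream)

def copy_chunks(source, sink, chunk_size=CHUNK_SIZE):
    """Copy bytes from source to sink in chunks; return the byte count."""
    total = 0
    while True:
        data = source.read(chunk_size)
        if not data:               # empty bytes object means end of input
            break
        sink.write(data)
        total += len(data)
    return total

# Example with an in-memory stand-in for a text-mode stdin.
payload = b"\xa1\xb2\xc3" * 1000
fake_stdin = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
out = io.BytesIO()
copied = copy_chunks(binary_stream(fake_stdin), out, chunk_size=256)
```

With a real pipe the call would be copy_chunks(binary_stream(sys.stdin), out), which never goes through the UTF-8 decoder.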

#6 Updated by Lucas Di Pentima 8 months ago

Thanks! We don't support Python 2 anymore as of 2.1: https://doc.arvados.org/v2.1/sdk/python/sdk-python.html

Merging!

#7 Updated by Anonymous 8 months ago

  • Status changed from In Progress to Resolved

#8 Updated by Tom Clegg 6 months ago

  • Has duplicate Bug #17765: [arv-put] assumes text only input when reading from stdin added

#9 Updated by Peter Amstutz 23 days ago

  • Release set to 41
