Bug #11579
closedarv-put doesn't traverse symlinked directories
Added by Lucas Di Pentima over 7 years ago. Updated over 7 years ago.
Description
arv-put
is silently ignoring directory symlinks while the previous version didn't.
- Allow symlink traversal by default, adding a flag to disable this behaviour.
- Add a check so that the command won't go into an infinite uploads.
- Add test coverage.
Updated by Lucas Di Pentima over 7 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima over 7 years ago
Changes at: e289b9ad5 - branch 11579-arvput-follow-symlinks
Test run: https://ci.curoverse.com/job/developer-run-tests/261/
Reverted default behavior so that arv-put
follow symlinked directories.
Added flag --no-follow-links
to allow the user avoid this, if needed.
Symlinked dirs will only be traversed once per upload, warning the user when a directory is linked more than once.
Updated by Peter Amstutz over 7 years ago
While testing this, I tried to upload a non-existing directory, and the backtrace is pretty terrible, this needs to be cleaned up:
$ arv-put stuff 2017-04-28 10:48:30 arvados.arv_put[1041] INFO: Creating new cache file at /home/peter/.cache/arvados/arv-put/11f5dc3d482fcb99c82089cbb6dbe037 0 2017-04-28 10:48:30 arvados.arv_put[1041] WARNING: Abnormal termination: Traceback (most recent call last): File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/arvados_python_client-0.1.20170427201553-py2.7.egg/arvados/commands/put.py", line 470, in start self.filename or os.path.basename(path)) File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/arvados_python_client-0.1.20170427201553-py2.7.egg/arvados/commands/put.py", line 611, in _check_file 'mtime': os.path.getmtime(source), File "/home/peter/work/scripts/venv/lib/python2.7/genericpath.py", line 54, in getmtime return os.stat(filename).st_mtime OSError: [Errno 2] No such file or directory: '/home/peter/tmp/stuff/stuff' Traceback (most recent call last): File "/home/peter/work/scripts/venv/bin/arv-put", line 4, in <module> __import__('pkg_resources').run_script('arvados-python-client==0.1.20170427201553', 'arv-put') File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 738, in run_script self.require(requires)[0].run_script(script_name, ns) File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1499, in run_script exec(code, namespace, namespace) File "/home/peter/work/scripts/venv/lib/python2.7/site-packages/arvados_python_client-0.1.20170427201553-py2.7.egg/EGG-INFO/scripts/arv-put", line 4, in <module> main() File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/arvados_python_client-0.1.20170427201553-py2.7.egg/arvados/commands/put.py", line 992, in main writer.start(save_collection=not(args.stream or args.raw)) File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/arvados_python_client-0.1.20170427201553-py2.7.egg/arvados/commands/put.py", line 470, in start self.filename or os.path.basename(path)) File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/arvados_python_client-0.1.20170427201553-py2.7.egg/arvados/commands/put.py", line 611, in _check_file 'mtime': os.path.getmtime(source), File "/home/peter/work/scripts/venv/lib/python2.7/genericpath.py", line 54, in getmtime return os.stat(filename).st_mtime OSError: [Errno 2] No such file or directory: '/home/peter/tmp/stuff/stuff'
Updated by Peter Amstutz over 7 years ago
Here's my (evil) test case:
b1/ b1/b b2 -> b1/ b3/ b3/b3 b3/b -> ../b4/ b4/ b4/b -> ../b3/ b4/b4 b5 -> b2/ bar baz -> bar quux -> baz
A few edge cases:
--no-follow-links
still follows links to files (in the example "baz" and "quux" are uploaded).
It uploads both "b1" and the symlink-to-b1 "b2". However it refuses to upload "b5" which is a symlink-to-symlink.
I made a new directory:
b1 -> ../stuff/b1/ b5 -> ../stuff/b5/
This resulting in the error "Skipping '/home/peter/tmp/stuff2/b1' symlink to directory '/home/peter/tmp/stuff/b1' because it was already uploaded"
It only created "b5".
Then I tried
b0 -> ../stuff/b5/ b1 -> ../stuff/b1/
and got "Skipping '/home/peter/tmp/stuff2/b0' symlink to directory '/home/peter/tmp/stuff/b1' because it was already uploaded"
In this case it only created "b1".
The behavior seems sensitive to sort order.
I believe the correct behavior should be copy everything as if they were regular files and directories, except when symlinks create a infinite loop. When you detect that something is a symlink to something that has already been uploaded, you can make a copy within the Collection (that should go for both file and directories).
Updated by Peter Amstutz over 7 years ago
Suggest maybe you keep a mapping of local file path to its location in the collection, after calling realpath(), check if the file has already been uploaded and if so copy it within the collection.
You might need a slightly different strategy for detecting symlink cycles.
Updated by Lucas Di Pentima over 7 years ago
Updates at 4d012b23a
Test run: https://ci.curoverse.com/job/developer-run-tests/266/
As discussed on the chat, I focused on fixing the regression, that is, the fact that this new arv-put
didn't followed symlinked directories.
Fixed faulty behavior when asking to not follow symlinks, it is now ignoring both files and directories.
Also Peter asked me to enhance the error message when passing a non existant path, so now instead of printing a huge backtrace, it will log a proper message.
After this is merged, I'll continue adding the remaining features and fixes related to symlink recursion and duplication.
Updated by Peter Amstutz over 7 years ago
11579-arvput-follow-symlinks @ 4d012b23a4ac88f433986054fc0085ee6714b5b3
The "file not found" message is still printing a backtrace?
2017-05-02 09:46:36 arvados.arv_put[22714] INFO: Creating new cache file at /home/peter/.cache/arvados/arv-put/7ca31b0f6a86bfebd692b4b00d1192ed 0 2017-05-02 09:46:36 arvados.arv_put[22714] WARNING: Abnormal termination: Traceback (most recent call last): File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/arvados_python_client-0.1.20170501210502-py2.7.egg/arvados/commands/put.py", line 438, in start raise PathDoesNotExistError("file or directory '{}' does not exist.".format(path)) PathDoesNotExistError: file or directory 'stuff3' does not exist. 2017-05-02 09:46:36 arvados.arv_put[22714] ERROR: arv-put: file or directory 'stuff3' does not exist.
Rest LGTM
Updated by Lucas Di Pentima over 7 years ago
Updated code & merged master at fe45b1b66c730f2546d78a7899375707c0816518
Running tests before merging to master: https://ci.curoverse.com/job/developer-run-tests/267/
Updated by Lucas Di Pentima over 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:613155d587da60dbe04c7635649b1f3694938adc.