Bug #12183

[crunch-run] Handle symlinks with absolute paths into output directory

Added by Peter Amstutz 4 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
09/28/2017
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

This is suspicious:

2017-08-28T11:09:17.982121623Z CMD: ln -s /var/spool/cwl/STAR-Fusion_outdir/star-fusion.preliminary/star-fusion.filter.intermediates_dir/star-fusion.filtered /var/spool/cwl/STAR-Fusion_outdir/star-fusion.preliminary/star-fusion.fusion_candidates.preliminary.filtered

This seems to be creating a symlink to an absolute path inside the container. However, crunch-run (which collects the outputs) executes outside the container, which means it cannot dereference symlinks to arbitrary paths inside the container. It is already able to handle symlinks to mounted input files, and relative symlinks within the output directory, but doesn't correctly handle this case of a symlink with an absolute path to another file in the output directory. This should be handled correctly.

Currently it looks like putting a symlink foo->/etc/shadow (or ->../../../../../../etc/shadow) will cause crunch-run to store /etc/shadow from the compute node, not the container. This seems bad. Also, it looks like we follow symlinks to files, but not symlinks to dirs, which seems like a confusing rule.


Subtasks

Task #12205: Review 12183-crunch-run-symlinksResolvedPeter Amstutz

Task #12312: FixResolvedPeter Amstutz


Related issues

Related to Arvados - Bug #12273: [crunch] Should not try to upload special filesResolved2017-09-28

Related to Arvados - Bug #12606: Symlink in output points to invalid location -- no such file or directoryNew2017-11-17

Associated revisions

Revision 295156a9
Added by Peter Amstutz about 1 month ago

Merge branch '12183-crunch-run-symlinks' closes #12183

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz 4 months ago

  • Description updated (diff)

#2 Updated by Tom Morris 4 months ago

  • Target version set to Arvados Future Sprints

#3 Updated by Tom Morris 4 months ago

  • Target version changed from Arvados Future Sprints to 2017-09-13 Sprint

#4 Updated by Peter Amstutz 3 months ago

  • Assigned To set to Peter Amstutz

#5 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2017-09-13 Sprint to 2017-09-27 Sprint

#6 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2017-09-27 Sprint to 2017-10-11 Sprint

#7 Updated by Peter Amstutz 3 months ago

  • Description updated (diff)

#8 Updated by Peter Amstutz 2 months ago

12183-crunch-run-symlinks @ 3afc0c377d2859d0aba622d1883af771b1b62594

#9 Updated by Tom Clegg 2 months ago

In EvalSymlinks() "var tgt string" can move down a bit, to just before first use

Is "for _, ent := range ReadDir(path) { Walk(path+ent) }" just a verbose form of "Walk(targetPath)" (or "Walk(path+"/.")") or is there some subtle difference?

"targetInfo.Mode()&os.ModeDir != 0" → targetInfo.IsDir()

I think it would be worthwhile to add some unit tests for the various error cases. Skip the whole "FullRun" thing but just set up enough stuff like runner.HostOutputDir that you can call EvalSymlinks() on some tricky paths and ensure they return errors.

It looks like a symlink like "../../.." will report a confusing error "symlink points to invalid location /some/host/path". Would it be better to report the symlink target ("../../..") instead of the (host) path it resolves to?

Will this behave well with stuff like file1 -> file2 -> file3 where all three files are in the output dir?

Should check errors returned by os.Remove() and os.Symlink()

#10 Updated by Peter Amstutz 2 months ago

Tom Clegg wrote:

In EvalSymlinks() "var tgt string" can move down a bit, to just before first use

Fixed.

Is "for _, ent := range ReadDir(path) { Walk(path+ent) }" just a verbose form of "Walk(targetPath)" (or "Walk(path+"/.")") or is there some subtle difference?

Walk(path) goes into a infinite recursion.

Walk(targetPath) is inappropriate because it determines the collection path by removing the host output directory prefix to get a relative path. Walking on "targetPath" means files are uploaded relative to "targetPath" and not "path".

Walk(path+"/.") is clever and seems to work. Fixed.

"targetInfo.Mode()&os.ModeDir != 0" → targetInfo.IsDir()

Fixed.

I think it would be worthwhile to add some unit tests for the various error cases. Skip the whole "FullRun" thing but just set up enough stuff like runner.HostOutputDir that you can call EvalSymlinks() on some tricky paths and ensure they return errors.

Added TestEvalSymlinks(). There is an existing TestOutputError() which checks for a symlink going outside the output directory.

It looks like a symlink like "../../.." will report a confusing error "symlink points to invalid location /some/host/path". Would it be better to report the symlink target ("../../..") instead of the (host) path it resolves to?

Sure. Fixed.

Will this behave well with stuff like file1 -> file2 -> file3 where all three files are in the output dir?

Yes, TestOutputSymlinkToOutput() tests this already.

Should check errors returned by os.Remove() and os.Symlink()

Fixed. I originally thought it didn't matter if symlinks were successfully removed or not, but on further thought, if they somehow get left behind and are pointing outside the output dir, it could upload files it shouldn't. So now inability to delete any blacklisted symlinks is an error.

12183-crunch-run-symlinks @ 80ce9f9d06128195a1f8506fc5f32b6fb589cfcc

#11 Updated by Tom Clegg 2 months ago

This error should probably mention the path of the symlink that failed:

err = fmt.Errorf("Too many symlinks.")

It looks like we don't have a test for "too many symlinks" or "symlink target doesn't exist".

I'm finding EvalSymlinks hard to follow and I suspect it's more complicated than it deserves to be.
  • var links []string has all the symlinks we followed on the way to the current tgt
  • a "bind" is a mount point relative to the host
  • at "range binds", tgt might be a symlink, or might not exist
  • in "range binds", if tgt is in a mount, we call getCollectionManifestForPath, and I guess we'll error out if this mount isn't a collection mount (e.g., it's a tmp volume) -- is this intentionally disallowed?
  • For each of the symlinks we encountered on the way here, we add some manifestText for each of the symlinks we followed -- which can only ever be one, because...
  • "If target is not [in] a mount, it must be within the output directory" but the output directory has to be within a mount, so isn't the Remove/Symlink/too-many-symlinks case unreachable?

I think the "range links" part is what's confusing me the most. The function comment (like my intuition) suggests we should either add exactly one file to manifestText (using the original symlink path) or return an error.

It also seems surprising to me that we need to remove/change symlinks in order to accomplish this stuff. This sort of side effect means we need to test things like "do b->a->c and a->b->c produce the same results if we process a before b?".

Wouldn't it work to do something simpler like this?
  • Rename EvalSymlinks to WriteSymlinkTarget
  • Readlink; return error if target is not in a mount; repeat until target is not a symlink
    • (It's too hard to implement symlinks that are in the container image but not in a mount, right?)
  • Return error if target is not in a collection mount (i.e., getCollectionManifestForPath returns error)
  • Append to manifestText, using name of original link + data from last target
  • Tell WriteTree() to skip symlinks, and handle all the symlinks after WriteTree() is done, so we can do the cheap copy even if the symlink target is in the output dir
  • No need to remove/rename symlinks

#12 Updated by Peter Amstutz 2 months ago

Tom Clegg wrote:

This error should probably mention the path of the symlink that failed:

[...]

It looks like we don't have a test for "too many symlinks" or "symlink target doesn't exist".

There is a test for circular references (which is what the "too many symlinks" behavior is really trying to protect against):

https://dev.arvados.org/projects/arvados/repository/revisions/941502322ccea097e1944972d3647182cd1b6268/entry/services/crunch-run/crunchrun_test.go#L1695

You're right, there isn't a simple test for non-existing target.

I'm finding EvalSymlinks hard to follow and I suspect it's more complicated than it deserves to be.
  • var links []string has all the symlinks we followed on the way to the current tgt
  • a "bind" is a mount point relative to the host
  • at "range binds", tgt might be a symlink, or might not exist
  • in "range binds", if tgt is in a mount, we call getCollectionManifestForPath, and I guess we'll error out if this mount isn't a collection mount (e.g., it's a tmp volume) -- is this intentionally disallowed?
  • For each of the symlinks we encountered on the way here, we add some manifestText for each of the symlinks we followed -- which can only ever be one, because...
  • "If target is not [in] a mount, it must be within the output directory" but the output directory has to be within a mount, so isn't the Remove/Symlink/too-many-symlinks case unreachable?

I think the "range links" part is what's confusing me the most. The function comment (like my intuition) suggests we should either add exactly one file to manifestText (using the original symlink path) or return an error.

It also seems surprising to me that we need to remove/change symlinks in order to accomplish this stuff. This sort of side effect means we need to test things like "do b->a->c and a->b->c produce the same results if we process a before b?".

Yes, what happened there is that an earlier version deleted all the symlinks in the chain immediately. But then I realized that was order sensitive, so for p1→p2→p3 if you start at p2 you miss p1. However, with the deferred delete, the files get written twice, which also incorrect.

Wouldn't it work to do something simpler like this?
  • Rename EvalSymlinks to WriteSymlinkTarget
  • Readlink; return error if target is not in a mount; repeat until target is not a symlink
    • (It's too hard to implement symlinks that are in the container image but not in a mount, right?)
  • Return error if target is not in a collection mount (i.e., getCollectionManifestForPath returns error)
  • Append to manifestText, using name of original link + data from last target
  • Tell WriteTree() to skip symlinks, and handle all the symlinks after WriteTree() is done, so we can do the cheap copy even if the symlink target is in the output dir
  • No need to remove/rename symlinks

I think you're right, that does sound like a better approach.

#13 Updated by Peter Amstutz 2 months ago

12183-crunch-run-symlinks 305490369b502d47607b7ffe790d2c85e9a8db34

Major refactor of the approach. Now there is only one tree walk, which handles both uploading regular files and following symlinks.

Avoids updating symlinks on-disk. However, because they are not dereferenceable, requires keeping track of both the symlink target and original symlink path, the target is "relocated" for the purposes of constructing the output collection. This creates some additional complexity.

#14 Updated by Tom Clegg 2 months ago

Error with comment "tgt doesn't exist or lacks permissions" seems to be reported to user as "points to invalid location"

I think UploadOutputFile would be less confusing if the non-symlink cases weren't inside the loop. Something like

for followed := 0; info.Mode()&os.ModeSymlink != 0; followed++ {
  if followed >= 32 {
    error, too many symlinks
  }
  if not a symlink {
    break
  }
  tgt = Readlink
  check tgt is allowed
  info = Lstat
}
process target as regular file, dir, or manifest fragment from mounted collection

(This might even let you split it into two funcs, so you can test just the symlink resolution without bundling the upload, fwiw.)

With the current code I'm suspicious this will not behave as expected:

"mounts":{"/tmp":{"kind":"tmp"}},"output_path":"/tmp/output" 
/tmp/output/foo -> ../foo
/tmp/foo -> output/bar
/tmp/output/bar [regular file]

#15 Updated by Tom Clegg 2 months ago

I haven't had time to look closely enough but I'd like to make sure stuff like this works:

/tmp/output/bar -> foo
/tmp/output/foo [regular file]

...even if WalkFunc visits bar before visiting foo.

#16 Updated by Peter Amstutz 2 months ago

  • Status changed from New to In Progress
  • Target version changed from 2017-10-11 Sprint to 2017-10-25 Sprint

#17 Updated by Peter Amstutz about 2 months ago

Tom Clegg wrote:

Error with comment "tgt doesn't exist or lacks permissions" seems to be reported to user as "points to invalid location"

Tweaked the message.

I think UploadOutputFile would be less confusing if the non-symlink cases weren't inside the loop. Something like [...]\
(This might even let you split it into two funcs, so you can test just the symlink resolution without bundling the upload, fwiw.)

Done.

With the current code I'm suspicious this will not behave as expected: [...]

"mounts":{"/tmp":{"kind":"tmp"}},"output_path":"/tmp/output" 
/tmp/output/foo -> ../foo
/tmp/foo -> output/bar
/tmp/output/bar [regular file]

Currently, SetupMounts() requires that Container.OutputPath corresponds exactly to a mount point, not a subdirectory of a mount point. So you're right that your example could work but won't because it links outside the output directory and then back in, but you probably are not able to get into that situation either.

(Similarly you can't have a symlink to another tmp mount and back to the output; this is complicated enough already, if we want to handle all possible cases, we need to do the upload from inside the container.)

I haven't had time to look closely enough but I'd like to make sure stuff like this works:

/tmp/output/bar -> foo
/tmp/output/foo [regular file]

...even if WalkFunc visits bar before visiting foo.

TestOutputSymlinkToOutput() tests several variations of this.

Since each file is handled separately and it doesn't mess with the symlinks on disk, it doesn't matter what order they are visited.

Now 12183-crunch-run-symlinks e5f33b99e1c363e4d14f429075fdce15b9643000

Running tests https://ci.curoverse.com/job/developer-run-tests/481/

#18 Updated by Tom Clegg about 2 months ago

Better, thanks.

I think we need to look out for infinitely deep directory hierarchies, now that we're following symlinks to dirs. This gives OOM-panic instead of an error:

diff --git a/services/crunch-run/crunchrun_test.go b/services/crunch-run/crunchrun_test.go
index 098a64205..948e47936 100644
--- a/services/crunch-run/crunchrun_test.go
+++ b/services/crunch-run/crunchrun_test.go
@@ -1555,6 +1555,7 @@ func (s *TestSuite) TestOutputSymlinkToOutput(c *C) {
                os.Symlink("file2", t.realTemp+"/2/file4")
                os.Symlink("realdir", t.realTemp+"/2/dir1")
                os.Symlink("/tmp/realdir", t.realTemp+"/2/dir2")
+               os.Symlink(".", t.realTemp+"/2/realdir/loop")
                t.logWriter.Close()
        })

There are two checks for "symlink target must be in output dir", one in UploadOutputFile and one in derefOutputSymlink. I think the one in UploadOutputFile could be removed if the one in derefOutputSymlink returned an error instead of doing break...?

The "Lstat error" case in derefOutputSymlink should probably give the actual error message, and not mention the "invalid location" possibility, which would have been caught in the HostOutputDir condition.

#19 Updated by Peter Amstutz about 2 months ago

Tom Clegg wrote:

Better, thanks.

I think we need to look out for infinitely deep directory hierarchies, now that we're following symlinks to dirs. This gives OOM-panic instead of an error:

[...]

Fixed and added test.

There are two checks for "symlink target must be in output dir", one in UploadOutputFile and one in derefOutputSymlink. I think the one in UploadOutputFile could be removed if the one in derefOutputSymlink returned an error instead of doing break...?

Yes, but it a bit more subtle because we want to check the binds before deciding it is an invalid path.

Added an error constant that represents "symlinks points outside of output directory" and returned that.

The "Lstat error" case in derefOutputSymlink should probably give the actual error message, and not mention the "invalid location" possibility, which would have been caught in the HostOutputDir condition.

Done.

now @ ca3d9008da4a57e71cb7e6dfd1e25b5eb0c0de0b

#20 Updated by Tom Clegg about 2 months ago

NotInOutputDirError should be ErrNotInOutputDir (convention is FooError is a type, ErrFoo is an object)

Might as well make the arbitrary limit for "symlinks-to-dir along any path" be the same as the arbitrary limit for "chain of symlinks" (currently 8 and 16 respectively). Might be nice to say what the limit is in the error message, too.

There's a bit of a secret key to understanding the logic here, i.e., we only follow symlink chains until they leave the output dir, and at that point they are either in a collection mount (and therefore can't be a symlink, because collections don't support symlinks) or we throw an error (because we can't correctly interpret symlinks from paths that aren't in the output dir, even if they're in the container). Is that right? The UploadFile comment nearly says this ("they must remain within the output directory") but maybe it just needs to say "every symlink target (even targets that are symlinks themselves) must be in the output directory or a collection mount."

I'd say this restriction is not especially obvious so it should be documented somewhere users might see it, too. http://doc.arvados.org/api/methods/container_requests.html ?

#21 Updated by Peter Amstutz about 2 months ago

  • Target version changed from 2017-10-25 Sprint to 2017-11-08 Sprint

#22 Updated by Peter Amstutz about 1 month ago

Tom Clegg wrote:

NotInOutputDirError should be ErrNotInOutputDir (convention is FooError is a type, ErrFoo is an object)

Done.

Might as well make the arbitrary limit for "symlinks-to-dir along any path" be the same as the arbitrary limit for "chain of symlinks" (currently 8 and 16 respectively). Might be nice to say what the limit is in the error message, too.

Done.

There's a bit of a secret key to understanding the logic here, i.e., we only follow symlink chains until they leave the output dir, and at that point they are either in a collection mount (and therefore can't be a symlink, because collections don't support symlinks) or we throw an error (because we can't correctly interpret symlinks from paths that aren't in the output dir, even if they're in the container). Is that right? The UploadFile comment nearly says this ("they must remain within the output directory") but maybe it just needs to say "every symlink target (even targets that are symlinks themselves) must be in the output directory or a collection mount."

I'd say this restriction is not especially obvious so it should be documented somewhere users might see it, too. http://doc.arvados.org/api/methods/container_requests.html ?

Done.

12183-crunch-run-symlinks @ cd0e7f06116925d66cc370a03cef1b8c19d0002d

https://ci.curoverse.com/job/developer-run-tests/492/

#23 Updated by Tom Clegg about 1 month ago

Just one nit in the documentation part, and only because someone who bothers to read this kind of documentation deserves to get exactly what's advertised:

... every symlink target (even targets that are symlinks themselves) must either remain in the output directory or point to a collection mount.

The "point to" part seems to mean symlink chain A -> B -> C is allowed regardless of where B is, as long as C is in a collection mount or output dir. But in fact B's location is restricted too.

How about:

... every symlink target (even targets that are symlinks themselves) must be in the output directory or a collection mount.

the rest LGTM

#24 Updated by Anonymous about 1 month ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:295156a99251bf26d705272c329a64d58089bb9e.

Also available in: Atom PDF