Bug #18289

[crunch] allow_other is not required when using Singularity

Added by Ward Vandewege about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Story points: -
Release relationship: Auto

Description

Our docs say that compute nodes should have user_allow_other set in /etc/fuse.conf, but that is not needed when running Singularity.

  • Update the documentation accordingly.
  • crunch-run should request the allow_other mount option only when RuntimeEngine is set to docker.

History

#1 Updated by Ward Vandewege about 2 months ago

  • Description updated

#2 Updated by Ward Vandewege about 2 months ago

I ran a test job on 9tee4 after removing user_allow_other from /etc/fuse.conf on the compute nodes:

https://workbench.9tee4.arvadosapi.com/container_requests/9tee4-xvhdp-vv11wi7y4t7276m

It failed with

2021-10-21T19:00:33.509718889Z fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
2021-10-21T19:00:33.512267359Z 2021-10-21 19:00:33 arvados.arv-mount[2698109] ERROR: arv-mount: exception during mount: fuse_mount failed
2021-10-21T19:00:33.512267359Z Traceback (most recent call last):
2021-10-21T19:00:33.512267359Z   File "/usr/share/python3/dist/python3-arvados-fuse/lib/python3.7/site-packages/arvados_fuse/command.py", line 386, in _run_standalone
2021-10-21T19:00:33.512267359Z     with self:
2021-10-21T19:00:33.512267359Z   File "/usr/share/python3/dist/python3-arvados-fuse/lib/python3.7/site-packages/arvados_fuse/command.py", line 141, in __enter__
2021-10-21T19:00:33.512267359Z     llfuse.init(self.operations, native_str(self.args.mountpoint), self._fuse_options())
2021-10-21T19:00:33.512267359Z   File "src/fuse_api.pxi", line 246, in llfuse.init
2021-10-21T19:00:33.512267359Z RuntimeError: fuse_mount failed
2021-10-21T19:00:32.592708107Z Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher)
2021-10-21T19:00:32.592916306Z crunch-run 2.3.0~dev20211008165008 (go1.17.1) started
2021-10-21T19:00:32.592937800Z Executing container '9tee4-dz642-381zm4m9v1xxhjl' using singularity runtime
2021-10-21T19:00:32.592963683Z Executing on host 'compute1.9tee4.arvadosapi.com'
2021-10-21T19:00:32.707807725Z container token "v2/9tee4-gj3su-15pcdcsz9xj1tcg/6dl6cdean4nbqkxq1zawsjcu6annp4nz5dl5brzs1gylzk5ulo/9tee4-dz642-381zm4m9v1xxhjl" 
2021-10-21T19:00:32.708404569Z Running [arv-mount --foreground --allow-other --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-tmp tmp0 --mount-by-pdh by_id --mount-by-id by_uuid /tmp/crunch-run.9tee4-dz642-381zm4m9v1xxhjl.3577839884/keep1214158095]
2021-10-21T19:00:33.582210613Z Arv-mount exit error: exit status 1
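
The fusermount error in this log depends on whether user_allow_other is set in /etc/fuse.conf. A small check along these lines could confirm a node's setting before a test run. This is only a sketch: userAllowOtherEnabled is a hypothetical helper, and it assumes the option appears alone on an uncommented line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// userAllowOtherEnabled reports whether fuse.conf-style text contains an
// uncommented user_allow_other line. Hypothetical helper for illustration.
func userAllowOtherEnabled(conf string) bool {
	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		// Commented-out lines start with '#', so they never match exactly.
		if strings.TrimSpace(scanner.Text()) == "user_allow_other" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(userAllowOtherEnabled("# comment\nuser_allow_other\n")) // true
	fmt.Println(userAllowOtherEnabled("#user_allow_other\n"))           // false
}
```

To check a real node, the contents of /etc/fuse.conf could be read with os.ReadFile and passed in as a string.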

#3 Updated by Ward Vandewege about 2 months ago

  • Assigned To set to Ward Vandewege
  • Status changed from New to In Progress

#4 Updated by Ward Vandewege about 2 months ago

  • Release set to 42

#5 Updated by Ward Vandewege about 2 months ago

  • Description updated

#6 Updated by Ward Vandewege about 2 months ago

9f6f07fe6790e7c3a8f1b57990c16447c9d2713f on branch 18289-only-pass-allow-other-when-running-docker

I've tested that this works with a custom crunch-run binary on 9tee4, where I removed user_allow_other from /etc/fuse.conf:

without patch: https://workbench.9tee4.arvadosapi.com/container_requests/9tee4-xvhdp-quwtm8w4oia0cu5
with patch: https://workbench.9tee4.arvadosapi.com/container_requests/9tee4-xvhdp-i26ori7gjxhr6hx

Tests passed at https://ci.arvados.org/view/Developer/job/developer-run-tests/2738/

#7 Updated by Tom Clegg about 1 month ago

LGTM, thanks!

#8 Updated by Ward Vandewege about 1 month ago

Documentation fixes pushed at 01698bea4703ce073425e2080c7cad83e2f873cc on branch 18289-only-pass-allow-other-when-running-docker

#9 Updated by Tom Clegg about 1 month ago

Looks like these link targets are reversed in doc/install/crunch2-slurm/install-test.html.textile.liquid

Make sure all of your compute nodes are set up with "Docker":../crunch2/install-compute-node-singularity.html or "Singularity":../crunch2/install-compute-node-docker.html.

Errant ' char in doc/install/crunch2/install-compute-node-singularity.html.textile.liquid

# "Install'python-arvados-fuse and crunch-run":#install-packages

Also in doc/install/crunch2/install-compute-node-singularity.html.textile.liquid there is a link to "Set up a Slurm compute node with Docker":install-compute-node-docker.html -- that title should not include "Slurm" any more. The "introduction" section also shouldn't say Slurm.

It seems a bit odd to divide the fairly small amount of singularity instructions into doc/install/singularity and doc/install/crunch2/install-compute-node-singularity -- could we move the doc/install/singularity information into the new install-compute-node page instead of linking to it?

#10 Updated by Ward Vandewege about 1 month ago

Tom Clegg wrote:

> Looks like these link targets are reversed in doc/install/crunch2-slurm/install-test.html.textile.liquid
> Make sure all of your compute nodes are set up with "Docker":../crunch2/install-compute-node-singularity.html or "Singularity":../crunch2/install-compute-node-docker.html.
>
> Errant ' char in doc/install/crunch2/install-compute-node-singularity.html.textile.liquid
>
> [...]
>
> Also in doc/install/crunch2/install-compute-node-singularity.html.textile.liquid there is a link to "Set up a Slurm compute node with Docker":install-compute-node-docker.html -- that title should not include "Slurm" any more. The "introduction" section also shouldn't say Slurm.

Thanks, I fixed all that.

> It seems a bit odd to divide the fairly small amount of singularity instructions into doc/install/singularity and doc/install/crunch2/install-compute-node-singularity -- could we move the doc/install/singularity information into the new install-compute-node page instead of linking to it?

Yes, good idea, I've made that change.

Latest in c5e4fb5838d2f447ae126159a71340b90cfea33c on branch 18289-only-pass-allow-other-when-running-docker

#11 Updated by Tom Clegg about 1 month ago

LGTM

#12 Updated by Ward Vandewege about 1 month ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|2582dc22a24ee7cdaf1a68c6b4b1c639f88c2efe.
