Feature #21678
closedinstaller.sh option to run diagnostics on one of the internal instances that it installed instead of only locally
Added by Peter Amstutz 7 months ago. Updated 2 months ago.
Description
Should use the shell node since it already has all the right packages.
Updated by Peter Amstutz 6 months ago
- Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Updated by Lucas Di Pentima 6 months ago
- Assigned To changed from Peter Amstutz to Lucas Di Pentima
- Status changed from New to In Progress
Updated by Lucas Di Pentima 6 months ago
21678-installer-diagnostics-internal
@ 887751a
- All agreed upon points are implemented / addressed.
- Yes
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- Brett asked to have this change be exercised by our
test-provision-X
pipelines, but those pipelines only test the provision.sh script, not the installer. We should add a new story so that we can test the installer script both in single and multi node modes.
- Brett asked to have this change be exercised by our
- Code is tested and passing, both automated and manual, what manual testing was done is described
- Tested it manually by deploying a multi node test cluster in our Sandbox account and run the diag tests on its shell node.
- Documentation has been updated.
- Yes.
- Behaves appropriately at the intended scale (describe intended scale).
- Yes. The intended scale is just to run the diagnostics on just 1 internal node, so if the cluster has multiple shell nodes, it'll just use the first one on the list.
- Considered backwards and forwards compatibility issues between client and server.
- Yes: for backwards compatibility, instead of modifying the current
diagnostics
sub-command, I've added a new onediagnostics-internal
(not happy with its name, though)
- Yes: for backwards compatibility, instead of modifying the current
- Follows our coding standards and GUI style guidelines.
- Yes
This branch adds a diagnostics-internal
(alternate name ideas are welcome) sub-command to installer.sh
. When used, it will first check if there's at least one node defined with the shell
role, and use the first one of the list to run arvados-client diagnostics -internal-client
remotely. The diagnostics gets run with sudo
to ensure the process is able to use docker (currently, the admin
user isn't part of the docker
group and I don't think we need to fiddle with that)
Updated by Lucas Di Pentima 6 months ago
Ticket for updating test-provision jobs is https://dev.arvados.org/issues/19232
Updated by Peter Amstutz 6 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Updated by Brett Smith 6 months ago
Lucas Di Pentima wrote in #note-4:
21678-installer-diagnostics-internal
@ 887751a
This mostly looks good. I pushed a commit with small copyedits to the new documentation to try to clarify it a little. If you don't like it, you're welcome to delete and re-push, this just seemed easier than trying to write out the "word diff" in a comment. Or if you just want to copyedit my copyedits, that's fine too.
I just have a couple of questions about the command that runs the diagnostics itself:
$SSH $DEPLOY_USER@$TESTNODE "sudo ARVADOS_API_HOST=${DOMAIN}:${CONTROLLER_EXT_SSL_PORT} ARVADOS_API_TOKEN=$SYSTEM_ROOT_TOKEN arvados-client diagnostics -internal-client"
- Why does this use sudo? I don't think
arvados-client
needs privileges beyond having the system root token, does it? - The way this is written means that the system root token will leak in the system's process list when
sudo
is first invoked. This is particularly a problem since we're running the command on the shell node, where other users might have accounts and be able to use this to escalate privileges. Can we use an implementation that doesn't put the token in any command's argument list?
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- Brett asked to have this change be exercised by our
test-provision-X
pipelines, but those pipelines only test the provision.sh script, not the installer.
Yeah I think I had both tickets mixed up in my head when I made this comment, that's totally fine. Sorry for my confusion.
Thanks.
Updated by Lucas Di Pentima 6 months ago
Brett Smith wrote in #note-7:
This mostly looks good. I pushed a commit with small copyedits to the new documentation to try to clarify it a little. If you don't like it, you're welcome to delete and re-push, this just seemed easier than trying to write out the "word diff" in a comment. Or if you just want to copyedit my copyedits, that's fine too.
Thanks, they look great to me.
- Why does this use sudo? I don't think
arvados-client
needs privileges beyond having the system root token, does it?
I didn't wanted to fiddle with the shell's user group membership to make it able to use docker
. The default diagnostics behavior builds a docker image to use for the tests.
- The way this is written means that the system root token will leak in the system's process list when
sudo
is first invoked. This is particularly a problem since we're running the command on the shell node, where other users might have accounts and be able to use this to escalate privileges. Can we use an implementation that doesn't put the token in any command's argument list?
Good catch, I'll use sudo
's -E
to forward the env vars from the original user's session.
Updated by Lucas Di Pentima 6 months ago
Updates at ef4f350
Instead of passing credentials through environment variables (because doing that properly would require to reconfigure the remote SSH daemon to accept them), we now set up a ~/.config/arvados/settings.conf
file in the remote host's root
account previous to executing arvados-client
.
This not only removes the token leakage on the remote host's process list, but also minimize it on the local host's process list, as it would only be visible while writing it to the temp config file.
Updated by Brett Smith 6 months ago
Lucas Di Pentima wrote in #note-9:
Updates at ef4f350
I didn't wanted to fiddle with the shell's user group membership to make it able to use docker.
I don't love this either, since running the whole thing as root is an awfully big hammer just to get membership to the docker group. But since the docker group is root-equivalent anyway, I can live with it.
Instead of passing credentials through environment variables (because doing that properly would require to reconfigure the remote SSH daemon to accept them), we now set up a
~/.config/arvados/settings.conf
file in the remote host'sroot
account previous to executingarvados-client
.
This solves the leaking credentials problem, but now we've got a configuration management problem. This overwrites the user's arvados/settings.conf
on the shell node, which they might not be expecting. And then it leaves the system root token there, which is not ideal from a security perspective either, especially since the user might not realize it got written there.
What if we just piped a script to bash like this?
$SSH $DEPLOY_USER@$TESTNODE bash <<EOF
export ARVADOS_API_HOST="${DOMAIN}:${CONTROLLER_EXT_SSL_PORT}"
export ARVADOS_API_TOKEN="$SYSTEM_ROOT_TOKEN"
sudo -E arvados-client diagnostics -internal-client
EOF
- The here-doc gets evaluated entirely inside the shell, so none of this leaks into that process list on the local system.
export
is a shell built-in, so those variables don't leak into the process list on the shell node either.- This avoids writing any files and is hopefully simpler, assuming it works.
Updated by Lucas Di Pentima 6 months ago
Brett Smith wrote in #note-10:
What if we just piped a script to bash like this?
[...]
- The here-doc gets evaluated entirely inside the shell, so none of this leaks into that process list on the local system.
export
is a shell built-in, so those variables don't leak into the process list on the shell node either.- This avoids writing any files and is hopefully simpler, assuming it works.
That's a great solution, wasn't aware of that here-doc behavior. Thanks!
Here's the updated version: c0c47d7650 -- I've also explicitly listed which env vars should sudo
preserve, to make it less nasty.
Updated by Brett Smith 6 months ago
Lucas Di Pentima wrote in #note-11:
Here's the updated version: c0c47d7650 -- I've also explicitly listed which env vars should
sudo
preserve, to make it less nasty.
That's a nice change, thank you. As long as you've got this version working on a test cluster then this looks good to me.
Updated by Lucas Di Pentima 6 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|f11933fe893204fc0378d83b25167eb14ba4a265.