Story #9706

[Crunch2] [Deployment] Write a systemd service definition for crunch-dispatch-slurm

Added by Brett Smith over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Target version:
Start date: 08/03/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

The service definition should start crunch-dispatch-slurm without any additional configuration, and should restart it except in cases of very fatal errors (the exact definition of that TBD). The expectation is that administrators will configure the service by writing the /etc/arvados/crunch-dispatch-slurm/config.json configuration file.

  • Add the .service file to services/crunch-dispatch-slurm.
  • Update the crunch-dispatch-slurm distro packages to install that file to /lib/systemd/system/.
  • Update the Crunch2 installation guide ("if using systemd, don't bother with this runit script")
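
As a rough illustration of the requirements above, a unit file along the following lines might satisfy them. This is only a sketch, not the file that was eventually merged: the binary path, the Description text, and the exit status treated as "too fatal to restart" are all assumptions, since the ticket leaves that definition TBD. It is wrapped in a hypothetical shell function, print_unit, so the text can be printed or redirected while experimenting.

```shell
#!/bin/sh
# Print a sketch of a unit file for crunch-dispatch-slurm.
# Assumptions (not from the ticket): the ExecStart path, the Description
# text, and exit status 2 standing in for a "very fatal" error.
print_unit() {
    cat <<'EOF'
[Unit]
Description=Arvados Crunch Dispatcher for SLURM
After=network.target

[Service]
ExecStart=/usr/bin/crunch-dispatch-slurm
Restart=always
RestartPreventExitStatus=2

[Install]
WantedBy=multi-user.target
EOF
}
```

No command-line arguments are passed in ExecStart, on the assumption that the program finds /etc/arvados/crunch-dispatch-slurm/config.json by itself. Restart=always combined with RestartPreventExitStatus= is one way to express "restart except after specific failures".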

Subtasks

Task #9730: Update packaging scripts (Resolved, Tom Clegg)

Task #9719: Review 9706-package-systemd-files (Resolved, Tom Clegg)

Task #9775: Merge 9705-crunch2-install-guide-wip after merging the #9706 branch (Resolved, Tom Clegg)

Associated revisions

Revision e18a2167
Added by Tom Clegg over 4 years ago

Merge branch '9706-package-systemd-files'

refs #9706
refs #9745

Revision cbba74fc
Added by Tom Clegg over 4 years ago

Merge branch '9705-crunch2-install-guide-wip'

closes #9705
closes #9706

History

#1 Updated by Radhika Chippada over 4 years ago

  • Assigned To set to Tom Clegg
  • Target version set to 2016-08-17 sprint
  • Story points set to 0.5

#2 Updated by Tom Clegg over 4 years ago

WIP: 9706-package-systemd-files @ 902e38e84. Assuming it works (haven't found out yet) it should notice when services/crunch-dispatch-slurm/crunch-dispatch-slurm.service appears in the source tree, and package it as /lib/systemd/system. Sound right?

#3 Updated by Brett Smith over 4 years ago

Tom Clegg wrote:

WIP: 9706-package-systemd-files @ 902e38e84. Assuming it works (haven't found out yet) it should notice when services/crunch-dispatch-slurm/crunch-dispatch-slurm.service appears in the source tree, and package it as /lib/systemd/system. Sound right?

Yes, that's good. If it works (as in, ends up invoking fpm the way we intend) it functionally looks good to me.

I'm realizing we have a small integration oversight: if we want our packages to work "out of the box" the same way as other distro packages, then our .deb packages will need to include a small postinst script that runs something like systemctl enable "$prog" && systemctl start "$prog". That will configure the service to run at boot (following the instructions in the [Install] section of the unit file), and then start it. A postinst like this would follow the usual Debian packaging conventions. RHEL/CentOS leaves this for administrators to do manually, so our packages for those distributions should not include such a script.

If you want to declare this out of scope, since we didn't discuss it before grooming, you can create a separate story for it. Either way, hopefully it's just another small hookup (although I realize those eventually lead you to death-by-1000-cuts territory).

#4 Updated by Tom Clegg over 4 years ago

Added a generic postinst in 26e2faf. Still untested, just floating an approach.

I assume we also want to script "stop after uninstall", and (if not already addressed by "start" and "stop") "restart after upgrade/downgrade". Haven't addressed that yet.

Although crunch-dispatch-slurm behaves reasonably when there is no config file, I don't think it can survive without ARVADOS_API_HOST and ARVADOS_API_TOKEN. Not sure how this will affect the "auto start on install" strategy. I suppose if we ignore that issue, initial installation will result in "tried to start, but failed" and (after the admin sets up env vars) subsequent uninstall/reinstall/upgrade operations will get automatic stop/start/restart... which sounds like a big improvement.

#5 Updated by Tom Clegg over 4 years ago

  • Description updated

#6 Updated by Brett Smith over 4 years ago

Tom Clegg wrote:

Added a generic postinst in 26e2faf. Still untested, just floating an approach.

I pushed 9706-package-systemd-files-bcs at 94f1d09 with some suggested changes.

  • The postinst can't be bash unless the package Depends: on bash. Personally I'm happy enough to write plain POSIX shell, so I switched to sh.
  • Before we go running systemctl commands, we want to be sure that systemd is not merely installed, but is PID 1. [ -e /run/systemd/system ] is the test I've seen recommended for this, and it's what our Rails postinsts use, so I switched to that.
  • I extended your code a bit to ask systemd what the state of the unit file is, and then make changes based on that. This avoids configuration conflicts like trying to enable a service that's already enabled; trying to enable a service that has been masked by the sysadmin; etc.
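
The three points above could be combined roughly as follows. This is a sketch of the approach, not the code pushed at 94f1d09. The function name configure_unit and the SYSTEMCTL and SYSTEMD_RUNDIR overrides are hypothetical, added only so the logic can be tried without a real systemd. It also queries the state with systemctl is-enabled, which is an assumption about how "ask systemd what the state of the unit file is" might be done.

```shell
#!/bin/sh
# Sketch of a generic postinst in plain POSIX sh.
# SYSTEMCTL and SYSTEMD_RUNDIR are hypothetical overrides (not in the ticket).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
SYSTEMD_RUNDIR="${SYSTEMD_RUNDIR:-/run/systemd/system}"

configure_unit() {
    unit="${1}"

    # Do nothing unless systemd is actually running as PID 1.
    [ -e "${SYSTEMD_RUNDIR}" ] || return 0

    # "is-enabled" exits non-zero for disabled or masked units, hence "|| true".
    state="$("${SYSTEMCTL}" is-enabled "${unit}" 2>/dev/null || true)"

    case "${state}" in
        disabled)
            # Fresh install: enable at boot and start now. A failure here
            # (for example, missing configuration) should not fail the install.
            "${SYSTEMCTL}" enable "${unit}" || true
            "${SYSTEMCTL}" start "${unit}" || true
            ;;
        enabled)
            # Upgrade: reload unit files, then restart only if already running.
            "${SYSTEMCTL}" daemon-reload || true
            "${SYSTEMCTL}" try-restart "${unit}" || true
            ;;
        *)
            # masked, static, or unknown: leave the sysadmin's choice alone.
            :
            ;;
    esac
}
```

A real postinst would end with a call such as configure_unit crunch-dispatch-slurm.service.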

I assume we also want to script "stop after uninstall"

Yes. We can do that in a similar prerm (so the unit file is still around) when [ "$1" = remove ].
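
A prerm along these lines might look roughly like the sketch below. The function name stop_unit_on_remove and the SYSTEMCTL and SYSTEMD_RUNDIR overrides are hypothetical, introduced for illustration and testing. The disable step is included because it comes up later in this thread.

```shell
#!/bin/sh
# Sketch of a prerm hook: stop (and disable) the unit only on removal.
# SYSTEMCTL and SYSTEMD_RUNDIR are hypothetical overrides (not in the ticket).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
SYSTEMD_RUNDIR="${SYSTEMD_RUNDIR:-/run/systemd/system}"

stop_unit_on_remove() {
    action="${1}"   # first argument dpkg passes to prerm, e.g. "remove" or "upgrade"
    unit="${2}"

    # Upgrades and other actions are left alone.
    [ "${action}" = remove ] || return 0
    # Skip quietly when systemd is not PID 1.
    [ -e "${SYSTEMD_RUNDIR}" ] || return 0

    # Failing to stop or disable should not block package removal.
    "${SYSTEMCTL}" stop "${unit}" || true
    "${SYSTEMCTL}" disable "${unit}" || true
}
```

In an actual prerm this would be invoked as stop_unit_on_remove "${1}" crunch-dispatch-slurm.service, while the unit file is still on disk.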

and (if not already addressed by "start" and "stop") "restart after upgrade/downgrade". Haven't addressed that yet.

My version takes care of this part.

Although crunch-dispatch-slurm behaves reasonably when there is no config file, I don't think it can survive without ARVADOS_API_HOST and ARVADOS_API_TOKEN. Not sure how this will affect the "auto start on install" strategy. I suppose if we ignore that issue, initial installation will result in "tried to start, but failed" and (after the admin sets up env vars) subsequent uninstall/reinstall/upgrade operations will get automatic stop/start/restart... which sounds like a big improvement.

I agree with all of this.

#7 Updated by Tom Clegg over 4 years ago

9706-package-systemd-files @ cfabe41 includes
  • Brett's improvements from 9706-package-systemd-files-bcs (see above)
  • Stop + disable when uninstalling
  • Fix bug in crunch-dispatch-slurm config file parsing (refs #9745)
  • If running under systemd, notify systemd when dispatch is ready (otherwise, systemd can't tell the difference between "process has started" and "service is initialized, configured, etc.")
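
The dispatcher does this from its own code, which is not shown in this ticket. Purely to illustrate the readiness handshake, here is a shell sketch using the systemd-notify command. The function name notify_ready and the NOTIFY_CMD override are hypothetical, and the sketch assumes the unit is configured so that systemd accepts the notification (see NotifyAccess= in the systemd documentation).

```shell
#!/bin/sh
# Sketch: tell systemd the service is ready, but only when running under it.
# NOTIFY_CMD is a hypothetical override (not in the ticket) for testing.
NOTIFY_CMD="${NOTIFY_CMD:-systemd-notify}"

notify_ready() {
    # systemd exports NOTIFY_SOCKET to services it expects notifications
    # from; when it is unset we are not under systemd, so do nothing.
    [ -n "${NOTIFY_SOCKET:-}" ] || return 0

    # A failed notification should not take the service down.
    "${NOTIFY_CMD}" --ready || true
}
```

The idea is to call notify_ready only after configuration has been loaded and the service is able to do work, so that "started" and "ready" are distinguishable.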

#8 Updated by Tom Clegg over 4 years ago

  • Status changed from New to In Progress

#9 Updated by Tom Clegg over 4 years ago

  • Target version changed from 2016-08-17 sprint to 2016-08-31 sprint

#10 Updated by Javier Bértoli over 4 years ago

It all seems OK to me. The only thing I'd suggest is that you use curly braces around ALL variables in shell scripts, as that's the best practice.
In many places (e.g., build/go-package-scripts/prerm and particularly build/run-library.sh) you have mixed styles, like:

systemd_unit="${pkg}.service" 

case "${1}" in

and
            systemctl stop "$systemd_unit" || true
            systemctl disable "$systemd_unit" || true

Although the code will work, the style is not consistent.
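
For context, here is a small sketch of the case where the braces change the result rather than just the style. The variable pkg_old and its value are hypothetical, chosen only to make the difference visible.

```shell
#!/bin/sh
pkg="crunch-dispatch-slurm"
pkg_old="something-else"   # hypothetical second variable

# Without braces, the name extends as far as it can: this reads pkg_old.
a="$pkg_old"
# With braces, the name ends at the brace: this reads pkg, then appends "_old".
b="${pkg}_old"

# A "." cannot be part of a variable name, so these two are equivalent.
c="$pkg.service"
d="${pkg}.service"

printf 'a=%s\nb=%s\nc=%s\nd=%s\n' "${a}" "${b}" "${c}" "${d}"
```

The snippets quoted above fall into the second, equivalent category, which is why they work either way.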

#11 Updated by Tom Clegg over 4 years ago

387387d → use ${...} consistently

#12 Updated by Javier Bértoli over 4 years ago

387387d ready to merge

#13 Updated by Tom Clegg over 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:cbba74fcd57b7b81337d44c2e663ba317e6538de.
