Bug #6157

closed

[Documentation] Explain extra steps needed when compute hostnames are not fooN

Added by Tom Clegg almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Documentation
Target version:
Story points: 0.5

Description

background

Changing slurm config files, and keeping them synchronized across the controller and workers, is a bit painful and can cause race conditions that are annoying to diagnose, so we try to avoid setups where the config has to change during normal operation.

Hostnames of the form "fooN", where N is a decimal number, let you write foo[0-199] or foo[000-199] in your slurm config files, and nodes.ping makes a setup like this easy to manage. In the API server configuration, you can set assign_node_hostname to a corresponding format string, so that nodes that ping without a hostname are assigned one matching the scheme, and set max_compute_nodes to make sure the number of nodes doesn't exceed your allocation.
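To illustrate why the string+decimal scheme is convenient, here is a minimal Python sketch of how a single range expression covers a whole pool of hostnames. The helper name expand_node_range is hypothetical, it handles only the simple prefix[lo-hi] form shown above, and it is not part of slurm or Arvados. The printf-style strings at the end are likewise only an assumption about what a "corresponding format string" might look like; this ticket does not show the exact syntax accepted by assign_node_hostname.

```python
import re

def expand_node_range(expr):
    """Expand a simple range such as 'foo[000-199]' into a list of hostnames.

    Hypothetical helper for illustration only; real slurm host lists
    support more syntax than this.
    """
    m = re.fullmatch(r"([A-Za-z][\w-]*)\[(\d+)-(\d+)\]", expr)
    if m is None:
        raise ValueError("unsupported range expression: %r" % expr)
    prefix, lo, hi = m.groups()
    # A lower bound written with leading zeros (e.g. 000) implies zero padding.
    width = len(lo) if len(lo) > 1 and lo.startswith("0") else 0
    return [prefix + str(n).zfill(width) for n in range(int(lo), int(hi) + 1)]

plain = expand_node_range("foo[0-199]")     # foo0 ... foo199
padded = expand_node_range("foo[000-199]")  # foo000 ... foo199

# Assumed printf-style equivalents of the two schemes above.
assert "foo%d" % 7 == plain[7]
assert "foo%03d" % 7 == padded[7]
```

With names generated this way, one line in the slurm config describes every node up to the limit, so the config does not need to change as nodes come and go.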

However, in some setups it might be inconvenient/difficult/impossible to use hostnames like "fooN".

improvement

Install docs should include a section explaining
  • Why foo[0-N] is a good idea (see above)
  • What to do differently if you use a naming scheme besides string+decimal (e.g., your worker nodes' hostnames are {alice, bob, clay, ...})

We should make the simplifying assumption that the hostnames are assigned manually/OOB, and known in advance. IOW, instead of covering scenarios where slurm config has to change every time a new compute node is turned up, we should just advise against that.

AFAIK, as long as the available/powered-on nodes' hostnames are a subset of the hostnames given in slurm.conf, and no two hosts have the same name, slurm and Arvados should work without any code changes.
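The two conditions above (live hostnames are a subset of the configured ones, and no name is used twice) can be sketched as a quick sanity check. This is only an illustration: check_hostnames and its arguments are hypothetical, and how the two lists are collected (e.g., from slurm.conf and from the machines that are powered on) is left to the site.

```python
from collections import Counter

def check_hostnames(configured, live):
    """Return (unknown, duplicated) hostnames among the live nodes.

    configured: hostnames listed in the slurm config (assumed known in advance)
    live:       hostnames reported by nodes that are currently powered on
    """
    # Live names that the slurm config does not know about.
    unknown = sorted(set(live) - set(configured))
    # Names claimed by more than one live host.
    duplicated = sorted(name for name, count in Counter(live).items() if count > 1)
    return unknown, duplicated

unknown, duplicated = check_hostnames(
    configured=["alice", "bob", "clay"],
    live=["alice", "bob"],
)
# Both lists are empty here, so this setup satisfies the two conditions.
```

A non-empty result in either list would point at a host that needs renaming or a name missing from the config.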


Files

Screenshot from 2015-07-29 16_36_50.png (167 KB) work in progress Tom Clegg, 07/29/2015 08:37 PM

Subtasks 2 (0 open, 2 closed)

Task #6744: Explain this decision at install-crunch-dispatch.html (Resolved, Tom Clegg, 07/31/2015)
Task #6524: Review 6157-worker-hostnames (Resolved, Ward Vandewege, 08/05/2015)

Related issues

Related to Arvados - Bug #6156: [API] Node record accommodates setups where hostnames are set statically by the sysadmin (Resolved, Radhika Chippada, 06/17/2015)

#1

Updated by Tom Clegg almost 9 years ago

  • Project changed from 35 to Arvados
  • Description updated (diff)
  • Category set to Documentation
#2

Updated by Brett Smith almost 9 years ago

  • Target version changed from Bug Triage to 2015-07-22 sprint
#3

Updated by Tom Clegg almost 9 years ago

  • Assigned To set to Tom Clegg
#4

Updated by Brett Smith almost 9 years ago

  • Assigned To deleted (Tom Clegg)
#5

Updated by Brett Smith almost 9 years ago

  • Subject changed from [Documentation] Explain extra steps needed when compute hostnames are not computeN to [Documentation] Explain extra steps needed when compute hostnames are not fooN
  • Description updated (diff)

Updating the description to reflect our post-#6156 world. We now support any kind of "fooN" schema, not just computeN, and there's no need to work around the original bug.

#6

Updated by Tom Clegg almost 9 years ago

  • Target version changed from 2015-07-22 sprint to 2015-08-05 sprint
#7

Updated by Tom Clegg almost 9 years ago

  • Assigned To set to Tom Clegg
#9

Updated by Tom Clegg over 8 years ago

  • Status changed from New to In Progress
#10

Updated by Tom Clegg over 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:e0a1fc70f919741a8ad840dc40cfcc87f2751722.
