Project

General

Profile

Actions

Bug #6156

closed

[API] Node record accommodates setups where hostnames are set statically by the sysadmin

Added by Tom Clegg over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
API
Target version:
Story points:
1.0

Description

Goals

Have the process documented in our compute node install guide work out of the box. Right now this doesn't work because when the node first pings the API server, #ping assigns a slot_number for the node, then overwrites the hostname based on that.

We can't just have the sysadmin assign a slot_number in the install process, because it needs to be unique and it's hard to figure out a general way to accommodate that constraint.

Implementation

  • Introduce a configuration variable to API server assign_node_hostname that can be a format string (similar to the dns configs) to generate a compute node hostname given a slot_number. It can also be set to false if the API server is not expected to generate hostnames.
  • Update Node.hostname_for_slot to return a result based on the configured format string, or nil if the setting is unset.
  • In Node#ping, assign a hostname if hostname is nil and assign_node_hostname is set. Move this code outside the if self.slot_number.nil? block, below it—each value can be set independently when it's not included in the existing record.
  • Only run the Node DNS checking code (below hostname_for_slot) when assign_node_hostname is set.
In application.default.yml
  •   # Hostname to assign to a compute node when it sends a "ping" and the
      # hostname in its Node record is nil.
      # During bootstrapping, the "ping" script is expected to notice the
      # hostname given in the ping response, and update its unix hostname
      # accordingly.
      # If false, leave the hostname alone (this is appropriate if your compute
      # nodes' hostnames are already assigned by some other mechanism).
      #
      # One way or another, the hostnames of your node records should agree
      # with your DNS records and your /etc/slurm-llnl/slurm.conf files.
      #
      # Example for compute0000, compute0001, ....: "compute%<slot_number>04d" 
      # (See http://ruby-doc.org/core-2.2.2/Kernel.html#method-i-format for more.)
      assign_node_hostname: "compute{slot_number}" 
    

Subtasks 1 (0 open1 closed)

Task #6289: Review branch: 6156-hostnames-in-nodesResolvedRadhika Chippada06/17/2015Actions

Related issues

Related to Arvados - Bug #6157: [Documentation] Explain extra steps needed when compute hostnames are not fooNResolvedTom Clegg07/31/2015Actions
Actions #1

Updated by Tom Clegg over 9 years ago

  • Project changed from 35 to Arvados
  • Category set to API
Actions #2

Updated by Brett Smith over 9 years ago

  • Target version changed from Bug Triage to 2015-07-08 sprint
Actions #3

Updated by Brett Smith over 9 years ago

  • Subject changed from [API] nodes.ping should not clobber an existing hostname when assigning a slot_number to [API] Nodes only assign and use slot_number to generate hostname on first ping
  • Description updated (diff)
  • Story points changed from 0.5 to 1.0
Actions #4

Updated by Tom Clegg over 9 years ago

Add note in application.default.yml that slot_numbers assigned by clients are exempt from max_compute_nodes.

Actions #5

Updated by Brett Smith over 9 years ago

  • Subject changed from [API] Nodes only assign and use slot_number to generate hostname on first ping to [API] Node record accommodates setups where hostnames are set statically by the sysadmin
  • Description updated (diff)
Actions #6

Updated by Radhika Chippada over 9 years ago

  • Assigned To set to Radhika Chippada
Actions #7

Updated by Radhika Chippada over 9 years ago

  • Status changed from New to In Progress
Actions #8

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
Actions #9

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
Actions #10

Updated by Radhika Chippada over 9 years ago

Notes regarding branch 6156-hostnames-in-nodes:

  • All stated items in the description are implemented. Unit tests are added.
  • Currently, the code supports both these formats: "compute{slot_number}" and "compute%<slot_number>". Or with padding: 'compute#{slot_number.to_s.rjust(4, "0")}' and 'compute%<slot_number>04d'. However, I think it is desirable that we support only one of these formats and keep the code maintenance simple.
Actions #11

Updated by Brett Smith over 9 years ago

Radhika Chippada wrote:

  • Currently, the code supports both these formats: "compute{slot_number}" and "compute%<slot_number>". Or with padding: 'compute#{slot_number.to_s.rjust(4, "0")}' and 'compute%<slot_number>04d'. However, I think it is desirable that we support only one of these formats and keep the code maintenance simple.

The only functional requirement is that we use a format that supports arbitrary zero padding. As long as we have that, I have no more stake in which format(s) we support.

Actions #12

Updated by Brett Smith over 9 years ago

Reviewing 451a6d3

On the formatting question: we already have a similar configuration string for Node DNS settings, dns_server_update_command. It uses String#% to interpolate the configuration string. Let's use the same format and method for assign_node_hostname. That will provide consistency for users, it means you only have to worry about one method to interpolate strings without scanning their contents, you don't have to use eval, and it supports zero padding. It uses the same formatting that's documented in the link in the existing comments, so that works out great too.

If interpolating the string raises an exception, I think the server should return some kind of error response, rather than silently handling the request without assigning a proper hostname. If the string can't be interpolated, it means the server is badly misconfigured, and I think it's appropriate to make that very visible to the server administrator. I would be fine with it if you just let the string interpolate exception bubble up to our stock error handlers. If you'd rather catch it and turn it into a nicer error response, go for it.

After that, just one style thing that wouldn't block the merge: in the loop that checks the DNS configuration, Rails.configuration.assign_node_hostname will never change, so checking it inside the loop seems slightly wasteful (and makes a bigger diff) than checking it outside the loop along with Rails.configuration.dns_server_conf_dir and Rails.configuration.dns_server_conf_template. Would it be cleaner to add the condition to the if that's already there?

Thanks.

Actions #13

Updated by Radhika Chippada over 9 years ago

Thanks Brett for the feedback. Isn't silly that I didn't see that about "in the loop that checks the DNS configuration ..." :).

All suggested updates are made. Assigning the review task back to you, in case you want to take another quick look. Thanks.

Actions #14

Updated by Brett Smith over 9 years ago

82a3d53 is good to merge. I would only suggest a couple of small test improvements from here.

There's a test "ping node with no hostname and malformed config and expect nil for hostname." After the change made from the last round of comments, this title no longer accurately reflects the desired behavior. Updating the test title would be a good start.

I also wonder if we really want to assert that the method raises ArgumentError. To me, that seems a little too specific: our main concern is that the server returns an error, and less about what specific exception causes it. This test might be more accurately written as a controller test which asserts that the server responds with 422 or a 5xx status code. That test would continue to work even if later on we catch the exception and render a specific error.

Thanks.

Actions #15

Updated by Radhika Chippada over 9 years ago

Thanks Brett. Converted that unit test into a controller test.

Actions #16

Updated by Radhika Chippada over 9 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:970bcc6f77a7b1ff14a7ec124e3004d89b1f173a.

Actions

Also available in: Atom PDF