Idea #7454

closed

[NodeManager] Use CustomData to provision compute nodes on Azure instead of CustomScriptExtension

Added by Peter Amstutz over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

Currently we use CustomScriptForLinux to provision compute nodes. However, it fails to run the script reliably, and we have not been able to find the root cause. As a workaround we use a cron job hack that re-runs CustomScriptForLinux until it succeeds. It would be better to solve the problem a different way.

It turns out there is actually a simpler way to put a small file onto a newly provisioned node than what we have been trying to do with CustomScriptForLinux. Somehow I overlooked this feature before, or I would have implemented it this way originally.

https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-how-to-inject-custom-data/

1) Add custom data support to libcloud (done)
2) Update node manager to put the ping URL command into custom data
3) Set the following in /etc/waagent.conf:

Provisioning.Enabled=y 
Provisioning.DecodeCustomData=y
Provisioning.ExecuteCustomData=y

4) Build a new image.
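
As a rough illustration of step 2, here is a minimal sketch of how a custom data payload carrying the ping URL might be built. It is not the code from the actual changeset. The function names, the target path /var/tmp/arv-node-data/arv-ping-url, and the example URL are all hypothetical. It assumes the payload is a small shell script that the agent decodes and runs (per the waagent.conf settings in step 3), and that the payload must be base64-encoded before it is handed to Azure; whether the libcloud driver does that encoding for you is not covered here.

```python
import base64
import posixpath

try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # Python 2.7 fallback


def build_custom_data(ping_url, path="/var/tmp/arv-node-data/arv-ping-url"):
    """Return a shell script that writes ping_url to path on first boot.

    Both the default path and the idea of storing the URL in a file
    are assumptions made for this sketch.
    """
    lines = [
        "#!/bin/sh",
        "mkdir -p " + quote(posixpath.dirname(path)),
        # printf avoids echo's shell-dependent backslash handling.
        "printf '%s\\n' " + quote(ping_url) + " > " + quote(path),
        "",
    ]
    return "\n".join(lines)


def encode_custom_data(script):
    """Base64-encode the script, returning an ASCII string."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Hypothetical ping URL, for illustration only.
    script = build_custom_data("https://example.invalid/ping?secret=abc")
    print(script)
    print(encode_custom_data(script))
```

Because the script is generated before the node exists, it can only contain values known at creation time.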


Subtasks 1 (0 open, 1 closed)

Task #8004: Review 7454-azure-custom-data | Resolved | Tom Clegg | 12/16/2015
Actions #1

Updated by Peter Amstutz over 8 years ago

  • Tracker changed from Bug to Idea
  • Description updated (diff)
Actions #2

Updated by Peter Amstutz over 8 years ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints
Actions #4

Updated by Ward Vandewege over 8 years ago

I've tested a new compute image with these specs on c97qk, and this change works fine. I've run a diagnostics job successfully on c97qk with the new image.

Actions #5

Updated by Peter Amstutz over 8 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Peter Amstutz over 8 years ago

  • Target version changed from Arvados Future Sprints to 2015-12-16 sprint
Actions #7

Updated by Peter Amstutz over 8 years ago

  • Assigned To set to Peter Amstutz
Actions #8

Updated by Tom Clegg over 8 years ago

The docstring for BaseComputeNodeDriver.arvados_create_kwargs should be updated to document the new size argument.

Nit: Perhaps it would be a little easier to read if arvados_create_kwargs() used the same argument order as create_node(), instead of swapping them here?

    def create_node(self, size, arvados_node):
        ...
        kwargs.update(self.arvados_create_kwargs(arvados_node, size))

Was this dropped deliberately?

    echo "%s" > /var/tmp/arv-node-data/meta-data/local-ipv4

Side note / existing smell: I wish we were using real shellescape instead of "knowing" there won't be any backslashes, quotation marks, dollar signs, etc. in instance_id, instance_type, or arv-ping-url... but at least we're doing a little better now without the bash -c '...' layer.

Actions #9

Updated by Peter Amstutz over 8 years ago

Tom Clegg wrote:

> The docstring for BaseComputeNodeDriver.arvados_create_kwargs should be updated to document the new size argument.
>
> Nit: Perhaps it would be a little easier to read if arvados_create_kwargs() used the same argument order as create_node(), instead of swapping them here?
> [...]

Fixed.

> Was this dropped deliberately?
> [...]

Yes. Previously the bash command was constructed after the node was created, so we had access to the IP address. The new command is constructed before the node is created, so we don't know the address yet. Since discovering the IP address is trivial it's not something that needs to be recorded as metadata anyway.

> Side note / existing smell: I wish we were using real shellescape instead of "knowing" there won't be any backslashes, quotation marks, dollar signs, etc. in instance_id, instance_type, or arv-ping-url... but at least we're doing a little better now without the bash -c '...' layer.

Fixed to use pipes.quote(), which the Python documentation notes as deprecated; the recommended replacement requires either an additional package dependency or Python 3.3.
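
For reference, a minimal sketch of the quoting approach, not the code from the changeset. The helper name shell_join and the sample value are hypothetical. It uses shlex.quote where available (Python 3.3+) and falls back to pipes.quote on Python 2.7, then checks that a value full of shell metacharacters survives a round trip through shlex.split.

```python
import shlex

try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # Python 2.7 fallback


def shell_join(args):
    """Join arguments into a single shell-safe command string."""
    return " ".join(quote(str(a)) for a in args)


# Hypothetical value containing quotes, a dollar sign, backticks,
# a backslash and a semicolon: the characters the review worried about.
hostile = 'x"; echo $HOME `id` \\ \'q\''
command = shell_join(["echo", hostile])

# Parsing the command back with POSIX shell rules yields the original
# arguments, so no metacharacter was left open to interpretation.
assert shlex.split(command) == ["echo", hostile]
```

The same helper works for instance_id, instance_type, or the ping URL without relying on those values being free of special characters.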

Actions #10

Updated by Tom Clegg over 8 years ago

LGTM, thanks

Actions #11

Updated by Peter Amstutz over 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:39ccab11524517c101fad39eab02603022f15a99.
