Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=574842017-11-29T16:15:22ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Subject</strong> changed from <i>[Node Manager] Creates compute nodes from spot instances</i> to <i>[Node Manager] Creates compute nodes using AWS spot instances</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/57484/diff?detail_id=54970">diff</a>)</li><li><strong>Target version</strong> set to <i>To Be Groomed</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=575322017-11-29T18:58:52ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Idea</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=600072018-02-07T19:08:37ZTom Morristfmorris@veritasgenetics.com
<ul></ul><p>Although there's no spot instance support in libcloud, it is available in boto, which might be another option: <a class="external" href="http://boto.cloudhackers.com/en/latest/ref/ec2.html">http://boto.cloudhackers.com/en/latest/ref/ec2.html</a></p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=600142018-02-07T20:28:59ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><ul>
<li><strong>Using Boto3:</strong> <a class="external" href="http://boto3.readthedocs.io/en/latest/index.html">http://boto3.readthedocs.io/en/latest/index.html</a>
<ul>
<li>Pros:
<ul>
<li>Full fledged AWS library with spot support (<a class="external" href="http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.Client.request_spot_instances">http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.Client.request_spot_instances</a>)</li>
<li>Seems to be the “official” Python library (the AWS documentation site points to it)</li>
</ul>
</li>
<li>Cons:
<ul>
<li>Its integration into nodemanager may complicate the code further</li>
<li>Additional dependency</li>
</ul>
</li>
</ul>
</li>
<li><strong>Expanding libcloud</strong> (maybe reusing <a class="external" href="https://github.com/muccg/libcloud-drivers">https://github.com/muccg/libcloud-drivers</a> (Apache licensed); I haven’t tested it yet, but it’s just a few lines of code):
<ul>
<li>Pros:
<ul>
<li>It’s supposedly easy, as mentioned on the mailing list (although the message is a bit old): <a class="external" href="https://mail-archives.apache.org/mod_mbox/libcloud-dev/201106.mbox/%3CBANLkTinzMApt5EggweEuooX2siFERbuSvQ@mail.gmail.com%3E">https://mail-archives.apache.org/mod_mbox/libcloud-dev/201106.mbox/%3CBANLkTinzMApt5EggweEuooX2siFERbuSvQ@mail.gmail.com%3E</a></li>
<li>Would fit in with the rest of nodemanager’s mechanics</li>
<li>Spot API designed to be similar to On Demand API (<a class="external" href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RequestSpotInstances.html">https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RequestSpotInstances.html</a> & <a class="external" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html</a>)</li>
</ul>
</li>
<li>Cons:
<ul>
<li>No one seems to have tried integrating muccg's prototype into libcloud before. Is that a sign of trouble ahead?</li>
<li>I haven’t read the Spot docs deeply, but their internals may have changed over time and diverged from what libcloud's EC2 driver does.</li>
</ul>
</li>
</ul>
</li>
<li>My opinion: we should time-box a test to see if libcloud can be made to work with these APIs. If that's possible, I think it will take less effort than adding Boto3, and we would also be contributing to a project that we’re already invested in.</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=611862018-03-21T20:31:53ZTom Morristfmorris@veritasgenetics.com
<ul></ul><p>We'll pursue the libcloud implementation option and implement spot instances using the default bid price (i.e. the on-demand price).</p>
<p>API server will have a config option which specifies whether spot instances are enabled or not. If they are enabled, child containers will get created with the spot instances scheduling parameter set.</p>
<p>Spot instances will be their own instance type. Node manager needs to manage its own instance types separately from the libcloud-specified instance types that it currently uses. Node manager will use the new libcloud support to request spot instances when needed. No arvados-cwl-runner required.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=611902018-03-21T20:35:41ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Blocked by</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed parent" href="/issues/13051">Idea #13051</a>: Spike - Investigate/prototype AWS spot instance support in libcloud</i> added</li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=611912018-03-21T20:38:09ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Story points</strong> set to <i>5.0</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=615302018-04-03T20:02:33ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/61530/diff?detail_id=58638">diff</a>)</li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=619442018-04-18T15:43:49ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Target version</strong> changed from <i>To Be Groomed</i> to <i>Arvados Future Sprints</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=622822018-05-02T14:13:17ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/62282/diff?detail_id=59323">diff</a>)</li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=623042018-05-02T16:14:44ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Nodemanager refactoring/updates:</p>
<ul>
<li>Nodemanager spot instance handling:
<ul>
<li><code>[Size <name>]</code> sections in the config use instance types as <name>: decouple that and add it as an instance_type attribute inside the section, leaving <name> for description purposes only</li>
<li>Each size section will have a boolean “preemptable” attribute, defaulting to False.</li>
<li>Update ServerCalculator & related code so that the instance type is not the unique id of a "nodesize" </li>
<li>Update the EC2 driver to pass the <code>ex_spot_market=True</code> parameter on the libcloud create_node call</li>
</ul>
</li>
<li>Update documentation explaining nodemanager config file format changes</li>
<li>Tests</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=623382018-05-02T17:10:44ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/62338/diff?detail_id=59377">diff</a>)</li><li><strong>Story points</strong> changed from <i>5.0</i> to <i>3.0</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=625272018-05-09T15:28:45ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2018-05-23 Sprint</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=625282018-05-09T15:30:52ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Lucas Di Pentima</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=628092018-05-18T19:12:34ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=628902018-05-23T13:17:53ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Target version</strong> changed from <i>2018-05-23 Sprint</i> to <i>2018-06-06 Sprint</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=629782018-05-25T14:16:46ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="7478: On EC2 driver ask for a spot instance when needed. Arvados-DCO-1.1-Signed-off-by: Lucas Di..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/3950ffc9481c25262f2db2b08a0f74664c433734">3950ffc94</a> - Branch <code>7478-anm-spot-instances</code></p>
<ul>
<li>Updated <code>libcloud</code> version dependency to use our fork with AWS Spot Instances support</li>
<li>Added support for a <code>preemptable</code> scheduling parameter on the API server</li>
<li>Added support on Go SDK & <code>dispatchcloud</code></li>
<li>Modified nodemanager to detach node size from instance types, adding the <code>preemptable</code> parameter.</li>
<li>Updated the EC2 driver to check for the <code>preemptable</code> parameter and ask for Spot instances when needed.</li>
</ul>
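<p>As a rough illustration of detaching node sizes from instance types, the sketch below parses <code>[Size]</code> sections where the section name is the Arvados-assigned size id and <code>instance_type</code> and <code>preemptable</code> are attributes inside the section. It is not the branch's code: the sample config, the <code>node_sizes()</code> helper shown here, and the fallback of <code>instance_type</code> to the section name are all assumptions made for the example.</p>

```python
import configparser

# Hypothetical config: two Arvados sizes backed by the same cloud instance type.
SAMPLE_CONFIG = """
[Size m4.large]
cores = 2
price = 0.1

[Size m4.large.spot]
instance_type = m4.large
preemptable = true
cores = 2
price = 0.1
"""


def node_sizes(config_text):
    """Return one size spec per [Size <name>] section, keyed by the user-chosen name."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    sizes = []
    for section in parser.sections():
        words = section.split(None, 1)
        if len(words) != 2 or words[0] != "Size":
            continue
        name = words[1]
        sizes.append({
            # The section name is the unique id, not the cloud instance type.
            "id": name,
            # Assumption: fall back to the section name when instance_type is omitted.
            "instance_type": parser.get(section, "instance_type", fallback=name),
            # Defaults to False, as proposed in the refactoring notes.
            "preemptable": parser.getboolean(section, "preemptable", fallback=False),
            "cores": parser.getint(section, "cores"),
            "price": parser.getfloat(section, "price"),
        })
    return sizes


sizes = node_sizes(SAMPLE_CONFIG)
```

<p>With this shape, two sizes can share one cloud instance type while remaining distinct entries for scheduling purposes.</p>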
<p>I'm hopeful that propagating node sizes metadata by passing the CloudSizeWrapper object is a good approach. Unit tests are failing because of this (I don't want to start correcting them before confirming that's a good approach), but integration tests are passing.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=629822018-05-25T15:23:46ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>Not your fault, but a method named <code>validate_scheduling_parameters</code> that is <code>before_validation</code> and not part of <code>validate</code> is confusing. Validations shouldn't change parameter values (but it isn't technically a validation step...) Specifically, I'm not sure if errors.add() does what you expect when it appears in a <code>before_validation</code> rather than a <code>validate</code>. Would you mind cleaning that up so the record adjustments are in before_validation and the value checks are in validate?</li>
</ul>
<ul>
<li>A brief comment about the intention of setting/checking the preemptable flag would be helpful because the logic is slightly convoluted.</li>
</ul>
<ul>
<li>Do we really want to totally disallow making top level containers preemptable, or just not assign them as preemptable by default? Seems like if it is explicitly set in the request, we should honor it.</li>
</ul>
<ul>
<li>It looks like CloudSizeWrapper will still use the value of "id" from the underlying NodeSize object rather than the name used in the "[Size foo]" section title. I think if you add something like <code>size_spec['id'] = sec_words[1]</code> in NodeManagerConfig.node_sizes() then it will use the user-supplied id.</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=629832018-05-25T15:37:38ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>Is it necessary to set <code>instance_type</code> on CloudSizeWrapper? After using it to look up the corresponding libcloud NodeSize in NodeManagerConfig.node_sizes(), the instance_type field seems to be redundant with the <code>real</code> size object.</li>
</ul>
<ul>
<li>Additionally, the use of "instance_type" seems to be inconsistent, because when we get it from runtime constraints, it is the Arvados configuration-assigned name of the size, not the cloud provider size id.</li>
</ul>
<ul>
<li>In list_nodes() for ec2, azure and gce we map back from the reported instance size to our node size object (each does it in a slightly different way, of course). However, we need to start mapping back to our arvados-assigned instance type, not the cloud type. This means (a) ComputeNodeDriver.sizes should correspond to ServerCalculator.cloud_sizes (b) we need to store the arvados-assigned instance type on the node as a tag, and use that rather than the cloud's own response.</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=630712018-06-04T13:27:00ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="Merge branch 'master' into 7478-anm-spot-instances Arvados-DCO-1.1-Signed-off-by: Lucas Di Penti..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/73872ccc5bb6b80a6049b44b0113085a9c2b6934">73872ccc5bb6b80a6049b44b0113085a9c2b6934</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/734/">https://ci.curoverse.com/job/developer-run-tests/734/</a></p>
Addressed comments above:
<ul>
<li>Cleaned up validation code on API server</li>
<li>Avoid redundant attribute <code>instance_type</code> on <code>CloudSizeWrapper</code></li>
<li>Override CloudSizeWrapper id with config Size name</li>
<li>Set <code>arvados_node_size</code> tag on node creation to have a reference to the Arvados assigned node size</li>
<li>Use the newly added tag to get the Arvados assigned node size when receiving the node list</li>
</ul>
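<p>The tag round trip described in the last two items can be sketched as follows. This is only an illustration: <code>FakeCloudNode</code>, <code>create_node()</code> and <code>arvados_size_of()</code> are hypothetical names, and a real driver would go through libcloud rather than a local dataclass. Only the <code>arvados_node_size</code> tag name comes from the notes above.</p>

```python
from dataclasses import dataclass, field

SIZE_TAG = "arvados_node_size"


@dataclass
class FakeCloudNode:
    """Hypothetical stand-in for a cloud node record; only the fields used here."""
    instance_type: str
    tags: dict = field(default_factory=dict)


def create_node(arvados_size_id, instance_type):
    # Record the Arvados-assigned size on the node at creation time.
    return FakeCloudNode(instance_type=instance_type,
                         tags={SIZE_TAG: arvados_size_id})


def arvados_size_of(node, known_sizes):
    # Map back through the tag, not through the cloud's reported instance type.
    return known_sizes.get(node.tags.get(SIZE_TAG))


known_sizes = {
    "m4.large": {"instance_type": "m4.large", "preemptable": False},
    "m4.large.spot": {"instance_type": "m4.large", "preemptable": True},
}
spot_node = create_node("m4.large.spot", "m4.large")
```

<p>Here the cloud would report <code>m4.large</code> for both sizes, so the tag is the only way to tell the spot node from the on-demand one.</p>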
<p>Tests are pending</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=631222018-06-04T20:29:17ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>I think this is backwards, should be "child containers" or (to align more closely with the logic) "containers with parent containers".</li>
</ul>
<pre>
# If preemptable instances (eg: AWS Spot Instances) are allowed,
# automatically ask them on non-child containers by default.
</pre>
<ul>
<li>I don't think this is correct:</li>
</ul>
<pre>
self.scheduling_parameters['preemptable'] ||= true
</pre>
<p>Because if 'preemptable' is 'false' it will be assigned 'true'. I think we want:</p>
<pre>
if Rails.configuration.preemptable_instances and !self.requesting_container_uuid.nil? and self.scheduling_parameters['preemptable'].nil?
self.scheduling_parameters['preemptable'] = true
end
</pre>
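<p>For readers more at home in nodemanager's Python, the same pitfall can be reproduced with a rough Python analogue. This is not the API server code (which is Ruby); the function names and arguments are made up for illustration.</p>

```python
def default_preemptable_buggy(params):
    """Analogue of Ruby's `||= true`: any falsy value is replaced,
    including an explicit False set by the requester."""
    params["preemptable"] = params.get("preemptable") or True
    return params


def default_preemptable(params, preemptable_enabled, requesting_container_uuid):
    """Only default to True for child containers that left the flag unset."""
    if (preemptable_enabled
            and requesting_container_uuid is not None
            and params.get("preemptable") is None):
        params["preemptable"] = True
    return params
```

<p>The first version silently overrides an explicit <code>False</code>; the second leaves any explicitly set value alone.</p>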
<p>This previous comment isn't addressed:</p>
<blockquote>
<p>In list_nodes() for ec2, azure and gce we map back from the reported instance size to our node size object (each does it in a slightly different way, of course). However, we need to start mapping back to our arvados-assigned instance type, not the cloud type. This means (a) ComputeNodeDriver.sizes should correspond to ServerCalculator.cloud_sizes (b) we need to store the arvados-assigned instance type on the node as a tag, and use that rather than the cloud's own response.</p>
</blockquote>
<p>I see you are setting <code>arvados_node_size</code> in tags, but not reading it back in <code>list_nodes()</code>. This is a problem because <code>list_nodes()</code> is used to determine whether to start or stop nodes. If we define two node types "m4.large.preemptable" and "m4.large.reserved" but <code>list_nodes()</code> only returns <code>m4.large</code> then it won't match either size.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=631272018-06-04T20:47:26ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Followup to last comment: looking up the "arvados node size" happens in CloudNodeListMonitorActor, so that should work.</p>
<p>What happens if someone reconfigures the system and restarts node manager and you get back an arvados_node_size you don't recognize any more? The correct behavior in that case should be to shut the node down.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=631822018-06-06T14:43:59ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Target version</strong> changed from <i>2018-06-06 Sprint</i> to <i>2018-06-20 Sprint</i></li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=632922018-06-07T20:17:56ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>(04:10:32 PM) lucas: tetron: re:shutting down nodes that don't include a recognized arvados_node_size (last comment at <a class="external" href="https://dev.arvados.org/issues/7478#note-23">https://dev.arvados.org/issues/7478#note-23</a>), is it a correct approach to just call the destroy_node from CloudNodeListMonitorActor?<br />(04:11:35 PM) tetron: no<br />(04:12:24 PM) tetron: welll<br />(04:12:39 PM) lucas: tetron: Should I assign a proper status so that the pairing mechanism kills it or simething like that?<br />(04:13:54 PM) tetron: if we can do that through the "I am eligible for shutdown" interaction between ComputeNodeMonitorActor and DaemonActor that would be best<br />(04:14:53 PM) tetron: given how much effort we've spent handling various cloud failure modes I am very hesitant to add another place where we make a cloud API call<br />(04:15:23 PM) tetron: because then we're back to "oops we got a weird error and now nodemanager is in a death spiral" <br />(04:16:08 PM) tetron: remember it does create a ComputeNodeMonitorActor for every node, paired or not<br />(04:16:54 PM) tetron: so it can go through the normal mechanism of discovering the node in the node list, creating a ComputeNodeMonitorActor, then have the MonitorActor decide the node shouldn't exist, and tell daemon "please shut me down" <br />(04:18:39 PM) lucas: ok, I was trying to kill it as soon as the size is confirmed that is not recognizable because find_size returns None and will create problems when other parts of the code try to access it, I'll look for that approach<br />(04:19:02 PM) tetron: that's understandable<br />(04:19:20 PM) tetron: maybe have an "invalid size" stand-in<br />(04:19:54 PM) lucas: Yes, that could work. Thanks</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=633662018-06-11T21:04:37ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="7478: Adds test to check that state is 'down' with 'invalid' size. Arvados-DCO-1.1-Signed-off-by..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/17f521d7ffb4f3a58ca98a27395eb60d9fa34519">17f521d7f</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/747/">https://ci.curoverse.com/job/developer-run-tests/747/</a></p>
<p>Since <code>node-22</code>, the updates are:</p>
<ul>
<li>Updated api server CR's default preemptable setting logic as suggested</li>
<li>When a cloud node has an unrecognizable <code>arvados_node_size</code> tag, instead of assigning None as its <code>.size</code>, set an <code>InvalidCloudSize</code> instance, so that <code>get_state()</code> returns <code>'down'</code> and the node gets properly shut down</li>
<li>Added tests</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=633742018-06-11T22:51:28ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="7478: Fixes GCE driver's arvados_node_size tag handling. Arvados-DCO-1.1-Signed-off-by: Lucas Di..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b70f9ce54f1f672b423999e6c07b2f0127b76666">b70f9ce54</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/748/">https://ci.curoverse.com/job/developer-run-tests/748/</a></p>
<ul>
<li>Fixed a GCE driver issue discovered when running integration tests.</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=633802018-06-12T14:18:56ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Reviewing 7478-anm-spot-instances @ <a class="changeset" title="7478: Fixes GCE driver's arvados_node_size tag handling. Arvados-DCO-1.1-Signed-off-by: Lucas Di..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b70f9ce54f1f672b423999e6c07b2f0127b76666">b70f9ce54f1f672b423999e6c07b2f0127b76666</a></p>
<ul>
<li>The check for "self.cloud_node.size.id == 'invalid'" should be in shutdown_eligible() instead of get_state().</li>
</ul>
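<p>A minimal sketch of the stand-in approach with the check placed in <code>shutdown_eligible()</code> is shown below. It is an assumption-laden illustration, not the branch's code: the <code>CloudSize</code> tuple, the <code>find_size()</code> signature, and the <code>(bool, reason)</code> return value are hypothetical; only the <code>InvalidCloudSize</code> name and the <code>'invalid'</code> id come from this thread.</p>

```python
from collections import namedtuple

# Hypothetical minimal size record for the example.
CloudSize = namedtuple("CloudSize", ["id", "instance_type", "preemptable"])


class InvalidCloudSize:
    """Placeholder size for nodes whose arvados_node_size tag is not recognized."""
    id = "invalid"
    instance_type = None
    preemptable = False


def find_size(size_id, known_sizes):
    # Return a stand-in instead of None so callers can still read size attributes.
    size = known_sizes.get(size_id)
    return size if size is not None else InvalidCloudSize()


def shutdown_eligible(size, idle):
    """Return (eligible, reason); a node with an invalid size is always eligible."""
    if size.id == "invalid":
        return True, "node's size tag is not recognized"
    if idle:
        return True, "node is idle"
    return False, "node is busy"


known_sizes = {"m4.large": CloudSize("m4.large", "m4.large", False)}
```

<p>Keeping the decision in the eligibility check means the node is removed through the normal shutdown path, without adding another place that calls the cloud API directly.</p>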
<p>Rest LGTM</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=633962018-06-12T17:49:41ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="7478: Moves invalid cloud size node's shutdown decision to proper method. Arvados-DCO-1.1-Signed..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/71db701269dc5d2b5eb9239828a74e9c26cd7e66">71db70126</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/749/">https://ci.curoverse.com/job/developer-run-tests/749/</a></p>
<p>Addressed the above suggestion by making <code>shutdown_eligible()</code> responsible for checking for an invalid cloud size. Updated the test.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=633992018-06-12T18:36:18ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset <a class="changeset" title="Merge branch '7478-anm-spot-instances' Closes #7478 Arvados-DCO-1.1-Signed-off-by: Lucas Di Pent..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/42a0609a6e287a82ed565413c7392d40141388ae">arvados|42a0609a6e287a82ed565413c7392d40141388ae</a>.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634142018-06-12T21:11:21ZNico César
<ul></ul><p>deployed 1.1.4.20180612182441-2 and I see this error:</p>
<pre>
manage.4xphq:/etc/sv# systemctl restart arvados-node-manager ; journalctl -u arvados-node-manager -f
-- Logs begin at Tue 2018-06-05 10:34:26 UTC. --
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Compute Optimized Double Extra Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Double Extra Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Compute Optimized Extra Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Extra Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Extra Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Compute Optimized Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Large Instance: wishlist 0, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 NodeManagerDaemonActor.a06ef1818b7a[8607] INFO: Large Instance: wishlist 1, up 0 (booting 0, unpaired 0, idle 0, busy 0), down 0, shutdown 0
Jun 12 21:13:00 manage.4xphq.arvadosapi.com env[8606]: 2018-06-12 21:13:00 JobQueueMonitorActor.140274303566672[8607] INFO: got response with 1 items in 0.254546880722 seconds, next poll at 2018-06-12 21:13:10
Jun 12 21:13:00 manage.4xphq.arvadosapi.com systemd[1]: Stopping Arvados Node Manager Daemon...
Jun 12 21:13:12 manage.4xphq.arvadosapi.com systemd[1]: Stopped Arvados Node Manager Daemon.
Jun 12 21:13:12 manage.4xphq.arvadosapi.com systemd[1]: Started Arvados Node Manager Daemon.
Jun 12 21:13:12 manage.4xphq.arvadosapi.com env[11286]: No handlers could be found for logger "status.Handler"
Jun 12 21:13:12 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:12 root[11289] INFO: /usr/bin/arvados-node-manager 1.1.4.20180612182441 started, libcloud 2.3.0
Jun 12 21:13:12 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:12 requests.packages.urllib3.connectionpool[11289] DEBUG: Starting new HTTPS connection (1): ec2.us-east-1.amazonaws.com
Jun 12 21:13:12 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:12 requests.packages.urllib3.connectionpool[11289] DEBUG: https://ec2.us-east-1.amazonaws.com:443 "GET /?SignatureVersion=2&AWSAccessKeyId=AKIAJCNUIVXKTYNJ5OSQ&Timestamp=2018-06-12T21%3A13%3A12Z&SignatureMethod=HmacSHA256&Version=2016-11-15&Signature=akHRIUej%2BbWx2kgKam9btOFiP3rhUxQ8JlYhrX4S9ZA%3D&Action=DescribeImages&Owner.1=self HTTP/1.1" 200 None
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 requests.packages.urllib3.connectionpool[11289] DEBUG: https://ec2.us-east-1.amazonaws.com:443 "GET /?SignatureVersion=2&AWSAccessKeyId=AKIAJCNUIVXKTYNJ5OSQ&Timestamp=2018-06-12T21%3A13%3A12Z&SignatureMethod=HmacSHA256&Version=2016-11-15&Signature=QeXzl46I%2BGeKbjpmHxj5ZAerIlYKol6Z3uID%2Frr864M%3D&Action=DescribeSecurityGroups HTTP/1.1" 200 None
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 requests.packages.urllib3.connectionpool[11289] DEBUG: https://ec2.us-east-1.amazonaws.com:443 "GET /?SignatureVersion=2&AWSAccessKeyId=AKIAJCNUIVXKTYNJ5OSQ&Timestamp=2018-06-12T21%3A13%3A13Z&SignatureMethod=HmacSHA256&Version=2016-11-15&Signature=uiUkBMKy%2FZB6IPuwnt1MGzbj4Od7YL4%2BZ%2FtKG9XU%2BT4%3D&Action=DescribeSubnets HTTP/1.1" 200 None
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: Using cloud node sizes:
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=m4.large, name=Large Instance, ram=8192 disk=0 bandwidth=None price=0.1 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Large Instance', 'extra': {'cpu': 2}, 'scratch': 32000, 'price': 0.1, 'ram': 7782, 'bandwidth': None, 'cores': 2, 'disk': 0, 'id': 'm4.large'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=m4.large, name=Large Instance, ram=8192 disk=0 bandwidth=None price=0.1 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Large Instance', 'extra': {'cpu': 2}, 'scratch': 32000, 'price': 0.1, 'ram': 7782, 'bandwidth': None, 'cores': 2, 'disk': 0, 'id': 'm4.large.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.large, name=Compute Optimized Large Instance, ram=3840 disk=32 bandwidth=None price=0.105 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Compute Optimized Large Instance', 'extra': {'cpu': 2}, 'scratch': 32000, 'price': 0.105, 'ram': 3648, 'bandwidth': None, 'cores': 2, 'disk': 32, 'id': 'c3.large.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.large, name=Compute Optimized Large Instance, ram=3840 disk=32 bandwidth=None price=0.105 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Compute Optimized Large Instance', 'extra': {'cpu': 2}, 'scratch': 32000, 'price': 0.105, 'ram': 3648, 'bandwidth': None, 'cores': 2, 'disk': 32, 'id': 'c3.large'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=m4.xlarge, name=Extra Large Instance, ram=16384 disk=0 bandwidth=None price=0.2 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Extra Large Instance', 'extra': {'cpu': 4}, 'scratch': 80000, 'price': 0.2, 'ram': 15564, 'bandwidth': None, 'cores': 4, 'disk': 0, 'id': 'm4.xlarge'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=m4.xlarge, name=Extra Large Instance, ram=16384 disk=0 bandwidth=None price=0.2 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Extra Large Instance', 'extra': {'cpu': 4}, 'scratch': 80000, 'price': 0.2, 'ram': 15564, 'bandwidth': None, 'cores': 4, 'disk': 0, 'id': 'm4.xlarge.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.xlarge, name=Compute Optimized Extra Large Instance, ram=7680 disk=80 bandwidth=None price=0.21 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Compute Optimized Extra Large Instance', 'extra': {'cpu': 4}, 'scratch': 80000, 'price': 0.21, 'ram': 7296, 'bandwidth': None, 'cores': 4, 'disk': 80, 'id': 'c3.xlarge'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.xlarge, name=Compute Optimized Extra Large Instance, ram=7680 disk=80 bandwidth=None price=0.21 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Compute Optimized Extra Large Instance', 'extra': {'cpu': 4}, 'scratch': 80000, 'price': 0.21, 'ram': 7296, 'bandwidth': None, 'cores': 4, 'disk': 80, 'id': 'c3.xlarge.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=m4.2xlarge, name=Double Extra Large Instance, ram=32768 disk=0 bandwidth=None price=0.4 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Double Extra Large Instance', 'extra': {'cpu': 8}, 'scratch': 160000, 'price': 0.4, 'ram': 31129, 'bandwidth': None, 'cores': 8, 'disk': 0, 'id': 'm4.2xlarge'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=m4.2xlarge, name=Double Extra Large Instance, ram=32768 disk=0 bandwidth=None price=0.4 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Double Extra Large Instance', 'extra': {'cpu': 8}, 'scratch': 160000, 'price': 0.4, 'ram': 31129, 'bandwidth': None, 'cores': 8, 'disk': 0, 'id': 'm4.2xlarge.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.2xlarge, name=Compute Optimized Double Extra Large Instance, ram=15360 disk=160 bandwidth=None price=0.42 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Compute Optimized Double Extra Large Instance', 'extra': {'cpu': 8}, 'scratch': 160000, 'price': 0.42, 'ram': 14592, 'bandwidth': None, 'cores': 8, 'disk': 160, 'id': 'c3.2xlarge.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.2xlarge, name=Compute Optimized Double Extra Large Instance, ram=15360 disk=160 bandwidth=None price=0.42 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Compute Optimized Double Extra Large Instance', 'extra': {'cpu': 8}, 'scratch': 160000, 'price': 0.42, 'ram': 14592, 'bandwidth': None, 'cores': 8, 'disk': 160, 'id': 'c3.2xlarge'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.4xlarge, name=Compute Optimized Quadruple Extra Large Instance, ram=30720 disk=320 bandwidth=None price=0.84 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Compute Optimized Quadruple Extra Large Instance', 'extra': {'cpu': 16}, 'scratch': 320000, 'price': 0.84, 'ram': 29184, 'bandwidth': None, 'cores': 16, 'disk': 320, 'id': 'c3.4xlarge.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.4xlarge, name=Compute Optimized Quadruple Extra Large Instance, ram=30720 disk=320 bandwidth=None price=0.84 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Compute Optimized Quadruple Extra Large Instance', 'extra': {'cpu': 16}, 'scratch': 320000, 'price': 0.84, 'ram': 29184, 'bandwidth': None, 'cores': 16, 'disk': 320, 'id': 'c3.4xlarge'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.8xlarge, name=Compute Optimized Eight Extra Large Instance, ram=61440 disk=640 bandwidth=None price=1.68 driver=Amazon EC2 ...>, 'preemptable': False, 'name': 'Compute Optimized Eight Extra Large Instance', 'extra': {'cpu': 32}, 'scratch': 640000, 'price': 1.68, 'ram': 58368, 'bandwidth': None, 'cores': 32, 'disk': 640, 'id': 'c3.8xlarge'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 arvnodeman.jobqueue[11289] INFO: {'real': <NodeSize: id=c3.8xlarge, name=Compute Optimized Eight Extra Large Instance, ram=61440 disk=640 bandwidth=None price=1.68 driver=Amazon EC2 ...>, 'preemptable': True, 'name': 'Compute Optimized Eight Extra Large Instance', 'extra': {'cpu': 32}, 'scratch': 640000, 'price': 1.68, 'ram': 58368, 'bandwidth': None, 'cores': 32, 'disk': 640, 'id': 'c3.8xlarge.spot'}
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered TimedCallBackActor (urn:uuid:e79cfca2-e7db-4441-aaab-49fcbcee068e)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting TimedCallBackActor (urn:uuid:e79cfca2-e7db-4441-aaab-49fcbcee068e)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered CloudNodeListMonitorActor (urn:uuid:8a03c978-fa6e-442e-85f1-25a89ac98acb)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting CloudNodeListMonitorActor (urn:uuid:8a03c978-fa6e-442e-85f1-25a89ac98acb)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered ArvadosNodeListMonitorActor (urn:uuid:4e4f4b1b-add6-4a06-8439-0871117c6d41)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting ArvadosNodeListMonitorActor (urn:uuid:4e4f4b1b-add6-4a06-8439-0871117c6d41)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered JobQueueMonitorActor (urn:uuid:2a47f596-37a8-49d9-9e97-526f2e85e829)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting JobQueueMonitorActor (urn:uuid:2a47f596-37a8-49d9-9e97-526f2e85e829)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered ComputeNodeUpdateActor (urn:uuid:92794057-f151-4d7b-8366-a7928bd47f1c)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting ComputeNodeUpdateActor (urn:uuid:92794057-f151-4d7b-8366-a7928bd47f1c)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 JobQueueMonitorActor.140593208914768[11289] DEBUG: urn:uuid:e27ac108-d616-48d5-aef5-e1a8b77a0365 subscribed to all events
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 JobQueueMonitorActor.140593208914768[11289] DEBUG: sending request
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 ArvadosNodeListMonitorActor.140593211085648[11289] DEBUG: urn:uuid:e27ac108-d616-48d5-aef5-e1a8b77a0365 subscribed to all events
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 ArvadosNodeListMonitorActor.140593211085648[11289] DEBUG: sending request
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 CloudNodeListMonitorActor.140593232598720[11289] DEBUG: urn:uuid:e27ac108-d616-48d5-aef5-e1a8b77a0365 subscribed to all events
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 CloudNodeListMonitorActor.140593232598720[11289] DEBUG: sending request
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 requests.packages.urllib3.connectionpool[11289] DEBUG: Starting new HTTPS connection (1): ec2.us-east-1.amazonaws.com
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered NodeManagerDaemonActor (urn:uuid:e27ac108-d616-48d5-aef5-e1a8b77a0365)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting NodeManagerDaemonActor (urn:uuid:e27ac108-d616-48d5-aef5-e1a8b77a0365)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Registered WatchdogActor (urn:uuid:ca05efc5-db63-412f-b0e1-4f56bb11f6c6)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 pykka[11289] DEBUG: Starting WatchdogActor (urn:uuid:ca05efc5-db63-412f-b0e1-4f56bb11f6c6)
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] DEBUG: Daemon started
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 requests.packages.urllib3.connectionpool[11289] DEBUG: https://ec2.us-east-1.amazonaws.com:443 "GET /?Filter.3.Value.1=4xphq&AWSAccessKeyId=AKIAJCNUIVXKTYNJ5OSQ&Filter.1.Name=instance-state-name&Filter.2.Value.1=dynamic-compute&SignatureMethod=HmacSHA256&Filter.3.Name=tag%3Acluster&Signature=aOZkPquswRZvn7Fx6xGIAWAxZNUhNMHho%2FqweBdq5hQ%3D&Action=DescribeInstances&Filter.1.Value.1=running&SignatureVersion=2&Timestamp=2018-06-12T21%3A13%3A13Z&Version=2016-11-15&Filter.2.Name=tag%3Aarvados-class HTTP/1.1" 200 None
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 requests.packages.urllib3.connectionpool[11289] DEBUG: https://ec2.us-east-1.amazonaws.com:443 "GET /?SignatureVersion=2&AWSAccessKeyId=AKIAJCNUIVXKTYNJ5OSQ&Timestamp=2018-06-12T21%3A13%3A13Z&SignatureMethod=HmacSHA256&Version=2016-11-15&Signature=v0y0o%2Fa%2FhUU9MvgQS75zLDv%2FUsQYHEsNJDj9zxsJpPc%3D&Action=DescribeAddresses HTTP/1.1" 200 None
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 CloudNodeListMonitorActor.140593232598720[11289] ERROR: got error: global name 'InvalidCloudSize' is not defined - will try again in 20.0 seconds
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: Traceback (most recent call last):
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: File "/usr/lib/python2.7/dist-packages/arvnodeman/clientactor.py", line 99, in poll
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: response = self._send_request()
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: File "/usr/lib/python2.7/dist-packages/arvnodeman/nodelist.py", line 86, in _send_request
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: n.size = self._calculator.find_size(n.extra['arvados_node_size'])
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: File "/usr/lib/python2.7/dist-packages/arvnodeman/jobqueue.py", line 142, in find_size
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: return InvalidCloudSize()
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: NameError: global name 'InvalidCloudSize' is not defined
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 ArvadosNodeListMonitorActor.140593211085648[11289] INFO: got response with 48 items in 0.229659795761 seconds, next poll at 2018-06-12 21:13:23
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-bnkig53t8l0x1ci">4xphq-7ekkf-bnkig53t8l0x1ci</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-9ct5e14ouidq1x3">4xphq-7ekkf-9ct5e14ouidq1x3</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-k0es9pjugpjv7f0">4xphq-7ekkf-k0es9pjugpjv7f0</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-2fa53rvm0uaoxnl">4xphq-7ekkf-2fa53rvm0uaoxnl</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-tbrex80emflesql">4xphq-7ekkf-tbrex80emflesql</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-xyhjrnam94g23h1">4xphq-7ekkf-xyhjrnam94g23h1</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-jbgfjgqgefs6dzl">4xphq-7ekkf-jbgfjgqgefs6dzl</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-lg95nmgds6bdb4d">4xphq-7ekkf-lg95nmgds6bdb4d</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-0446wy2b6ofp838">4xphq-7ekkf-0446wy2b6ofp838</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-spv6sghrbe05g5i">4xphq-7ekkf-spv6sghrbe05g5i</a>
Jun 12 21:13:13 manage.4xphq.arvadosapi.com env[11286]: 2018-06-12 21:13:13 NodeManagerDaemonActor.e1a8b77a0365[11289] INFO: Registering new Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-gym16nks2abf1c2">4xphq-7ekkf-gym16nks2abf1c2</a>
</pre> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634172018-06-12T21:21:10ZNico César
<ul></ul><p>after monkeypatch</p>
<pre>
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ComputeNodeMonitorActor.ba3fafcf2920.compute2.4xphq.arvadosapi.com[16677] DEBUG: Suggesting shutdown because node's size tag 'None' not recognizable
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 NodeManagerDaemonActor.3b78803a4fc8[16677] INFO: Cloud node compute2.4xphq.arvadosapi.com is now paired with Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-bnkig53t8l0x1ci">4xphq-7ekkf-bnkig53t8l0x1ci</a> with hostname compute2
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ArvadosNodeListMonitorActor.139825145078608[16677] DEBUG: urn:uuid:c8b286db-e3b8-4c82-9dc8-ba3fafcf2920 subscribed to events for '<a href="https://arvadosapi.com/4xphq-7ekkf-bnkig53t8l0x1ci">4xphq-7ekkf-bnkig53t8l0x1ci</a>'
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ComputeNodeMonitorActor.e6ea226582fa.compute1.4xphq.arvadosapi.com[16677] DEBUG: Suggesting shutdown because node's size tag 'None' not recognizable
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 NodeManagerDaemonActor.3b78803a4fc8[16677] INFO: Cloud node compute1.4xphq.arvadosapi.com is now paired with Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-k0es9pjugpjv7f0">4xphq-7ekkf-k0es9pjugpjv7f0</a> with hostname compute1
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ArvadosNodeListMonitorActor.139825145078608[16677] DEBUG: urn:uuid:29b88fe4-700f-41f6-807e-e6ea226582fa subscribed to events for '<a href="https://arvadosapi.com/4xphq-7ekkf-k0es9pjugpjv7f0">4xphq-7ekkf-k0es9pjugpjv7f0</a>'
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ComputeNodeMonitorActor.ab0072e44ed9.compute3.4xphq.arvadosapi.com[16677] DEBUG: Suggesting shutdown because node's size tag 'None' not recognizable
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 NodeManagerDaemonActor.3b78803a4fc8[16677] INFO: Cloud node compute3.4xphq.arvadosapi.com is now paired with Arvados node <a href="https://arvadosapi.com/4xphq-7ekkf-9ct5e14ouidq1x3">4xphq-7ekkf-9ct5e14ouidq1x3</a> with hostname compute3
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ArvadosNodeListMonitorActor.139825145078608[16677] DEBUG: urn:uuid:a201ef36-87cc-4f9f-abb9-ab0072e44ed9 subscribed to events for '<a href="https://arvadosapi.com/4xphq-7ekkf-9ct5e14ouidq1x3">4xphq-7ekkf-9ct5e14ouidq1x3</a>'
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: scontrol: error: Weight value (9999999000) is greater than 4294967280
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: No changes specified
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ComputeNodeUpdateActor.bedafdd32e4c[16677] ERROR: SLURM update ['scontrol', 'update', u'NodeName=compute2', 'Weight=9999999000', 'Features=instancetype=invalid'] failed
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: Traceback (most recent call last):
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: File "/usr/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/slurm.py", line 26, in _update_slurm_node
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: subprocess.check_output(cmd)
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: File "/usr/lib/python2.7/subprocess.py", line 219, in check_output
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: raise CalledProcessError(retcode, cmd, output=output)
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: CalledProcessError: Command '['scontrol', 'update', u'NodeName=compute2', 'Weight=9999999000', 'Features=instancetype=invalid']' returned non-zero exit status 1
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: scontrol: error: Weight value (9999999000) is greater than 4294967280
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: No changes specified
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ComputeNodeUpdateActor.bedafdd32e4c[16677] ERROR: SLURM update ['scontrol', 'update', u'NodeName=compute1', 'Weight=9999999000', 'Features=instancetype=invalid'] failed
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: Traceback (most recent call last):
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: File "/usr/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/slurm.py", line 26, in _update_slurm_node
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: subprocess.check_output(cmd)
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: File "/usr/lib/python2.7/subprocess.py", line 219, in check_output
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: raise CalledProcessError(retcode, cmd, output=output)
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: CalledProcessError: Command '['scontrol', 'update', u'NodeName=compute1', 'Weight=9999999000', 'Features=instancetype=invalid']' returned non-zero exit status 1
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: scontrol: error: Weight value (9999999000) is greater than 4294967280
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: No changes specified
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: 2018-06-12 21:20:36 ComputeNodeUpdateActor.bedafdd32e4c[16677] ERROR: SLURM update ['scontrol', 'update', u'NodeName=compute3', 'Weight=9999999000', 'Features=instancetype=invalid'] failed
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: Traceback (most recent call last):
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: File "/usr/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/slurm.py", line 26, in _update_slurm_node
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: subprocess.check_output(cmd)
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: File "/usr/lib/python2.7/subprocess.py", line 219, in check_output
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: raise CalledProcessError(retcode, cmd, output=output)
Jun 12 21:20:36 manage.4xphq.arvadosapi.com env[16674]: CalledProcessError: Command '['scontrol', 'update', u'NodeName=compute3', 'Weight=9999999000', 'Features=instancetype=invalid']' returned non-zero exit status 1
</pre> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634202018-06-12T21:44:21ZNico César
<ul></ul><p>I manually applied puppet branch 7478-spot-instances-4xphq to 4xphq and disabled puppet.</p>
<p>We're testing this with Lucas.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634212018-06-13T01:54:27ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="7478: Fixes InvalidCloudSize creation. Adds wishlist related node info to logs. Arvados-DCO-1.1-..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/115a5e8861ef0a46224b2cd64568b30c884908fb">115a5e886</a> - branch <code>7478-invalid-size-not-defined</code><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/750/">https://ci.curoverse.com/job/developer-run-tests/750/</a></p>
<ul>
<li>Fixes <code>InvalidCloudSize</code> instantiation</li>
<li>Fixes <code>arvados_node_size</code> tag retrieval</li>
<li>Adds node size related information on logs when referring to a size by name.</li>
</ul>
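<p>For illustration only, here is a minimal sketch of the failure mode in the traceback above and the general shape of a fix: <code>find_size</code> falls back to a sentinel object for unknown size IDs, and that sentinel class has to be defined in (or imported into) the module that instantiates it. The class body, the <code>SizeCalculator</code> name and the sample data are assumptions, not the code from the branch.</p>

```python
class InvalidCloudSize(object):
    """Hypothetical placeholder for a node whose size tag is unknown."""
    id = "invalid"
    name = "Invalid"


class SizeCalculator(object):
    """Toy stand-in for the size lookup; not the nodemanager class."""

    def __init__(self, cloud_sizes):
        # cloud_sizes: list of dicts with at least an 'id' key (assumed shape)
        self.cloud_sizes = cloud_sizes

    def find_size(self, size_id):
        for size in self.cloud_sizes:
            if size["id"] == size_id:
                return size
        # The NameError in the log was raised on a line like this one,
        # because the class was not visible in the module's namespace.
        return InvalidCloudSize()


calc = SizeCalculator([{"id": "c3.xlarge"}, {"id": "c3.xlarge.spot"}])
known = calc.find_size("c3.xlarge.spot")
unknown = calc.find_size(None)  # e.g. a node with no arvados_node_size tag
```

<p>Returning a sentinel instead of raising lets the caller keep pairing nodes and decide separately what to do with unrecognized ones.</p>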
<p>I believe the <code>scontrol</code> error message is related to stopping unrecognized nodes.</p>
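<p>The <code>scontrol</code> output in the log states the limit directly: weight 9999999000 is greater than 4294967280. A hedged sketch of one way to keep a sentinel weight within that bound follows. <code>MAX_SLURM_WEIGHT</code> and <code>slurm_update_cmd</code> are hypothetical names, the "price times 1000" weight formula is only inferred from the logged value, and this is not necessarily how the problem was addressed.</p>

```python
# Upper bound quoted by scontrol in the log above (assumed to be inclusive).
MAX_SLURM_WEIGHT = 4294967280


def slurm_update_cmd(nodename, price, instance_type):
    """Build an scontrol argument list with the weight capped (hypothetical helper)."""
    # Assumption: weight is derived from price * 1000, as the logged value suggests.
    weight = min(int(round(price * 1000)), MAX_SLURM_WEIGHT)
    return ["scontrol", "update",
            "NodeName=" + nodename,
            "Weight=%i" % weight,
            "Features=instancetype=" + instance_type]


# A sentinel "very expensive" price no longer overflows the limit.
cmd = slurm_update_cmd("compute2", 9999999, "invalid")
```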
<p>I did some more testing by running normal (non-preemptable) CRs on 4xphq and it seems to be working OK. Just in case, I left nodemanager stopped.</p>
<p>I also added spot sizes to the 4xphq c-d-s config to match those already added to nodemanager.</p>
<p>Pending: test spot instance creation. Before enabling spot instances for child containers on the API server, we can add <code>preemptable = true</code> to any "non-spot" cloud size on nodemanager, for example m4.large, and run something while keeping an eye on the AWS console. If that is successful, we could enable the API server's <code>preemptable_instances = true</code> configuration and check that child containers get their scheduling parameter as expected.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634292018-06-13T12:56:25ZNico César
<ul></ul><p>Review at 115a5e8861ef0a46224b2cd64568b30c884908fb: this looks like a good bugfix to me.</p>
<p>ready to merge</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634702018-06-13T19:27:05ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Following tests with Nico, we've discovered an error when setting nodemanager's libcloud dependencies. I'll make a new branch for that.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634712018-06-13T19:31:01ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Updates at <a class="changeset" title="7478: Fix nodemanager's libcloud install dependency. Arvados-DCO-1.1-Signed-off-by: Lucas Di Pen..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/089b68192e6fd089c03331c389da1dace039c899">089b68192</a> - branch <code>7478-anm-libcloud-deps-fix</code><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/751/">https://ci.curoverse.com/job/developer-run-tests/751/</a></p>
<p>Updated install dependency on nodemanager for libcloud fork with spot instance support.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=634752018-06-13T20:04:21ZNico César
<ul></ul><p>Review at 089b68192 - branch 7478-anm-libcloud-deps-fix</p>
<p>LGTM</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=635822018-06-19T17:06:39ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Branch <code>7478-s-preemptable-preemptible</code> - <a class="changeset" title="7478: Replaces term 'preemptable' with 'preemptible' Also added config &amp; documentation on EC2 ex..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/a8bfbac314335eb0bff3f4ff5e856d3c327de31d">a8bfbac31</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/766/">https://ci.curoverse.com/job/developer-run-tests/766/</a></p>
<p>As suggested by Tom, replaced the term 'preemptable' with 'preemptible'.<br />Also added config &amp; documentation on nodemanager's EC2 example config file for spot instances.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=635852018-06-19T18:21:32ZTom Cleggtom@curii.com
<ul></ul><p>LGTM</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=635912018-06-20T11:52:01ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Branch <code>7478-auto-preemptible-cr-fix</code> - <a class="changeset" title="7478: Fixes default preemptible scheduling parameter setting. The API server wasn't auto-assigni..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/36da5d97f623f0c2c944829ca8410a3bea388b19">36da5d97f623f0c2c944829ca8410a3bea388b19</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/770/">https://ci.curoverse.com/job/developer-run-tests/770/</a></p>
<p>API server wasn't automatically adding the <code>preemptible</code> scheduling parameter on child container requests when <code>'Rails.configuration.preemptible_instances = true'</code> because of a callback ordering issue.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=635942018-06-20T13:08:56ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Further testing on 4xphq shows that when the CR has the <code>preemptible=true</code> scheduling parameter, c-d-s isn't requesting the correct instance type, seemingly ignoring this parameter.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=636332018-06-20T15:44:36ZLucas Di Pentimalucas.dipentima@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed parent" href="/issues/13649">Bug #13649</a>: c-d-s doesn't request a preemptible instance when it should</i> added</li></ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=636692018-06-20T17:50:28ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Lucas Di Pentima wrote:</p>
<blockquote>
<p>Branch <code>7478-auto-preemptible-cr-fix</code> - <a class="changeset" title="7478: Fixes default preemptible scheduling parameter setting. The API server wasn't auto-assigni..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/36da5d97f623f0c2c944829ca8410a3bea388b19">36da5d97f623f0c2c944829ca8410a3bea388b19</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/770/">https://ci.curoverse.com/job/developer-run-tests/770/</a></p>
<p>API server wasn't automatically adding the <code>preemptible</code> scheduling parameter on child container requests when <code>'Rails.configuration.preemptible_instances = true'</code> because of a callback ordering issue.</p>
</blockquote>
<p>Specifically, :set_default_preemptible_scheduling_parameter would run <em>before</em> :set_requesting_container_uuid when it needs to run <em>after</em></p>
<ul>
<li>I don't understand what the test changes have to do with the callback ordering change</li>
<li>Seems like an opportunity to write the test that would have detected the mistake in the first place</li>
</ul> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=636912018-06-20T19:24:36ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Rebased and tried again: <a class="changeset" title="7478: Fixes default preemptible parameter when submitting committed child CRs The API server was..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/29e80f471f1d70d1d1eda43b05e0f2e059564509">29e80f471f1d70d1d1eda43b05e0f2e059564509</a><br />Test run: <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/772/">https://ci.curoverse.com/job/developer-run-tests/772/</a></p>
<p>As discussed on chat, moved both the <code>set_requesting_container_uuid</code> and <code>set_default_preemptible_scheduling_parameter</code> callbacks to run on <code>before_save</code>, adding an extra check on <code>set_requesting_container_uuid</code> to avoid reassigning the field, so that both cases are taken into account:</p>
<ul>
<li>Create CR, and later change state to Committed</li>
<li>Create CR with state=Committed</li>
</ul>
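<p>A toy Python model of the ordering problem may help here. The real code is a set of Rails callbacks on the API server; everything below other than the two callback names and the configuration flag is a hypothetical simplification, including the exact conditions checked. The point it illustrates is that the <code>preemptible</code> default can only be decided once the request is known to be a child request, so the callback that sets the requesting container UUID has to run first.</p>

```python
PREEMPTIBLE_INSTANCES = True  # stands in for Rails.configuration.preemptible_instances


class ContainerRequest(object):
    def __init__(self, state="Uncommitted", token_container_uuid=None):
        self.state = state
        # Hypothetical: UUID of the container whose token made this request.
        self.token_container_uuid = token_container_uuid
        self.requesting_container_uuid = None
        self.scheduling_parameters = {}

    def set_requesting_container_uuid(self):
        # Extra check: do not reassign a value that is already set.
        if self.requesting_container_uuid is None:
            self.requesting_container_uuid = self.token_container_uuid

    def set_default_preemptible_scheduling_parameter(self):
        # Assumed condition: only committed child requests get the default.
        if (PREEMPTIBLE_INSTANCES and self.state == "Committed"
                and self.requesting_container_uuid is not None):
            self.scheduling_parameters.setdefault("preemptible", True)

    def save(self, callbacks):
        # Run the "before_save" callbacks in the given order.
        for name in callbacks:
            getattr(self, name)()


GOOD_ORDER = ["set_requesting_container_uuid",
              "set_default_preemptible_scheduling_parameter"]
BAD_ORDER = list(reversed(GOOD_ORDER))

# Second case from the list above: created directly with state=Committed.
fixed = ContainerRequest("Committed", "parent-uuid")
fixed.save(GOOD_ORDER)

# Same request with the callbacks in the wrong order: the default is missed.
broken = ContainerRequest("Committed", "parent-uuid")
broken.save(BAD_ORDER)

# First case: created uncommitted, then committed on a later save.
later = ContainerRequest("Uncommitted", "parent-uuid")
later.save(GOOD_ORDER)
later.state = "Committed"
later.save(GOOD_ORDER)
```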
<p>Added test for the newly fixed case.</p> Arvados - Idea #7478: [Node Manager] Creates compute nodes using AWS spot instanceshttps://dev.arvados.org/issues/7478?journal_id=647152018-07-23T18:41:42ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Release</strong> set to <i>13</i></li></ul>