[Node Manager] Can erroneously pair cloud nodes with stale Arvados node records
Node Manager pairs cloud nodes with Arvados node records based solely on an IP address match. See arvnodeman.computenode.dispatch.ComputeNodeMonitorActor.offer_arvados_pair.
It can happen that a cloud node comes up with an IP address that happens to match a stale Arvados node record. Make the pairing check stricter so there's no pairing in this case.
#4 Updated by Brett Smith almost 6 years ago
I think there are basically two possible approaches:
- EC2 compute nodes, at least, put their EC2 id in the Arvados node record's info. If we check against that, we can't go wrong—but it has the downside of meaning we have to reimplement this check for every cloud driver.
- Check that the Arvados node's first_ping_at is later than the cloud node's boot time before accepting a pairing. This is completely generic, and very safe, although it could still go wrong if total garbage data is getting into the node records.
I think I prefer the second approach, but I wanted to note the alternatives at least.
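A rough sketch of the first_ping_at check, to make the idea concrete. This is not the actual Node Manager code: CloudNode, ArvadosNodeRecord, and can_pair are hypothetical stand-ins, and it assumes the cloud driver can report a boot time as a timezone-aware datetime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CloudNode:
    # Hypothetical stand-in for whatever the cloud driver returns.
    private_ip: str
    boot_time: datetime


@dataclass
class ArvadosNodeRecord:
    # Hypothetical stand-in for an Arvados node record.
    ip_address: Optional[str]
    first_ping_at: Optional[datetime]


def can_pair(cloud_node: CloudNode, arvados_node: ArvadosNodeRecord) -> bool:
    """Accept a pairing only if the IPs match AND the record's first ping
    happened after the cloud node booted."""
    if arvados_node.ip_address != cloud_node.private_ip:
        return False
    if arvados_node.first_ping_at is None:
        # The record has never been pinged, so it cannot belong to a
        # node that is already up and reporting.
        return False
    return arvados_node.first_ping_at > cloud_node.boot_time


# A freshly booted cloud node that reuses an old IP address.
boot = datetime(2015, 1, 10, 12, 0, tzinfo=timezone.utc)
cloud = CloudNode(private_ip="10.0.0.5", boot_time=boot)

# Stale record: same IP, but first pinged a day before this node booted.
stale = ArvadosNodeRecord("10.0.0.5", boot - timedelta(days=1))
# Fresh record: same IP, first pinged two minutes after boot.
fresh = ArvadosNodeRecord("10.0.0.5", boot + timedelta(minutes=2))

stale_ok = can_pair(cloud, stale)
fresh_ok = can_pair(cloud, fresh)
```

With an IP-only match both records would pair; with the extra timestamp comparison only the fresh one does. Clock skew between the cloud provider and the API server is not handled here and might need a small tolerance in practice.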