Project

General

Profile

Actions

Bug #20649

closed

Improve troubleshooting assistance for compute instance SSH problems

Added by Tom Clegg 11 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
1.0
Release relationship:
Auto

Description

Any number of setup mistakes/problems can result in arvados-dispatch-cloud being unable to authenticate to a new cloud VM after successfully creating it. Currently, when this is happening, arvados does not give good debugging clues.

Specific improvements that would help:
  • arvados-server cloudtest should obey DeployPublicKey flag, so that it does the same thing as a-d-c.
  • SSH authentication errors should be logged right away (most boot probe failures are "can't connect to SSH port" or "probe command failed because system is still booting" and lots of these are expected in normal operation, which is why they are suppressed until boot timeout -- but an SSH authentication problem is not expected in normal operation)
  • When timing out on boot probe, a-d-c should log the last error, not just the stderr from the last attempt (which is empty in this case)
  • When timing out on boot probe, a-d-c's log message should remind the operator that "arvados-server cloudtest" is available to help troubleshoot.

Subtasks 1 (0 open1 closed)

Task #20815: Review 20649-ssh-helpResolvedTom Clegg08/14/2023Actions
Actions

Also available in: Atom PDF