Project

General

Profile

Actions

Feature #22602

closed

arv-copy should recognize when default environment matches instance name

Added by Peter Amstutz about 1 month ago. Updated 20 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
-
Release relationship:
Auto

Description

If the source or destination has the cluster ID matching the default environment, it should use that, instead of using xxxxx.conf, which may not exist -- in which case it fails even though the correct credentials were available in the default environment.


Subtasks 1 (0 open1 closed)

Task #22606: Review 22602-arv-copyResolvedPeter Amstutz03/16/2025Actions

Related issues 1 (1 open0 closed)

Related to Arvados - Idea #22665: Generalize logic for getting cluster credentialsNewActions
Actions #1

Updated by Peter Amstutz about 1 month ago

  • Position changed from -941372 to -941364
Actions #2

Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz about 1 month ago

  • Subtask #22606 added
Actions #5

Updated by Peter Amstutz about 1 month ago

22602-arv-copy @ fa46c8ee04dac02150044e5ef7228ea75a1370ea

developer-run-tests: #4674

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • Now, when the desired cluster id is known, it will check if the cluster id of the default configuration matches (environment or settings.conf) before checking for cluster-specific config files.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • Updated unit tests and also confirmed that my CWL wrapper (see discussion below) worked with this change.
  • New or changed UX/UX and has gotten feedback from stakeholders.
    • n/a
  • Documentation has been updated.
    • yes (see discussion below)
  • Behaves appropriately at the intended scale (describe intended scale).
    • No changes in scale. Could produce bad behavior if default credentials are present but don't work in a weird way (e.g. a bad DNS lookup which produces a spiral of fail-wait-retry) but users will likely experience other problems if that's the case.
  • Considered backwards and forwards compatibility issues between client and server.
    • no compatibility issues
  • Follows our coding standards and GUI style guidelines.
    • yes

In addition to being just a usability win, the specific motivation came up in trying to wrap arv-copy as a CWL CommandLineTool. I needed to have the local cluster's credentials provided by the environment (because that's what API: true does) and the remote cluster's credentials written to a config file.

I switched some UUID references from "jutro" to "pirca". I think the
original reason to use jutro is because pirca was supposed to be
cleaned up periodically (which hasn't happened) and they are
federated. However, because we are not talking about federation in
this section, this is mostly just confusing.

There are still references to jutro under "copy a workflow" and "copy
a project" and I'm leaving them alone for now.

Actions #6

Updated by Peter Amstutz about 1 month ago

  • Release set to 75
Actions #7

Updated by Peter Amstutz about 1 month ago

  • Status changed from New to In Progress
Actions #8

Updated by Brett Smith about 1 month ago

I'm wondering: would it meet the functional need if we checked configuration files the other way around? i.e., first check zzzzz.conf, then if that doesn't work (for some definition of "doesn't work"), check if settings.conf has credentials for zzzzz?

  • In general, I feel like Unix tools follow the principle of least surprise if more specific configuration has precedence over less specific configuration.
  • In particular, I think this setup is less likely to make surprising results and avoid backwards compatibility problems for users who already have both files configured.
  • I'm also concerned about what happens if the settings.conf credentials don't work for any reason. In the current code I think there may be some cases where the credentials in settings.conf fail and raise an unhandled exception when zzzzz.conf might've worked just fine. That problem becomes less pressing if we load in the other order.
Actions #9

Updated by Brett Smith about 1 month ago

  • Target version changed from Development 2025-02-26 to Development 2025-03-19
Actions #10

Updated by Peter Amstutz 25 days ago

Brett Smith wrote in #note-8:

I'm wondering: would it meet the functional need if we checked configuration files the other way around? i.e., first check zzzzz.conf, then if that doesn't work (for some definition of "doesn't work"), check if settings.conf has credentials for zzzzz?

  • In general, I feel like Unix tools follow the principle of least surprise if more specific configuration has precedence over less specific configuration.
  • In particular, I think this setup is less likely to make surprising results and avoid backwards compatibility problems for users who already have both files configured.
  • I'm also concerned about what happens if the settings.conf credentials don't work for any reason. In the current code I think there may be some cases where the credentials in settings.conf fail and raise an unhandled exception when zzzzz.conf might've worked just fine. That problem becomes less pressing if we load in the other order.

Yes, this makes sense. It now tries the config file first, then the default credentials, and only then errors out if it can't find valid credentials.

22602-arv-copy @ d9214cd1bbb7374b54b13b05b48607b065356cf6

developer-run-tests: #4685

Actions #11

Updated by Brett Smith 23 days ago

Peter Amstutz wrote in #note-10:

22602-arv-copy @ d9214cd1bbb7374b54b13b05b48607b065356cf6

  • When we construct arvados.api('v1'), it can raise httplib2.errors.HttpLib2Error if it can't connect to the server, or googleapiclient.errors.Error if the discovery document can't be found, is malformed, etc. It would be nice to catch these and have the specific problem reflected in the abort error message back to the user.
  • In abort("Have credentials for {} but need to connect to {}"), I believe we can get here if the function was called with an instance_name but failed to construct a default client. This will produce the error message "Have credentials for None but need to connect to zzzzz," which seems a little confusing. It would be nice to clean this up.

Thanks.

Actions #12

Updated by Peter Amstutz 23 days ago

Brett Smith wrote in #note-11:

Peter Amstutz wrote in #note-10:

22602-arv-copy @ d9214cd1bbb7374b54b13b05b48607b065356cf6

  • When we construct arvados.api('v1'), it can raise httplib2.errors.HttpLib2Error if it can't connect to the server, or googleapiclient.errors.Error if the discovery document can't be found, is malformed, etc. It would be nice to catch these and have the specific problem reflected in the abort error message back to the user.

Added more error handling. In particular, I had to enable the googleapiclient logging stream so that retries would show up instead of arv-copy appearing to hang. Also added some handling for network-related OSError.

  • In abort("Have credentials for {} but need to connect to {}"), I believe we can get here if the function was called with an instance_name but failed to construct a default client. This will produce the error message "Have credentials for None but need to connect to zzzzz," which seems a little confusing. It would be nice to clean this up.

That's right, I added a check that default_instance is valid, otherwise it falls back to the generic error message.

22602-arv-copy @ 948501aa712ec5cd2e5209ac7a271bee954c94d7

developer-run-tests: #4695

Actions #13

Updated by Brett Smith 22 days ago

Peter Amstutz wrote in #note-12:

22602-arv-copy @ 948501aa712ec5cd2e5209ac7a271bee954c94d7

  • The connection error message for default_instance never gets used if instance_name is not set, because it'll just fall back to the default abort message at the bottom of the function. In general it feels like this code is juggling an awful lot of state to try to decide which error message is most important to show to the user. Would it maybe be simpler to track separate instance_errmsg and default_errmsg variables, and abort(instance_errmsg or default_errmsg) at the end? Or maybe keep them in a list and abort('; '.join(errmsgs)) or whatever.
  • When we set up the googleapiclient logger, it would be nicer to attach arvados.logging.log_handler instead of constructing a new StreamHandler. This will keep the rendering of the log messages more consistent across sources.
  • Just a small typo in this comment: # Once we're successfully contacted the clusters, "we're"→"we've"

Thanks.

Actions #14

Updated by Peter Amstutz 22 days ago

Brett Smith wrote in #note-13:

Peter Amstutz wrote in #note-12:

22602-arv-copy @ 948501aa712ec5cd2e5209ac7a271bee954c94d7

  • The connection error message for default_instance never gets used if instance_name is not set, because it'll just fall back to the default abort message at the bottom of the function. In general it feels like this code is juggling an awful lot of state to try to decide which error message is most important to show to the user. Would it maybe be simpler to track separate instance_errmsg and default_errmsg variables, and abort(instance_errmsg or default_errmsg) at the end? Or maybe keep them in a list and abort('; '.join(errmsgs)) or whatever.

Yea I ended up turning msg into a list and then logging all the errors. I think that is a bit more helpful for debugging.

  • When we set up the googleapiclient logger, it would be nicer to attach arvados.logging.log_handler instead of constructing a new StreamHandler. This will keep the rendering of the log messages more consistent across sources.

Good idea.

  • Just a small typo in this comment: # Once we're successfully contacted the clusters, "we're"→"we've"

Fixed.

22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6

developer-run-tests: #4696

Actions #15

Updated by Peter Amstutz 22 days ago

  • Related to Idea #22665: Generalize logic for getting cluster credentials added
Actions #16

Updated by Brett Smith 22 days ago

Peter Amstutz wrote in #note-14:

22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6

  • The assignment dirs = [] is functionally a noop. Later when we use dirs we call dirs.search_paths(), so all this does is change a possible NameError into a possible AttributeError. I think this is more likely to mislead than clarify and would be better removed.
  • Please DRY up BaseDirectories by having search call search_paths.
  • The new search_paths method should have tests too.

Thanks.

Actions #17

Updated by Peter Amstutz 22 days ago

Brett Smith wrote in #note-16:

Peter Amstutz wrote in #note-14:

22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6

  • The assignment dirs = [] is functionally a noop. Later when we use dirs we call dirs.search_paths(), so all this does is change a possible NameError into a possible AttributeError. I think this is more likely to mislead than clarify and would be better removed.
  • Please DRY up BaseDirectories by having search call search_paths.
  • The new search_paths method should have tests too.

Thanks.

Brett Smith wrote in #note-16:

Peter Amstutz wrote in #note-14:

22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6

  • The assignment dirs = [] is functionally a noop. Later when we use dirs we call dirs.search_paths(), so all this does is change a possible NameError into a possible AttributeError. I think this is more likely to mislead than clarify and would be better removed.

Yea, I initially didn't look closely enough made an incorrect assumption about the type of dirs (I thought it was just a list or iterable, not an object) which is why I gave it a default value of empty list (I have been burned when some code paths don't assign a variable that gets checked later). But in this case I think I convinced myself that if config_file to be falsy then dirs will always be set.

  • Please DRY up BaseDirectories by having search call search_paths.

Fixed.

  • The new search_paths method should have tests too.

Added a test.

22602-arv-copy @ 4dba476f839d31de421335208ff4f921cbb5a81c

developer-run-tests: #4704

Actions #18

Updated by Brett Smith 21 days ago

Peter Amstutz wrote in #note-17:

Yea, I initially didn't look closely enough made an incorrect assumption about the type of dirs (I thought it was just a list or iterable, not an object) which is why I gave it a default value of empty list (I have been burned when some code paths don't assign a variable that gets checked later).

Yeah, we're on the same page. In principle I don't have a problem with safeguard assignments, but in order to do their job they need to be type-compatible, and this wasn't.

22602-arv-copy @ 4dba476f839d31de421335208ff4f921cbb5a81c

LGTM, thanks.

Actions #19

Updated by Peter Amstutz 20 days ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF