Feature #22602
closedarv-copy should recognize when default environment matches instance name
Description
If the source or destination has the cluster ID matching the default environment, it should use that, instead of using xxxxx.conf
, which may not exist -- in which case it fails even though the correct credentials were available in the default environment.
Updated by Peter Amstutz about 1 month ago
- Position changed from -941372 to -941364
Updated by Peter Amstutz about 1 month ago
22602-arv-copy @ ff056df90ca530e8d6307e7a7e284984abfc73aa
Updated by Peter Amstutz about 1 month ago
22602-arv-copy @ fa46c8ee04dac02150044e5ef7228ea75a1370ea
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Now, when the desired cluster id is known, it will check if the cluster id of the default configuration matches (environment or settings.conf) before checking for cluster-specific config files.
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- Updated unit tests and also confirmed that my CWL wrapper (see discussion below) worked with this change.
- New or changed UX/UX and has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- yes (see discussion below)
- Behaves appropriately at the intended scale (describe intended scale).
- No changes in scale. Could produce bad behavior if default credentials are present but don't work in a weird way (e.g. a bad DNS lookup which produces a spiral of fail-wait-retry) but users will likely experience other problems if that's the case.
- Considered backwards and forwards compatibility issues between client and server.
- no compatibility issues
- Follows our coding standards and GUI style guidelines.
- yes
In addition to being just a usability win, the specific motivation came up in trying to wrap arv-copy as a CWL CommandLineTool. I needed to have the local cluster's credentials provided by the environment (because that's what API: true
does) and the remote cluster's credentials written to a config file.
I switched some UUID references from "jutro" to "pirca". I think the
original reason to use jutro is because pirca was supposed to be
cleaned up periodically (which hasn't happened) and they are
federated. However, because we are not talking about federation in
this section, this is mostly just confusing.
There are still references to jutro under "copy a workflow" and "copy
a project" and I'm leaving them alone for now.
Updated by Peter Amstutz about 1 month ago
- Status changed from New to In Progress
Updated by Brett Smith about 1 month ago
I'm wondering: would it meet the functional need if we checked configuration files the other way around? i.e., first check zzzzz.conf
, then if that doesn't work (for some definition of "doesn't work"), check if settings.conf
has credentials for zzzzz
?
- In general, I feel like Unix tools follow the principle of least surprise if more specific configuration has precedence over less specific configuration.
- In particular, I think this setup is less likely to make surprising results and avoid backwards compatibility problems for users who already have both files configured.
- I'm also concerned about what happens if the
settings.conf
credentials don't work for any reason. In the current code I think there may be some cases where the credentials insettings.conf
fail and raise an unhandled exception whenzzzzz.conf
might've worked just fine. That problem becomes less pressing if we load in the other order.
Updated by Brett Smith about 1 month ago
- Target version changed from Development 2025-02-26 to Development 2025-03-19
Updated by Peter Amstutz 25 days ago
Brett Smith wrote in #note-8:
I'm wondering: would it meet the functional need if we checked configuration files the other way around? i.e., first check
zzzzz.conf
, then if that doesn't work (for some definition of "doesn't work"), check ifsettings.conf
has credentials forzzzzz
?
- In general, I feel like Unix tools follow the principle of least surprise if more specific configuration has precedence over less specific configuration.
- In particular, I think this setup is less likely to make surprising results and avoid backwards compatibility problems for users who already have both files configured.
- I'm also concerned about what happens if the
settings.conf
credentials don't work for any reason. In the current code I think there may be some cases where the credentials insettings.conf
fail and raise an unhandled exception whenzzzzz.conf
might've worked just fine. That problem becomes less pressing if we load in the other order.
Yes, this makes sense. It now tries the config file first, then the default credentials, and only then errors out if it can't find valid credentials.
22602-arv-copy @ d9214cd1bbb7374b54b13b05b48607b065356cf6
Updated by Brett Smith 23 days ago
Peter Amstutz wrote in #note-10:
22602-arv-copy @ d9214cd1bbb7374b54b13b05b48607b065356cf6
- When we construct
arvados.api('v1')
, it can raisehttplib2.errors.HttpLib2Error
if it can't connect to the server, orgoogleapiclient.errors.Error
if the discovery document can't be found, is malformed, etc. It would be nice to catch these and have the specific problem reflected in the abort error message back to the user. - In
abort("Have credentials for {} but need to connect to {}")
, I believe we can get here if the function was called with aninstance_name
but failed to construct a default client. This will produce the error message "Have credentials for None but need to connect to zzzzz," which seems a little confusing. It would be nice to clean this up.
Thanks.
Updated by Peter Amstutz 23 days ago
Brett Smith wrote in #note-11:
Peter Amstutz wrote in #note-10:
22602-arv-copy @ d9214cd1bbb7374b54b13b05b48607b065356cf6
- When we construct
arvados.api('v1')
, it can raisehttplib2.errors.HttpLib2Error
if it can't connect to the server, orgoogleapiclient.errors.Error
if the discovery document can't be found, is malformed, etc. It would be nice to catch these and have the specific problem reflected in the abort error message back to the user.
Added more error handling. In particular, I had to enable the googleapiclient logging stream so that retries would show up instead of arv-copy appearing to hang. Also added some handling for network-related OSError.
- In
abort("Have credentials for {} but need to connect to {}")
, I believe we can get here if the function was called with aninstance_name
but failed to construct a default client. This will produce the error message "Have credentials for None but need to connect to zzzzz," which seems a little confusing. It would be nice to clean this up.
That's right, I added a check that default_instance is valid, otherwise it falls back to the generic error message.
22602-arv-copy @ 948501aa712ec5cd2e5209ac7a271bee954c94d7
Updated by Brett Smith 22 days ago
Peter Amstutz wrote in #note-12:
22602-arv-copy @ 948501aa712ec5cd2e5209ac7a271bee954c94d7
- The connection error message for
default_instance
never gets used ifinstance_name
is not set, because it'll just fall back to the defaultabort
message at the bottom of the function. In general it feels like this code is juggling an awful lot of state to try to decide which error message is most important to show to the user. Would it maybe be simpler to track separateinstance_errmsg
anddefault_errmsg
variables, andabort(instance_errmsg or default_errmsg)
at the end? Or maybe keep them in a list andabort('; '.join(errmsgs))
or whatever. - When we set up the googleapiclient logger, it would be nicer to attach
arvados.logging.log_handler
instead of constructing a newStreamHandler
. This will keep the rendering of the log messages more consistent across sources. - Just a small typo in this comment:
# Once we're successfully contacted the clusters
, "we're"→"we've"
Thanks.
Updated by Peter Amstutz 22 days ago
Brett Smith wrote in #note-13:
Peter Amstutz wrote in #note-12:
22602-arv-copy @ 948501aa712ec5cd2e5209ac7a271bee954c94d7
- The connection error message for
default_instance
never gets used ifinstance_name
is not set, because it'll just fall back to the defaultabort
message at the bottom of the function. In general it feels like this code is juggling an awful lot of state to try to decide which error message is most important to show to the user. Would it maybe be simpler to track separateinstance_errmsg
anddefault_errmsg
variables, andabort(instance_errmsg or default_errmsg)
at the end? Or maybe keep them in a list andabort('; '.join(errmsgs))
or whatever.
Yea I ended up turning msg
into a list and then logging all the errors. I think that is a bit more helpful for debugging.
- When we set up the googleapiclient logger, it would be nicer to attach
arvados.logging.log_handler
instead of constructing a newStreamHandler
. This will keep the rendering of the log messages more consistent across sources.
Good idea.
- Just a small typo in this comment:
# Once we're successfully contacted the clusters
, "we're"→"we've"
Fixed.
22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6
Updated by Peter Amstutz 22 days ago
- Related to Idea #22665: Generalize logic for getting cluster credentials added
Updated by Brett Smith 22 days ago
Peter Amstutz wrote in #note-14:
22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6
- The assignment
dirs = []
is functionally a noop. Later when we use dirs we calldirs.search_paths()
, so all this does is change a possibleNameError
into a possibleAttributeError
. I think this is more likely to mislead than clarify and would be better removed. - Please DRY up
BaseDirectories
by havingsearch
callsearch_paths
. - The new
search_paths
method should have tests too.
Thanks.
Updated by Peter Amstutz 22 days ago
Brett Smith wrote in #note-16:
Peter Amstutz wrote in #note-14:
22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6
- The assignment
dirs = []
is functionally a noop. Later when we use dirs we calldirs.search_paths()
, so all this does is change a possibleNameError
into a possibleAttributeError
. I think this is more likely to mislead than clarify and would be better removed.- Please DRY up
BaseDirectories
by havingsearch
callsearch_paths
.- The new
search_paths
method should have tests too.Thanks.
Brett Smith wrote in #note-16:
Peter Amstutz wrote in #note-14:
22602-arv-copy @ 858f3b2c3138600531961692c8378ce795b448d6
- The assignment
dirs = []
is functionally a noop. Later when we use dirs we calldirs.search_paths()
, so all this does is change a possibleNameError
into a possibleAttributeError
. I think this is more likely to mislead than clarify and would be better removed.
Yea, I initially didn't look closely enough made an incorrect assumption about the type of dirs
(I thought it was just a list or iterable, not an object) which is why I gave it a default value of empty list (I have been burned when some code paths don't assign a variable that gets checked later). But in this case I think I convinced myself that if config_file
to be falsy then dirs
will always be set.
- Please DRY up
BaseDirectories
by havingsearch
callsearch_paths
.
Fixed.
- The new
search_paths
method should have tests too.
Added a test.
22602-arv-copy @ 4dba476f839d31de421335208ff4f921cbb5a81c
Updated by Brett Smith 21 days ago
Peter Amstutz wrote in #note-17:
Yea, I initially didn't look closely enough made an incorrect assumption about the type of
dirs
(I thought it was just a list or iterable, not an object) which is why I gave it a default value of empty list (I have been burned when some code paths don't assign a variable that gets checked later).
Yeah, we're on the same page. In principle I don't have a problem with safeguard assignments, but in order to do their job they need to be type-compatible, and this wasn't.
22602-arv-copy @ 4dba476f839d31de421335208ff4f921cbb5a81c
LGTM, thanks.
Updated by Peter Amstutz 20 days ago
- Status changed from In Progress to Resolved