Project

General

Profile

Actions

Feature #21705

closed

Update Go runtime and dependencies

Added by Tom Clegg 7 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Story points:
-
Release:
Release relationship:
Auto

Files

cloudtest.log.txt (6.08 KB) cloudtest.log.txt Tom Clegg, 05/15/2024 02:23 PM

Subtasks 3 (0 open3 closed)

Task #21729: Review 21705-go-depsResolvedTom Clegg05/24/2024Actions
Task #21837: Review 21705-go-deps (part 3, 8b700795)ResolvedTom Clegg05/30/2024Actions
Task #21853: Review 21705-go-deps-update-allResolvedTom Clegg06/10/2024Actions
Actions #1

Updated by Tom Clegg 7 months ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #2

Updated by Tom Clegg 7 months ago

  • Status changed from New to In Progress
Actions #3

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Actions #4

Updated by Tom Clegg 6 months ago

21705-go-deps @ 03c62749d21315d1a2020ca663646cff185b63c3 -- developer-run-tests: #4217

  • Migrate keepstore S3 driver, dispatchcloud EC2 driver, and keep-web S3 test case to modern AWS SDK (aws-sdk-go-v2)
  • Update hashicorp/go-retryablehttp with our PR merged, remove our workaround

I have started another branch (21705-go122) with dependency updates that require go >= 1.21. Currently CI images have go 1.20, but if we rebuild after merging this branch (21705-go-deps) they will have 1.22.

Go 1.21 introduces toolchain versioning, so this sort of upgrade should get easier in future.

Actions #5

Updated by Tom Clegg 6 months ago

arvados-server cloudtest result looks good: cloudtest.log.txt

The "cloud/driver Create succeeded, but instance is not in list" message is logged by the previous version too:

workbench.tordo:~# /tmp/arvados-server --version
/tmp/arvados-server 2.8.0~dev20240514203017 (go1.20.6)
workbench.tordo:~# /tmp/arvados-server cloudtest
...
ERRO[2024-05-15T14:24:31.031447498Z] cloud/driver Create succeeded, but instance is not in list  error="test instance missing from cloud provider's list" 
...

Evidently this is an "eventually consistent" situation that cloudtest doesn't expect. It might cause a bit of unnecessary churn in arvados-dispatch-cloud, but in the context of porting our EC2 driver to the new SDK it just indicates we have successfully preserved the existing driver's behavior.

Actions #7

Updated by Tom Clegg 6 months ago

When we added "get credentials from instance metadata" support to the original s3 driver in #15599 we documented the following configuration scheme:
  1. If AccessKeyID and SecretAccessKey are both blank, s3 driver uses IAM role
  2. If IAMRole is also blank, s3 driver uses whatever IAM role is provided in instance metadata
  3. If IAMRole is not blank, s3 driver uses IAM role in instance metadata only if it matches
  4. If AccessKeyID / SecretAccessKey and IAMRole are both non-blank, error out at startup because it's unclear which credentials you want to use

However, #3 and #4 were never implemented in the "s3aws" driver, which has since become the only s3 driver. The IAMRole config is just completely ignored.

For #3: I think it would be painful to implement and maintain this with the new code base. The aws sdk does lots of things under the hood so it's hard to interfere/second-guess correctly.

For #4: If we don't have #3, then #4 reduces to "set this to true to assure us that your blank access/secret keys are not an error/oversight", which seems a bit silly. I am inclined to remove the IAMRole config, so the behaviors are
  • set AccessKeyID and SecretAccessKey → use explicit credentials
  • don't set them → use instance metadata

Again, this is what the code has been doing since April 2023, so it would just be a documentation update.

Actions #8

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-05-22 sprint to Development 2024-06-05 sprint
Actions #9

Updated by Brett Smith 6 months ago

This all looks good to me. Thank you for the last commit especially, I was all set to ask if some direct configuration like this was possible and I'm glad I don't have to write all that up now.

Tom Clegg wrote in #note-7:

For #4: If we don't have #3, then #4 reduces to "set this to true to assure us that your blank access/secret keys are not an error/oversight", which seems a bit silly. I am inclined to remove the IAMRole config, so the behaviors are
  • set AccessKeyID and SecretAccessKey → use explicit credentials
  • don't set them → use instance metadata

Under the circumstances this seems reasonable enough. I guess the question I'm left asking is, was there some specific use case motivating the #3 "check that IAM role matches configuration" behavior? On the one hand, you can sorta kinda imagine it being a little defense in depth measure. On the other hand, I don't think I've seen literally any other software implement something like this. If an administrator actually wanted the check, that would be interesting, but I don't see anything like that in #15599. If it's something we wrote sorta just because we could, then I have absolutely no qualms about removing it.

I'm guessing the plan is to remove IAMRole from configuration entirely and document that in the upgrade notes?

Actions #10

Updated by Tom Clegg 6 months ago

Brett Smith wrote in #note-9:

was there some specific use case motivating the #3 "check that IAM role matches configuration" behavior?

No. As I recall, when I wrote the initial version of this feature, AWS docs indicated that a given instance cannot have multiple IAM roles, but the way the role/credentials metadata endpoint presented the credentials (GET http://169.254.169.254/latest/meta-data/iam/security-credentials/$rolename) seemed calculated to allow for a number of IAM roles/credentials to be available, so it seemed more future-proof to implement "default to choosing first/only one in list, but also provide a way for admins to specify, in case the 'only one' rule isn't permanent/universal" -- and, given that we have the feature, point out in the docs that it can be used as an extra config sanity check.

I'm guessing the plan is to remove IAMRole from configuration entirely and document that in the upgrade notes?

Yes, that's what I propose.

Actions #12

Updated by Brett Smith 6 months ago

Tom Clegg wrote in #note-11:

21705-go-deps @ 8b70079544cb19e7941de4bf141ccc053985edcc -- developer-run-tests: #4253

Looks good to me. My one tiny suggestion is it would be nice for clarity if the upgrade note spelled out the fully qualified name of the configuration setting the first time it appeared: "The Collections.Volumes.*.DriverParameters.IAMRole configuration entry for S3 volumes has been removed."

Is it worth adding something to our coding standards to talk about the use of %w and logger.WithError you did in 9aa54df377a9bc86d98f9dfe5f767551ecd4977e?

Actions #13

Updated by Tom Clegg 6 months ago

After updating the CI worker image, various container-running tests (mostly singularity) started failing. It appears we can't quite use Go 1.22 yet:

So...

21705-go121 @ 13ffe9bd040c4538269dd1cf12de496469d58895 -- developer-run-tests: #4258
  • go 1.21.10
Actions #14

Updated by Lucas Di Pentima 6 months ago

Testing new jenkins worker image: developer-run-tests: #4259

Actions #16

Updated by Tom Clegg 6 months ago

A bug in gorilla/mux v1.8.1 would cause controller to return 404 instead of 405 when a request's path matches a real route but its method does not. Practically, this probably wouldn't hurt anything, but I don't see any changes in v1.8.1 that have value for us, so for now I'm inclined to stay at v1.8.0 and give gorilla/mux time to figure it out, rather than change our tests/behavior back and forth.

21705-go-deps-update-all @ 45a7b5a0d3ba2817689d12e1002c7d154a537073 -- developer-run-tests: #4267

Actions #17

Updated by Brett Smith 6 months ago

Tom Clegg wrote in #note-16:

21705-go-deps-update-all @ 45a7b5a0d3ba2817689d12e1002c7d154a537073 -- developer-run-tests: #4267

Looks good to me, thanks.

Just out of curiosity, is the development process of updating all these modules documented somewhere? (Is it not because it's one obvious go command?) Are there limits to what updates it considers, like it won't automatically increase major version releases?

Actions #18

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Actions #19

Updated by Tom Clegg 5 months ago

Good point. I've updated the very out-of-date Updating dependencies wiki with current cheat notes and a link to the relevant Go docs.

Actions #20

Updated by Tom Clegg 5 months ago

  • Status changed from In Progress to Resolved
Actions #21

Updated by Peter Amstutz 3 months ago

  • Release set to 70
Actions

Also available in: Atom PDF