Feature #14717

[websockets] Use cluster config

Added by Peter Amstutz 10 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
07/18/2019
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Subtasks

Task #15394: Review 14717-ws-new-configResolvedPeter Amstutz


Related issues

Related to Arvados - Story #13648: [Epic] Use one cluster configuration file for all componentsNew

Associated revisions

Revision 2ecb1b1e (diff)
Added by Peter Amstutz 4 months ago

arvbox creates /var/lib/arvados/run_tests/config.yml

Set CONFIGSRC when running tests in arvbox

refs #14717

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

Revision a93774da
Added by Peter Amstutz 4 months ago

Merge branch '15467-config-fixups'

Also includes 14713-cds-new-config and 14717-ws-new-config

Updates run_test_server.py and run-tests.sh to use new config.yml

refs #14713
refs #14717
refs #15467

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz 10 months ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz 10 months ago

  • Status changed from In Progress to New

#3 Updated by Peter Amstutz 10 months ago

  • Related to Story #13648: [Epic] Use one cluster configuration file for all components added

#4 Updated by Tom Morris 10 months ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0

#5 Updated by Tom Morris 5 months ago

  • Target version changed from Arvados Future Sprints to 2019-07-03 Sprint

#6 Updated by Peter Amstutz 5 months ago

  • Assigned To set to Peter Amstutz

#7 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2019-07-03 Sprint to 2019-07-17 Sprint

#8 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2019-07-17 Sprint to 2019-07-31 Sprint

#9 Updated by Peter Amstutz 4 months ago

14717-ws-new-config @ 7388e2f812805496e67930fdd316bb19657cb9fa

based on 14713-cds-new-config branch so that one probably needs to merge first.

#10 Updated by Tom Clegg 4 months ago

debug printf

+       fmt.Printf("Clllllllllllient %v %v", *oc.Client, cluster.Services.Controller.ExternalURL)

LoadDefaults() seems like a testing-only thing ("zzzzz"), and is only used by one test -- maybe we could relegate this to the test package and let the test(s) do something like config.NewLoader(bytes.NewBuffer(arvadostest.DefaultConfigFile), nil)?

Not sure about renaming PingTimeout to KeepaliveTimeout given "Keepalive" already has other meanings in both HTTP and TCP. SSH uses "server alive messages" and "client alive messages" to disambiguate. Even "ping timeout" isn't a great description -- we use it for two different things, 1) a ping interval and 2) a timeout for all sends/writes -- so perhaps
  • call it ClientAliveInterval and (potentially / in future) use it for SSH sessions, too? or
  • call it SendTimeout

nit: should WebsocketsPath just be WebsocketPath?

#11 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

debug printf

[...]

Fixed.

LoadDefaults() seems like a testing-only thing ("zzzzz"), and is only used by one test -- maybe we could relegate this to the test package and let the test(s) do something like config.NewLoader(bytes.NewBuffer(arvadostest.DefaultConfigFile), nil)?

Removed. Added in a new behavior to make loading the main config optional (for backwards compatibility when only the component config is available) which has the side effect of providing a way to ask it for just the defaults.

Not sure about renaming PingTimeout to KeepaliveTimeout given "Keepalive" already has other meanings in both HTTP and TCP. SSH uses "server alive messages" and "client alive messages" to disambiguate. Even "ping timeout" isn't a great description -- we use it for two different things, 1) a ping interval and 2) a timeout for all sends/writes -- so perhaps
  • call it ClientAliveInterval and (potentially / in future) use it for SSH sessions, too? or
  • call it SendTimeout

Fixed.

nit: should WebsocketsPath just be WebsocketPath?

Fixed.

As noted above, revised loading behavior a bit to ensure that either/or/both the main config or component are available.

14717-ws-new-config @ b2df3cba90166bc67dd93600ac1a44ac6f9e81bd

https://ci.curoverse.com/job/developer-run-tests/1409

#12 Updated by Tom Clegg 4 months ago

Is this "/etc/arvados/config.yml is optional if it doesn't exist and a per-component config was loaded" stuff really necessary? Especially since 1.4 is already telling everyone to move to /etc/arvados/config.yml, I'm thinking we can just write "make sure it exists" in the upgrading docs instead. I get the idea of surviving an upgrade even if you haven't read the docs, but I'm not sure it's worth the added special cases -- e.g., on a new system with no configs at all, it looks like c-d-s will error out with "legacy config file doesn't exist" which seems to point in the wrong direction; all code used by legacy components needs to avoid relying on the configured cluster ID because it might be the fake value "zzzzz"; "config-dump" still won't be able to tell you what config c-d-s is using until you create a [empty] cluster config file; etc.

I think we're better off making config loading work the same way regardless of who's asking (although there's already the exception that if you're using custom config file locations you need to use config-dump -legacy-* in order to match your some-component -config results if you're using alternate config file locations).

#13 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Is this "/etc/arvados/config.yml is optional if it doesn't exist and a per-component config was loaded" stuff really necessary? Especially since 1.4 is already telling everyone to move to /etc/arvados/config.yml, I'm thinking we can just write "make sure it exists" in the upgrading docs instead. I get the idea of surviving an upgrade even if you haven't read the docs, but I'm not sure it's worth the added special cases -- e.g., on a new system with no configs at all, it looks like c-d-s will error out with "legacy config file doesn't exist" which seems to point in the wrong direction; all code used by legacy components needs to avoid relying on the configured cluster ID because it might be the fake value "zzzzz"; "config-dump" still won't be able to tell you what config c-d-s is using until you create a [empty] cluster config file; etc.

Ok, LegacyComponentConfig is gone. I was mostly trying to get run-tests.sh to work on jenkins in a backwards-compatible way without requiring $CONFIGSRC/config.yml but since we're going all in on config.yml anyway, I worked with Fernando and we did this instead: https://dev.arvados.org/issues/15492

14717-ws-new-config @ 26c517777afbdb9668ab24aa61bfc44ecd7f26b9

https://ci.curoverse.com/job/developer-run-tests/1416/

#14 Updated by Tom Clegg 4 months ago

Peter Amstutz wrote:

I was mostly trying to get run-tests.sh to work on jenkins in a backwards-compatible way without requiring $CONFIGSRC/config.yml but since we're going all in on config.yml anyway, I worked with Fernando and we did this instead: https://dev.arvados.org/issues/15492

Ah. Maybe this is a tangent now, but... I see run-tests.sh also has some code to write its own config.yml. We also still have $temp/arvados.yml, written by run_controller() in run_test_server.py. Maybe we could simplify all this by just setting ARVADOS_CONFIG to $temp/arvados.yml the same way we set ARVADOS_API_TOKEN etc., so (except where we override it) everything uses the same config during tests?

#15 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Peter Amstutz wrote:

I was mostly trying to get run-tests.sh to work on jenkins in a backwards-compatible way without requiring $CONFIGSRC/config.yml but since we're going all in on config.yml anyway, I worked with Fernando and we did this instead: https://dev.arvados.org/issues/15492

Ah. Maybe this is a tangent now, but... I see run-tests.sh also has some code to write its own config.yml. We also still have $temp/arvados.yml, written by run_controller() in run_test_server.py. Maybe we could simplify all this by just setting ARVADOS_CONFIG to $temp/arvados.yml the same way we set ARVADOS_API_TOKEN etc., so (except where we override it) everything uses the same config during tests?

Alright, this required a bit of refactoring, but now run_test_server.py has a "setup_config" option which assigns ports for all the components, writes a config file, and exports ARVADOS_CONFIG to be used for the remainder of the test run. Various code involving endpoints now looks at Services.*.InternalURLs/ExternalURL in arvados.yml. I removed the code for reading/writing *.port files and various other bits that are obsoleted by the new config.

It's possible there are a few loose ends, I can't test on jenkins at the moment.

14717-ws-new-config @ 84058ef2f2142a2ced249d2bc02e8556ae268c6e

#16 Updated by Peter Amstutz 4 months ago

rebased 14717-ws-new-config @ 59a2d537f3450407aa48e32645d92a5246c046fe

#17 Updated by Peter Amstutz 4 months ago

14717-ws-new-config @ f89b45c97b047298c86dd58d5da8c07aa3d7d27e

Passed tests (finally):

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1430/

Just waiting for comments / sign off.

#19 Updated by Tom Clegg 4 months ago

Peter Amstutz wrote:

based on 14713-cds-new-config branch so that one probably needs to merge first.

Is this still true? (Meanwhile I should be reviewing the diff 40e3249b1...639789ffd, right?)

#20 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Peter Amstutz wrote:

based on 14713-cds-new-config branch so that one probably needs to merge first.

Is this still true? (Meanwhile I should be reviewing the diff 40e3249b1...639789ffd, right?)

No, I don't think the tests were passing on jenkins on the 14713 branch, that's how all the run_test_server.py and run-tests.sh wrangling started. Unless you think I need to start doing branch surgery to reorganize git history, at this point it I think it makes more sense to just merge 14717 which also has everything from 14713 in it.

#21 Updated by Tom Clegg 4 months ago

Is there a reason for having ws and c-d-s use gopkg.in/yaml.v2 instead of our usual ghodss/yaml, or is this just goimports guessing wrong?

run_test_server.py
  • Is there a use case for using different cluster IDs (other than zzzzz) in $CONFIGSRC/config.yml? It seems like we could just say ...["Clusters"]["zzzzz"]["PostgreSQL"]... instead of list(...["Clusters"].values())[0]["PostgreSQL"] + automatically munging a non-testy-looking database name.
  • Along similar lines, is /etc/arvados/config.yml ever a reasonable place to put a test suite config file? Or could we simplify this by making CONFIGSRC mandatory? (Looks like we should update the description in run-tests.sh either way)
  • Curious, is there a reason for building the services part of the config 2 different ways (a big literal with ExternalURL, then some assignments to InternalURLs)?
  • This looks like a debug printf to remove:
    +    print(ex, file=sys.stderr)
In services/ws/server_test.go
  • TestLoadLegacyConfig should use c.Error() instead of log.Fatal() ... or even c.Assert(err, check.IsNil)

#22 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Is there a reason for having ws and c-d-s use gopkg.in/yaml.v2 instead of our usual ghodss/yaml, or is this just goimports guessing wrong?

Yea that sounds like goimports or a cut and paste mistake, I didn't choose which yaml library to use intentionally.

run_test_server.py
  • Is there a use case for using different cluster IDs (other than zzzzz) in $CONFIGSRC/config.yml? It seems like we could just say ...["Clusters"]["zzzzz"]["PostgreSQL"]... instead of list(...["Clusters"].values())[0]["PostgreSQL"] + automatically munging a non-testy-looking database name.
  • Along similar lines, is /etc/arvados/config.yml ever a reasonable place to put a test suite config file? Or could we simplify this by making CONFIGSRC mandatory? (Looks like we should update the description in run-tests.sh either way)

This behavior is there because the easiest place to get the postgres config in the arvbox environment is the /etc/arvados/config.yml that is already there. But I can make arvbox write a separate testing-only config file and point CONFIGSRC at it.

Alternately, /etc/arvados/config.yml could have two cluster definitions, one regular one, and one 'zzzzz'. But right now, I suspect lots of things would get confused because various things are relying on the assumption that there's exactly one item in the Clusters map and that key is the cluster id to use. How did you envision it working to tell components which cluster config to use when there's more than one?

  • Curious, is there a reason for building the services part of the config 2 different ways (a big literal with ExternalURL, then some assignments to InternalURLs)?

I was under the impression that you couldn't have expressions on the left hand side of dict entries in Python, but actually that seems to be a limitation of Javascript and/or Ruby, not Python. (ETOOMANYLANGUAGES)

  • This looks like a debug printf to remove:
    [...]

I left that in on purpose because it seemed to be useful to know, but I can take it out.

In services/ws/server_test.go
  • TestLoadLegacyConfig should use c.Error() instead of log.Fatal() ... or even c.Assert(err, check.IsNil)

Will fix.

#23 Updated by Tom Clegg 4 months ago

Peter Amstutz wrote:

This behavior is there because the easiest place to get the postgres config in the arvbox environment is the /etc/arvados/config.yml that is already there. But I can make arvbox write a separate testing-only config file and point CONFIGSRC at it.

Aha. Yes, I think if arvbox has an implicit "use a named cluster's config but change dbname/id to test values" thing then it's better to keep that in arvbox, and keep run-tests simple ("here are the actual values to use").

Alternately, /etc/arvados/config.yml could have two cluster definitions, one regular one, and one 'zzzzz'. But right now, I suspect lots of things would get confused because various things are relying on the assumption that there's exactly one item in the Clusters map and that key is the cluster id to use. How did you envision it working to tell components which cluster config to use when there's more than one?

Yes, the idea is something along the lines of "set ARVADOS_CLUSTER_ID to abcde" so GetCluster() can choose the right one instead of requiring that only one is mentioned in the config file ... but I don't think we should tackle that now.

#24 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Is there a reason for having ws and c-d-s use gopkg.in/yaml.v2 instead of our usual ghodss/yaml, or is this just goimports guessing wrong?

Fixed.

run_test_server.py
  • Is there a use case for using different cluster IDs (other than zzzzz) in $CONFIGSRC/config.yml? It seems like we could just say ...["Clusters"]["zzzzz"]["PostgreSQL"]... instead of list(...["Clusters"].values())[0]["PostgreSQL"] + automatically munging a non-testy-looking database name.
  • Along similar lines, is /etc/arvados/config.yml ever a reasonable place to put a test suite config file? Or could we simplify this by making CONFIGSRC mandatory? (Looks like we should update the description in run-tests.sh either way)

Done.

  • Curious, is there a reason for building the services part of the config 2 different ways (a big literal with ExternalURL, then some assignments to InternalURLs)?

Done.

  • This looks like a debug printf to remove:
    [...]

Fixed.

In services/ws/server_test.go
  • TestLoadLegacyConfig should use c.Error() instead of log.Fatal() ... or even c.Assert(err, check.IsNil)

Fixed.

14717-ws-new-config @ 3596bdedbf0a592b3dd4bdcf589c3de7b8913ee1

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1436/

#25 Updated by Tom Clegg 4 months ago

Is there an advantage to preserving an existing setup_config result in start_services() rather than creating a new one? The condition seems to create a variant of the previous bug: interactive mode will generate a config once after this is merged, but never update it when the code changes. (Ideally "rake npm:install" wouldn't be so demanding about keep-web config in the first place, but ... ugh)

"CONFIGSRC environment variable not set to a config file" is a bit of a misdirection.

Rest LGTM

#26 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Is there an advantage to preserving an existing setup_config result in start_services() rather than creating a new one? The condition seems to create a variant of the previous bug: interactive mode will generate a config once after this is merged, but never update it when the code changes. (Ideally "rake npm:install" wouldn't be so demanding about keep-web config in the first place, but ... ugh)

It will generate a config with new port numbers but none of the services should be running so I don't think it will mess anything up. I removed the conditional.

"CONFIGSRC environment variable not set to a config file" is a bit of a misdirection.

Made the validations & error messages more specific.

Rest LGTM

Thanks.

#27 Updated by Peter Amstutz 4 months ago

  • Status changed from New to Resolved

Also available in: Atom PDF