Story #15954

[boot] Bring up test cluster using provided config file and source tree

Added by Tom Clegg 9 months ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 02/28/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 4.0

Description

The following commands should bring up a functioning test cluster:

    git clone https://github.com/arvados/arvados
    go run arvados/cmd/arvados-server boot -source-tree ./arvados -config ./arvados/doc/examples/config/zzzzz.yml -temp-dir {...} -test-fixtures=true

Assuming the user has taken care of these prerequisites:
  • PostgreSQL, Ruby, Ruby gems/bundle, Python, nginx, etc. are installed
The resulting cluster should:
  • Use a new temporary directory, and delete it when exiting
  • Use any local/uncommitted changes in the ./arvados work tree
  • Use {temp dir}/keep/ as a keep volume backend
  • Use {temp dir}/ for any pid/lock/temp files
  • Have no sso, workbench2, or composer
The "boot" process should:
  • Stay in the foreground
  • Log to stderr (OK for the time being if some logs go to {provided temp dir}/ instead)
  • Exit (and shut down any child processes) when SIGINT or SIGTERM is received or a child service/component fails

Subtasks

Task #16032: Review 15954-boot-test-cluster (Resolved, Tom Clegg)


Related issues

Related to Arvados Epics - Story #15941: arvados-boot (In Progress, 01/15/2020, 09/30/2020)

Associated revisions

Revision 18fecbe7
Added by Tom Clegg 7 months ago

Merge branch '15954-boot-test-cluster'

refs #15954

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg 9 months ago

#2 Updated by Tom Clegg 9 months ago

  • Description updated (diff)
  • Subject changed from [boot] Bring up dev cluster using provided config file and source tree to [boot] Bring up test cluster using provided config file and source tree

#3 Updated by Tom Clegg 9 months ago

  • Description updated (diff)

#4 Updated by Tom Clegg 9 months ago

  • Description updated (diff)

#5 Updated by Tom Clegg 9 months ago

  • Description updated (diff)

#6 Updated by Tom Clegg 9 months ago

  • Description updated (diff)

#7 Updated by Tom Clegg 9 months ago

  • Target version set to 2020-01-29 Sprint
  • Assigned To set to Tom Clegg
  • Status changed from New to In Progress

#8 Updated by Tom Clegg 8 months ago

  • Story points set to 4.0

#9 Updated by Peter Amstutz 8 months ago

  • Target version changed from 2020-01-29 Sprint to 2020-02-12 Sprint

#10 Updated by Tom Clegg 8 months ago

  • Target version changed from 2020-02-12 Sprint to 2020-02-26 Sprint

#11 Updated by Peter Amstutz 7 months ago

  • Target version changed from 2020-02-26 Sprint to 2020-03-11 Sprint

#12 Updated by Tom Clegg 7 months ago

15954-boot-test-cluster @ 8a719dbcdfd5da64172855ace2395ce682941214 -- https://ci.arvados.org/view/Developer/job/developer-run-tests/1756/

(fuse tests fail because of #16151)

On a system with all the dependencies needed by run-tests.sh, this brings up a test cluster on port 12345 using the code in CWD:

~/arvados$ go run ./cmd/arvados-server boot -config ./doc/examples/config/zzzzz.yml -type test -own-temporary-database -controller-address :12345 -listen-host 0.0.0.0
  • "https://0.0.0.0:12345/" (controller endpoint) appears on stdout when the cluster is ready to use
  • everything else (i.e., logging) is sent to stderr
  • ^C or SIGTERM shuts down all child processes before exiting

There is a new test suite in source:lib/controller/integration_test.go that boots 3 clusters, with one test that saves a collection on cluster A and retrieves it by PDH from cluster B.

#13 Updated by Tom Clegg 7 months ago

  • Description updated (diff)

#15 Updated by Lucas Di Pentima 7 months ago

Some comments & questions:

  • File lib/boot/cmd.go
    • Line 25: “…should call cancel.” comment, is referring to the super.cancel() or fail() func?
    • Line 81: Shouldn’t be logged using super.logger()?
    • Line 85: I think the else clause could be avoided.
  • File lib/boot/supervisor.go
    • Line 75: Shouldn’t be logged using super.logger()?
    • Lines 99, 109: Can we use a more strict permission scheme for dirs/files creation?
    • Line 493: Why does autofillConfig() need the logger to be passed as an argument if it’s already on the Supervisor struct? It also seems that it’s not being used.
    • Lines 521-525: Can this be replaced with a nextPort() call?
  • File lib/boot/cert.go - Line 49: Can we use a more strict permission scheme?
  • Other Qs (probably out of scope of this particular story):
    • Do you think adding a -only-install-deps flag would be useful to do some cache population?
    • What happens to an owned temporary database after quitting? Can we have a not-so-temporary database too?

#16 Updated by Tom Clegg 7 months ago

  • File lib/boot/cmd.go
    • Line 25: “…should call cancel.” comment, is referring to the super.cancel() or fail() func?

Oops, changed to "fail".

  • Line 81: Shouldn’t be logged using super.logger()?

This goes to stdout so a script can easily find the controller URL when it's ready. (Added a comment.)

  • Line 85: I think the else clause could be avoided.

Yes, fixed to use handle-errors-first style.
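
For readers unfamiliar with the term, "handle-errors-first" means returning early on each failure so the success path stays unindented and no else clause is needed. A small generic illustration (parsePort is a made-up example, not the code in cmd.go):

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort (made-up example) deals with each failure as soon as it is
// detected and returns early, so no else branch is required.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range", n)
	}
	return n, nil
}

func main() {
	for _, s := range []string{"12345", "0", "abc"} {
		port, err := parsePort(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```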

  • File lib/boot/supervisor.go
    • Line 75: Shouldn’t be logged using super.logger()?

Yes, fixed.

  • Lines 99, 109: Can we use a more strict permission scheme for dirs/files creation?

Yes, fixed. (Typically umask is 022, and this is all in a temp dir with 0700, but turning off group/other-write seems sensible anyway... and config.yml sure doesn't need to be executable.)

  • Line 493: Why does autofillConfig() need the logger to be passed as an argument if it’s already on the Supervisor struct? It also seems that it’s not being used.

Indeed, removed.

  • Lines 521-525: Can this be replaced with a nextPort() call?

Yes, done.
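
The implementation of nextPort() is not shown in this thread. One common way to pick an unused port, which may or may not be what lib/boot does, is to listen on port 0 and let the kernel choose. The freePort helper below is a hypothetical sketch of that approach.

```go
package main

import (
	"fmt"
	"net"
)

// freePort (hypothetical helper, not the real nextPort) asks the kernel for
// an unused TCP port on the loopback interface by listening on port 0.
// The listener is closed before returning, so another process could claim
// the port before the caller uses it.
func freePort() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("free port:", port)
}
```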

  • File lib/boot/cert.go - Line 49: Can we use a more strict permission scheme?

Done

  • Other Qs (probably out of scope of this particular story):
    • Do you think adding a -only-install-deps flag would be useful to do some cache population?

Yes, either that or "after starting, shut down and exit 0" which would give more assurance that setup/deps actually worked.

  • What happens to an owned temporary database after quitting? Can we have a not-so-temporary database too?

Yes, for a more convenient dev/trial experience we could put a persistent data dir in /var/lib/arvados and run a dedicated postgresql server on demand -- but for production I imagine we'll still recommend providing connection info for a regular postgresql installation so we don't need to handle tuning, backups, migrating data after upgrading postgresql, etc.

15954-boot-test-cluster @ a9988d4cde254df59d1790ef1e3768d14e2a812e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/1769/

#17 Updated by Lucas Di Pentima 7 months ago

LGTM, please merge. Thanks!

#18 Updated by Tom Clegg 7 months ago

  • Status changed from In Progress to Resolved
