Idea #15954

closed

[boot] Bring up test cluster using provided config file and source tree

Added by Tom Clegg over 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 02/28/2020
Due date:
Story points: 4.0
Release relationship: Auto

Description

The following commands should bring up a functioning test cluster:

git clone https://github.com/arvados/arvados
go run arvados/cmd/arvados-server boot -source-tree ./arvados -config ./arvados/doc/examples/config/zzzzz.yml -temp-dir {...} -test-fixtures=true
Assuming the user has taken care of these prerequisites:
  • PostgreSQL, Ruby, Ruby gems/bundle, Python, nginx, etc. are installed
The resulting cluster should:
  • Use a new temporary directory, and delete it when exiting
  • Use any local/uncommitted changes in the ./arvados work tree
  • Use {temp dir}/keep/ as a keep volume backend
  • Use {temp dir}/ for any pid/lock/temp files
  • Have no sso, workbench2, or composer
The "boot" process should:
  • Stay in the foreground
  • Log to stderr (OK for the time being if some logs go to {provided temp dir}/ instead)
  • Exit (and shut down any child processes) when SIGINT or SIGTERM is received or a child service/component fails

Subtasks 1 (0 open, 1 closed)

Task #16032: Review 15954-boot-test-cluster | Resolved | Tom Clegg | 02/28/2020

Related issues

Related to Arvados Epics - Idea #15941: arvados-boot (New)
Actions #1

Updated by Tom Clegg over 4 years ago

Actions #2

Updated by Tom Clegg over 4 years ago

  • Description updated (diff)
  • Subject changed from [boot] Bring up dev cluster using provided config file and source tree to [boot] Bring up test cluster using provided config file and source tree
Actions #3

Updated by Tom Clegg over 4 years ago

  • Description updated (diff)
Actions #4

Updated by Tom Clegg over 4 years ago

  • Description updated (diff)
Actions #5

Updated by Tom Clegg over 4 years ago

  • Description updated (diff)
Actions #6

Updated by Tom Clegg over 4 years ago

  • Description updated (diff)
Actions #7

Updated by Tom Clegg over 4 years ago

  • Target version set to 2020-01-29 Sprint
  • Assigned To set to Tom Clegg
  • Status changed from New to In Progress
Actions #8

Updated by Tom Clegg over 4 years ago

  • Story points set to 4.0
Actions #9

Updated by Peter Amstutz about 4 years ago

  • Target version changed from 2020-01-29 Sprint to 2020-02-12 Sprint
Actions #10

Updated by Tom Clegg about 4 years ago

  • Target version changed from 2020-02-12 Sprint to 2020-02-26 Sprint
Actions #11

Updated by Peter Amstutz about 4 years ago

  • Target version changed from 2020-02-26 Sprint to 2020-03-11 Sprint
Actions #12

Updated by Tom Clegg about 4 years ago

15954-boot-test-cluster @ 8a719dbcdfd5da64172855ace2395ce682941214 -- developer-run-tests: #1756

(fuse tests fail because of #16151)

On a system with all the dependencies needed by run-tests.sh, this brings up a test cluster on port 12345 using the code in CWD:

~/arvados$ go run ./cmd/arvados-server boot -config ./doc/examples/config/zzzzz.yml -type test -own-temporary-database -controller-address :12345 -listen-host 0.0.0.0
  • "https://0.0.0.0:12345/" (controller endpoint) appears on stdout when the cluster is ready to use
  • everything else (i.e., logging) is sent to stderr
  • ^C or SIGTERM shuts down all child processes before exiting

There is a new test suite in source:lib/controller/integration_test.go that boots 3 clusters, with one test that saves a collection on cluster A and retrieves it by PDH from cluster B.

Actions #13

Updated by Tom Clegg about 4 years ago

  • Description updated (diff)
Actions #14

Updated by Tom Clegg about 4 years ago

15954-boot-test-cluster @ a15c20803fb7a1e400a028c00d1c2dd924765a3e -- developer-run-tests: #1757

(merged master to get #16151 fix)

Actions #15

Updated by Lucas Di Pentima about 4 years ago

Some comments & questions:

  • File lib/boot/cmd.go
    • Line 25: Does the “…should call cancel.” comment refer to the super.cancel() or the fail() func?
    • Line 81: Shouldn’t be logged using super.logger()?
    • Line 85: I think the else clause could be avoided.
  • File lib/boot/supervisor.go
    • Line 75: Shouldn’t be logged using super.logger()?
    • Lines 99, 109: Can we use a more strict permission scheme for dirs/files creation?
    • Line 493: Why does autofillConfig() need the logger to be passed as an argument if it’s already on the Supervisor struct? It also seems that it’s not being used.
    • Lines 521-525: Can this be replaced with a nextPort() call?
  • File lib/boot/cert.go - Line 49: Can we use a more strict permission scheme?
  • Other Qs (probably out of scope of this particular story):
    • Do you think adding a -only-install-deps flag would be useful to do some cache population?
    • What happens to an owned temporary database after quitting? Can we have a not-so-temporary database too?
Actions #16

Updated by Tom Clegg about 4 years ago

  • File lib/boot/cmd.go
    • Line 25: Does the “…should call cancel.” comment refer to the super.cancel() or the fail() func?

Oops, changed to "fail".

  • Line 81: Shouldn’t be logged using super.logger()?

This goes to stdout so a script can easily find the controller URL when it's ready. (Added a comment.)

  • Line 85: I think the else clause could be avoided.

Yes, fixed to use handle-errors-first style.

  • File lib/boot/supervisor.go
    • Line 75: Shouldn’t be logged using super.logger()?

Yes, fixed.

  • Lines 99, 109: Can we use a more strict permission scheme for dirs/files creation?

Yes, fixed. (Typically umask is 022, and this is all in a temp dir with 0700, but turning off group/other-write seems sensible anyway... and config.yml sure doesn't need to be executable.)

  • Line 493: Why does autofillConfig() need the logger to be passed as an argument if it’s already on the Supervisor struct? It also seems that it’s not being used.

Indeed, removed.

  • Lines 521-525: Can this be replaced with a nextPort() call?

Yes, done.

  • File lib/boot/cert.go - Line 49: Can we use a more strict permission scheme?

Done.

  • Other Qs (probably out of scope of this particular story):
    • Do you think adding a -only-install-deps flag would be useful to do some cache population?

Yes, either that or "after starting, shutdown and exit 0" which would give more assurance that setup/deps actually worked.

  • What happens to an owned temporary database after quitting? Can we have a not-so-temporary database too?

Yes, for a more convenient dev/trial experience we could put a persistent data dir in /var/lib/arvados and run a dedicated postgresql server on demand -- but for production I imagine we'll still recommend providing connection info for a regular postgresql installation so we don't need to handle tuning, backups, migrating data after upgrading postgresql, etc.

15954-boot-test-cluster @ a9988d4cde254df59d1790ef1e3768d14e2a812e -- developer-run-tests: #1769

Actions #17

Updated by Lucas Di Pentima about 4 years ago

LGTM, please merge. Thanks!

Actions #18

Updated by Tom Clegg about 4 years ago

  • Status changed from In Progress to Resolved
Actions #19

Updated by Peter Amstutz over 3 years ago

  • Release set to 25