Version 12 (Tom Clegg, 07/09/2018 07:13 PM) → Version 13/33 (Lucas Di Pentima, 08/13/2018 02:51 PM)

h1. Cluster configuration

As of 2018, we are consolidating configuration from per-microservice YAML/JSON/INI files into a single cluster configuration document that is used by all components.
* Long term: system nodes automatically keep their configs synchronized (using something like Consul).
* Short term: the sysadmin uses tools like Puppet and Terraform to ensure @/etc/arvados/config.yml@ is identical on all system nodes.
* Hosts without config files (e.g., hosts outside the cluster) can retrieve the config document from the API server.

h2. Discovery document

Previously, we copied selected config values from the API server config into the API discovery document so clients could see them. When clients can get the configuration document itself, this won't be needed. The discovery document should advertise APIs provided by the server, not cluster configuration.

h2. Secrets

Secrets like @BlobSigningKey@ can be given literally in the config file (convenient for dev/test, consul-template, etc.) or indirectly, by referencing a secret backend. Anticipated forms:
* <code class="yaml">BlobSigningKey: foobar</code> &rArr; the secret is literally <code>foobar</code>
* <code class="yaml">BlobSigningKey: "vault:foobar"</code> &rArr; the secret can be obtained from vault using the vault key @foobar@
* <code class="yaml">BlobSigningKey: "file:/foobar"</code> &rArr; the secret can be read from the local file @/foobar@
* <code class="yaml">BlobSigningKey: "env:FOOBAR"</code> &rArr; the secret can be read from the environment variable @FOOBAR@
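
The prefix convention above could be resolved along these lines. This is only a minimal sketch, not the actual Arvados implementation: the function name @resolve_secret@, the @vault_lookup@ callback, and the choice to strip a trailing newline from file contents are all assumptions made for illustration.

<pre><code class="python">
import os


def resolve_secret(value, environ=None, vault_lookup=None):
    """Resolve a config value that is either a literal secret or a
    "vault:", "file:" or "env:" reference (hypothetical helper)."""
    environ = os.environ if environ is None else environ

    if value.startswith("file:"):
        # Assumption: a single trailing newline in the file is not part of the secret.
        with open(value[len("file:"):], encoding="utf-8") as f:
            return f.read().rstrip("\n")

    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in environ:
            raise KeyError("environment variable %s is not set" % name)
        return environ[name]

    if value.startswith("vault:"):
        # Assumption: the caller supplies a function that maps a vault key
        # to its secret; no particular vault client library is implied.
        if vault_lookup is None:
            raise RuntimeError("no vault backend configured")
        return vault_lookup(value[len("vault:"):])

    # No recognized prefix: the value is the secret itself.
    return value
</code></pre>

One consequence of this scheme is that a literal secret cannot itself begin with one of the reserved prefixes (for example @file:@); whether that needs an escape mechanism is left open here.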

h2. Example config file

(Format not yet frozen!)

<pre><code class="yaml">
Clusters:
  xyzzy:
    BlobSigningKey: ungu355able
    BlobSignatureTTL: 172800
    SessionKey: 186005aa54cab1ca95a3738e6e954e0a35a96d3d13a8ea541f4156e8d067b4f3
    PostgreSQL:
      ConnectionPool: 32 # max concurrent connections per arvados server daemon
      Connection:
        # All parameters here are passed to the PG client library in a connection string;
        # see https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
        Host: localhost
        Port: 5432
        User: arvados
        Password: s3cr3t
        DBName: arvados_production
        client_encoding: utf8
        fallback_application_name: arvados
    HTTPRequestTimeout: 5m
    Defaults:
      CollectionReplication: 2
      TrashLifetime: 2w
    UserActivation:
      ActivateNewUsers: true
      AutoAdminUser: root@example.com
      UserProfileNotificationAddress: notify@example.com
      NewUserNotificationRecipients: {}
      NewInactiveUserNotificationRecipients: {}
    Limits:
      MaxRequestLogParamsSize: 2KB
      MaxRequestSize: 128MiB
      MaxIndexDatabaseRead: 128MiB
      MaxItemsPerResponse: 1000
    LoggingLevel:
      Default: INFO
      apiclient: WARNING
      googleapiclient: WARNING
    NodeManager:
      Dispatcher: slurm
      PollTime: 10s
      BootFailAfter: 1200s
      Arvados:
        Token: (redacted)
        Timeout: 20s
      Cloud:
        Provider: AWS
        Region: us-east-1
        Timeout: 20s
        ShutdownWindows: 21, 999999
        NodeCreate:
          PingHost: xyzzy.arvadosapi.com
          ExKeyname: compute
          ImageID: ami-0a01b48b88d14541e
          SubnetID: subnet-24f5ae62
          SecurityGroups: sg-3ec53e2a
        NodeList:
          InstanceStateName: running
          Tags:
            arvados-class: dynamic-compute
            cluster: xyzzy
    AuditLogs:
      MaxAge: 2w
      DeleteBatchSize: 100000
      UnloggedAttributes: {} # example: {"manifest_text": true}
    ContainerLogStream:
      BatchSize: 4KiB
      BatchTime: 1s
      ThrottlePeriod: 1m
      ThrottleThresholdSize: 64KiB
      ThrottleThresholdLines: 1024
      TruncateSize: 64MiB
      PartialLineThrottlePeriod: 5s
    Timers:
      TrashSweepInterval: 60s
    Scaling:
      MaxComputeNodes: 64
      EnablePreemptibleInstances: false
    DisableAPIMethods: {} # example: {"jobs.create": true}
    DockerImageFormats: {"v2": true}
    Crunch1:
      Enable: true
      CrunchJobWrapper: none
      CrunchJobUser: crunch
      CrunchRefreshTrigger: /tmp/crunch_refresh_trigger
      DefaultDockerImage: false
    NodeProfiles:
      # Key is a profile name; can be specified on service prog command line, defaults to $(hostname)
      keep:
        # Don’t run other services automatically -- only specified ones
        Default: {Disable: true}
        Keepstore: {Listen: ":25107"}
      apiserver:
        Default: {Disable: true}
        RailsAPI: {Listen: ":9000", TLS: true}
        Controller: {Listen: ":9100"}
        Websocket: {Listen: ":9101"}
        Health: {Listen: ":9199"}
      keepproxy:
        Default: {Disable: true}
        KeepProxy: {Listen: ":9102"}
        KeepWeb: {Listen: ":9103"}
      "*":
        # This section is used for a node whose profile name is not listed above
        Default: {Disable: false} # (this is the default behavior)
    Volumes:
      xyzzy-keep-0:
        Type: s3
        Region: us-east
        Bucket: xyzzy-keep-0
        # [rest of keepstore volume config goes here]
    Providers:
      AWS:
        # [credentials and stuff go here]
        us-east-1:
          Compute:
            Key: ABCDEF
            Secret: abcdef123456abcdef123456abcdef123456
          Storage:
            Key: 987ABC
            Secret: 456deadbeef123456deadbeef123456deadbeef123

    WebRoutes:
      # “default” means route according to method/host/path (e.g., if host is a login shell, route there)
      xyzzy.arvadosapi.com: default
      # “collections” means always route to keep-web
      collections.xyzzy.arvadosapi.com: collections
      # leading * is a wildcard (longest match wins)
      "*--collections.xyzzy.arvadosapi.com": collections
      cloud.curoverse.com: workbench
      workbench.xyzzy.arvadosapi.com: workbench
      "*.xyzzy.arvadosapi.com": default
    InstanceTypes:
      m4.large:
        VCPUs: 2
        RAM: 8000000000
        Scratch: 31000000000
        Price: 0.1
      m4.large-1t:
        # same instance type as m4.large but our scripts attach more scratch
        ProviderType: m4.large
        VCPUs: 2
        RAM: 8000000000
        Scratch: 999000000000
        Price: 0.12
      m4.xlarge:
        VCPUs: 4
        RAM: 16000000000
        Scratch: 78000000000
        Price: 0.2
      m4.8xlarge:
        VCPUs: 40
        RAM: 160000000000
        Scratch: 156000000000
        Price: 2
      m4.16xlarge:
        VCPUs: 64
        RAM: 256000000000
        Scratch: 310000000000
        Price: 3.2
      c4.large:
        VCPUs: 2
        RAM: 3750000000
        Price: 0.1
      c4.8xlarge:
        VCPUs: 36
        RAM: 60000000000
        Price: 1.591
    RemoteClusters:
      xrrrr:
        Host: xrrrr.arvadosapi.com
        Proxy: true # proxy requests to xrrrr on behalf of our clients
        AuthProvider: true # users authenticated by xrrrr can use our cluster
</code></pre>
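
The @WebRoutes@ comments in the example say that a leading @*@ is a wildcard and that the longest match wins. One way to read that rule is sketched below. The helper name @match_route@ is hypothetical, and several details are assumptions rather than documented behavior: an exact host entry takes precedence over any wildcard, "longest" is measured by pattern length, @*@ may match an empty prefix, and hosts are compared case-sensitively.

<pre><code class="python">
# Route table copied from the WebRoutes section of the example config.
ROUTES = {
    "xyzzy.arvadosapi.com": "default",
    "collections.xyzzy.arvadosapi.com": "collections",
    "*--collections.xyzzy.arvadosapi.com": "collections",
    "cloud.curoverse.com": "workbench",
    "workbench.xyzzy.arvadosapi.com": "workbench",
    "*.xyzzy.arvadosapi.com": "default",
}


def match_route(host, routes):
    """Return the route target for host, or None if nothing matches
    (hypothetical helper illustrating "longest match wins")."""
    # Assumption: an exact entry always beats a wildcard entry.
    if host in routes:
        return routes[host]

    best_pattern, best_target = None, None
    for pattern, target in routes.items():
        if not pattern.startswith("*"):
            continue
        suffix = pattern[1:]  # text that must appear at the end of the host
        if host.endswith(suffix):
            if best_pattern is None or len(pattern) > len(best_pattern):
                best_pattern, best_target = pattern, target
    return best_target
</code></pre>

With this reading, @match_route("abc--collections.xyzzy.arvadosapi.com", ROUTES)@ selects @collections@ because the @*--collections...@ pattern is longer than @*.xyzzy.arvadosapi.com@, while @match_route("shell.xyzzy.arvadosapi.com", ROUTES)@ falls through to @default@.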