Cluster configuration » History » Version 15

Tom Clegg, 10/03/2018 01:23 PM

h1. Cluster configuration

As of 2018, we are consolidating configuration from per-microservice YAML/JSON/INI files into a single cluster configuration document that is used by all components.
* Long term: system nodes automatically keep their configs synchronized (using something like Consul).
* Short term: the sysadmin uses tools like Puppet and Terraform to ensure @/etc/arvados/config.yml@ is identical on all system nodes.
* Hosts without config files (e.g., hosts outside the cluster) can retrieve the config document from the API server.

h2. Discovery document

Previously, we copied selected config values from the API server config into the API discovery document so clients could see them. Once clients can retrieve the configuration document itself, this copying will no longer be needed. The discovery document should advertise the APIs provided by the server, not cluster configuration.

h2. Secrets

Secrets like @BlobSigningKey@ can be given literally in the config file (convenient for dev/test, consul-template, etc.) or indirectly using a secret backend. Anticipated value formats:
* <code class="yaml">BlobSigningKey: foobar</code> &rArr; the secret is literally <code>foobar</code>
* <code class="yaml">BlobSigningKey: "vault:foobar"</code> &rArr; the secret can be obtained from vault using the vault key "foobar"
* <code class="yaml">BlobSigningKey: "file:/foobar"</code> &rArr; the secret can be read from the local file @/foobar@
* <code class="yaml">BlobSigningKey: "env:FOOBAR"</code> &rArr; the secret can be read from the environment variable @FOOBAR@
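
The value formats listed above could be resolved along these lines. This is only a sketch, not Arvados code: the function name @resolve_secret@ and the @vault_lookup@ callable are hypothetical, and it assumes that a value with no recognized prefix is taken literally and that a trailing newline in a secret file is not part of the secret.

```python
import os
from typing import Callable, Optional


def resolve_secret(value: str,
                   vault_lookup: Optional[Callable[[str], str]] = None) -> str:
    """Turn a config value into the secret it refers to (hypothetical helper)."""
    scheme, sep, rest = value.partition(":")
    if not sep:
        # No "scheme:" prefix at all: the value is the secret itself.
        return value
    if scheme == "env":
        # "env:FOOBAR" -> read environment variable FOOBAR.
        return os.environ[rest]
    if scheme == "file":
        # "file:/foobar" -> read the local file /foobar.
        # Assumption: a trailing newline is not part of the secret.
        with open(rest, encoding="utf-8") as f:
            return f.read().rstrip("\n")
    if scheme == "vault":
        # "vault:foobar" -> ask the vault backend for key "foobar".
        # The backend is injected as a callable; no real vault API is assumed.
        if vault_lookup is None:
            raise ValueError("no vault backend configured")
        return vault_lookup(rest)
    # Assumption: an unrecognized prefix means the whole value is literal.
    return value
```

One consequence of this scheme is that a literal secret beginning with @env:@, @file:@, or @vault:@ cannot be expressed directly; whether that needs an escape syntax is left open here.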

h2. Example config file

(Format not yet frozen!)

<pre><code class="yaml">
Clusters:
  xyzzy:
    BlobSigningKey: ungu355able
    BlobSignatureTTL: 172800
    SessionKey: 186005aa54cab1ca95a3738e6e954e0a35a96d3d13a8ea541f4156e8d067b4f3
    PostgreSQL:
      ConnectionPool: 32 # max concurrent connections per arvados server daemon
      Connection:
        # All parameters here are passed to the PG client library in a connection string;
        # see https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
        Host: localhost
        Port: 5432
        User: arvados
        Password: s3cr3t
        DBName: arvados_production
        client_encoding: utf8
        fallback_application_name: arvados
    HTTPRequestTimeout: 5m
    Defaults:
      CollectionReplication: 2
      TrashLifetime: 2w
    UserActivation:
      ActivateNewUsers: true
      AutoAdminUser: root@example.com
      UserProfileNotificationAddress: notify@example.com
      NewUserNotificationRecipients: {}
      NewInactiveUserNotificationRecipients: {}
    RequestLimits:
      MaxRequestLogParamsSize: 2KB
      MaxRequestSize: 128MiB
      MaxIndexDatabaseRead: 128MiB
      MaxItemsPerResponse: 1000
      MultiClusterRequestConcurrency: 4
    LogLevel: info
    CloudVMs:
      BootTimeout: 20m
      Driver: Amazon
      DriverParameters:
        Region: us-east-1
        APITimeout: 20s
        EC2Key: abcdef
        EC2Secret: abcdefghijklmnopqrstuvwxyz
        StorageKey: abcdef
        StorageSecret: abcdefghijklmnopqrstuvwxyz
        ImageID: ami-0a01b48b88d14541e
        SubnetID: subnet-24f5ae62
        SecurityGroups: sg-3ec53e2a
    AuditLogs:
      MaxAge: 2w
      DeleteBatchSize: 100000
      UnloggedAttributes: {} # example: {"manifest_text": true}
    ContainerLogStream:
      BatchSize: 4KiB
      BatchTime: 1s
      ThrottlePeriod: 1m
      ThrottleThresholdSize: 64KiB
      ThrottleThresholdLines: 1024
      TruncateSize: 64MiB
      PartialLineThrottlePeriod: 5s
    Timers:
      TrashSweepInterval: 60s
      ContainerDispatchPollInterval: 10s
      APIRequestTimeout: 20s
    Scaling:
      MaxComputeNodes: 64
      EnablePreemptibleInstances: false
    DisableAPIMethods: {} # example: {"jobs.create": true}
    DockerImageFormats: {"v2": true}
    Crunch1:
      Enable: true
      CrunchJobWrapper: none
      CrunchJobUser: crunch
      CrunchRefreshTrigger: /tmp/crunch_refresh_trigger
      DefaultDockerImage: false
    NodeProfiles:
      # Key is a profile name; can be specified on service prog command line, defaults to $(hostname)
      keep:
        # Don’t run other services automatically -- only specified ones
        Default: {Disable: true}
        Keepstore: {Listen: ":25107"}
      apiserver:
        Default: {Disable: true}
        RailsAPI: {Listen: ":9000", TLS: true}
        Controller: {Listen: ":9100"}
        Websocket: {Listen: ":9101"}
        Health: {Listen: ":9199"}
      keepproxy:
        Default: {Disable: true}
        KeepProxy: {Listen: ":9102"}
        KeepWeb: {Listen: ":9103"}
      "*":
        # This section is used for a node whose profile name is not listed above
        Default: {Disable: false} # (this is the default behavior)
    Volumes:
      xyzzy-keep-0:
        Type: s3
        Region: us-east
        Bucket: xyzzy-keep-0
        # [rest of keepstore volume config goes here]
    WebRoutes:
      # “default” means route according to method/host/path (e.g., if host is a login shell, route there)
      xyzzy.arvadosapi.com: default
      # “collections” means always route to keep-web
      collections.xyzzy.arvadosapi.com: collections
      # leading * is a wildcard (longest match wins)
      "*--collections.xyzzy.arvadosapi.com": collections
      cloud.curoverse.com: workbench
      workbench.xyzzy.arvadosapi.com: workbench
      "*.xyzzy.arvadosapi.com": default
    InstanceTypes:
      m4.large:
        VCPUs: 2
        RAM: 8000000000
        Scratch: 31000000000
        Price: 0.1
      m4.large-1t:
        # same instance type as m4.large but our scripts attach more scratch
        ProviderType: m4.large
        VCPUs: 2
        RAM: 8000000000
        Scratch: 999000000000
        Price: 0.12
      m4.xlarge:
        VCPUs: 4
        RAM: 16000000000
        Scratch: 78000000000
        Price: 0.2
      m4.8xlarge:
        VCPUs: 40
        RAM: 160000000000
        Scratch: 156000000000
        Price: 2
      m4.16xlarge:
        VCPUs: 64
        RAM: 256000000000
        Scratch: 310000000000
        Price: 3.2
      c4.large:
        VCPUs: 2
        RAM: 3750000000
        Price: 0.1
      c4.8xlarge:
        VCPUs: 36
        RAM: 60000000000
        Price: 1.591
    RemoteClusters:
      xrrrr:
        Host: xrrrr.arvadosapi.com
        Proxy: true        # proxy requests to xrrrr on behalf of our clients
        AuthProvider: true # users authenticated by xrrrr can use our cluster
</code></pre>
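
The @WebRoutes@ comments say that a leading @*@ is a wildcard and that the longest match wins. One way to read that rule is sketched below. The function name @route_for_host@ is hypothetical, and the sketch assumes that "longest match" means the longest matching pattern string, that a pattern without a leading @*@ must equal the host exactly, and that an unmatched host yields no route.

```python
from typing import Dict, Optional

# Routing table copied from the WebRoutes section of the example config.
WEB_ROUTES: Dict[str, str] = {
    "xyzzy.arvadosapi.com": "default",
    "collections.xyzzy.arvadosapi.com": "collections",
    "*--collections.xyzzy.arvadosapi.com": "collections",
    "cloud.curoverse.com": "workbench",
    "workbench.xyzzy.arvadosapi.com": "workbench",
    "*.xyzzy.arvadosapi.com": "default",
}


def route_for_host(host: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the route for host, or None if no pattern matches (hypothetical helper)."""
    best_pattern: Optional[str] = None
    for pattern in routes:
        if pattern.startswith("*"):
            # Wildcard: the host must end with everything after the "*".
            matched = host.endswith(pattern[1:])
        else:
            # No wildcard: the host must equal the pattern.
            matched = host == pattern
        # Assumption: "longest match wins" compares pattern lengths.
        if matched and (best_pattern is None or len(pattern) > len(best_pattern)):
            best_pattern = pattern
    return routes[best_pattern] if best_pattern is not None else None


print(route_for_host("abc--collections.xyzzy.arvadosapi.com", WEB_ROUTES))
print(route_for_host("shell.xyzzy.arvadosapi.com", WEB_ROUTES))
```

Under these assumptions a host such as @abc--collections.xyzzy.arvadosapi.com@ matches both wildcard patterns, and the longer @*--collections@ pattern takes precedence over @*.xyzzy.arvadosapi.com@.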