Cluster configuration » History » Version 13

Lucas Di Pentima, 08/13/2018 02:51 PM
Added nodemanager's config items

h1. Cluster configuration

We are (2018) consolidating configuration from per-microservice YAML/JSON/INI files into a single cluster configuration document that is used by all components.
* Long term: system nodes automatically keep their configs synchronized (using something like Consul).
* Short term: the sysadmin uses tools like Puppet and Terraform to ensure @/etc/arvados/config.yml@ is identical on all system nodes.
* Hosts without config files (e.g., hosts outside the cluster) can retrieve the config document from the API server.
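
For the short-term approach, one way to check that the file really is identical everywhere is to compare a digest of it across nodes. The following is a minimal sketch, not part of Arvados: the function names @config_digest@ and @configs_match@ are hypothetical, and collecting the digests from each node (for example through Puppet facts) is left to the sysadmin's existing tooling.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def config_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the raw bytes of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def configs_match(digests: Mapping[str, str]) -> bool:
    """Return True if every node reported the same digest.

    `digests` maps a node name to the digest that node computed
    (hypothetical input; how it is gathered is out of scope here).
    """
    return len(set(digests.values())) <= 1
```

Each node would run @config_digest("/etc/arvados/config.yml")@ and report the result; any difference, even in whitespace or comments, changes the digest.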

h2. Discovery document

Previously, we copied selected config values from the API server config into the API discovery document so clients could see them. Once clients can get the configuration document itself, this will no longer be needed. The discovery document should advertise the APIs provided by the server, not cluster configuration.

h2. Secrets

Secrets like @BlobSigningKey@ can be given literally in the config file (convenient for dev/test, consul-template, etc.) or indirectly using a secret backend. Anticipated backends:
* <code class="yaml">BlobSigningKey: foobar</code> &rArr; the secret is literally <code>foobar</code>
* <code class="yaml">BlobSigningKey: "vault:foobar"</code> &rArr; the secret can be obtained from Vault using the Vault key "foobar"
* <code class="yaml">BlobSigningKey: "file:/foobar"</code> &rArr; the secret can be read from the local file @/foobar@
* <code class="yaml">BlobSigningKey: "env:FOOBAR"</code> &rArr; the secret can be read from the environment variable @FOOBAR@
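
The prefix convention above could be resolved roughly as follows. This is only a sketch under stated assumptions, not the Arvados implementation: @resolve_secret@ is a hypothetical name, @vault_lookup@ is a hypothetical callable standing in for a real Vault client, stripping the trailing newline from file contents is an assumption, and so is treating a value with an unrecognized prefix as a literal.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def resolve_secret(
    value: str,
    vault_lookup: Optional[Callable[[str], str]] = None,
) -> str:
    """Turn a config value into the secret it denotes.

    `vault_lookup` is a hypothetical callable (key -> secret) used in
    place of a real Vault client.
    """
    scheme, sep, rest = value.partition(":")
    if not sep:
        return value  # no prefix at all: the value is the secret
    if scheme == "env":
        return os.environ[rest]  # raises KeyError if the variable is unset
    if scheme == "file":
        # Assumption: a single trailing newline is not part of the secret.
        return Path(rest).read_text().rstrip("\n")
    if scheme == "vault":
        if vault_lookup is None:
            raise RuntimeError("no vault backend configured")
        return vault_lookup(rest)
    return value  # assumption: unknown prefix means a literal containing ":"
```

One consequence of this scheme is that a literal secret cannot itself begin with @env:@, @file:@ or @vault:@; such a value would have to be supplied through one of the indirect forms.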

h2. Example config file

(Format not yet frozen!)

<pre><code class="yaml">
Clusters:
  xyzzy:
    BlobSigningKey: ungu355able
    BlobSignatureTTL: 172800
    SessionKey: 186005aa54cab1ca95a3738e6e954e0a35a96d3d13a8ea541f4156e8d067b4f3
    PostgreSQL:
      ConnectionPool: 32 # max concurrent connections per arvados server daemon
      Connection:
        # All parameters here are passed to the PG client library in a connection string;
        # see https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
        Host: localhost
        Port: 5432
        User: arvados
        Password: s3cr3t
        DBName: arvados_production
        client_encoding: utf8
        fallback_application_name: arvados
    HTTPRequestTimeout: 5m
    Defaults:
      CollectionReplication: 2
      TrashLifetime: 2w
    UserActivation:
      ActivateNewUsers: true
      AutoAdminUser: root@example.com
      UserProfileNotificationAddress: notify@example.com
      NewUserNotificationRecipients: {}
      NewInactiveUserNotificationRecipients: {}
    Limits:
      MaxRequestLogParamsSize: 2KB
      MaxRequestSize: 128MiB
      MaxIndexDatabaseRead: 128MiB
      MaxItemsPerResponse: 1000
    LoggingLevel:
      Default: INFO
      apiclient: WARNING
      googleapiclient: WARNING
    NodeManager:
      Dispatcher: slurm
      PollTime: 10s
      BootFailAfter: 1200s
      Arvados:
        Token: (redacted)
        Timeout: 20s
      Cloud:
        Provider: AWS
        Region: us-east-1
        Timeout: 20s
        ShutdownWindows: 21, 999999
        NodeCreate:
          PingHost: xyzzy.arvadosapi.com
          ExKeyname: compute
          ImageID: ami-0a01b48b88d14541e
          SubnetID: subnet-24f5ae62
          SecurityGroups: sg-3ec53e2a
        NodeList:
          InstanceStateName: running
          Tags:
            arvados-class: dynamic-compute
            cluster: xyzzy
    AuditLogs:
      MaxAge: 2w
      DeleteBatchSize: 100000
      UnloggedAttributes: {} # example: {"manifest_text": true}
    ContainerLogStream:
      BatchSize: 4KiB
      BatchTime: 1s
      ThrottlePeriod: 1m
      ThrottleThresholdSize: 64KiB
      ThrottleThresholdLines: 1024
      TruncateSize: 64MiB
      PartialLineThrottlePeriod: 5s
    Timers:
      TrashSweepInterval: 60s
    Scaling:
      MaxComputeNodes: 64
      EnablePreemptibleInstances: false
    DisableAPIMethods: {} # example: {"jobs.create": true}
    DockerImageFormats: {"v2": true}
    Crunch1:
      Enable: true
      CrunchJobWrapper: none
      CrunchJobUser: crunch
      CrunchRefreshTrigger: /tmp/crunch_refresh_trigger
      DefaultDockerImage: false
    NodeProfiles:
      # Key is a profile name; can be specified on service prog command line, defaults to $(hostname)
      keep:
        # Don’t run other services automatically -- only specified ones
        Default: {Disable: true}
        Keepstore: {Listen: ":25107"}
      apiserver:
        Default: {Disable: true}
        RailsAPI: {Listen: ":9000", TLS: true}
        Controller: {Listen: ":9100"}
        Websocket: {Listen: ":9101"}
        Health: {Listen: ":9199"}
      # Profile name assumed here: it must differ from "keep" above, since YAML mapping keys are unique
      keepproxy:
        Default: {Disable: true}
        KeepProxy: {Listen: ":9102"}
        KeepWeb: {Listen: ":9103"}
      # "*" is quoted because an unquoted * starts a YAML alias
      "*":
        # This section used for a node whose profile name is not listed above
        Default: {Disable: false} # (this is the default behavior)
    Volumes:
      xyzzy-keep-0:
        Type: s3
        Region: us-east
        Bucket: xyzzy-keep-0
        # [rest of keepstore volume config goes here]
    Providers:
      AWS:
        # [credentials and stuff go here]
        us-east-1:
          Compute:
            Key: ABCDEF
            Secret: abcdef123456abcdef123456abcdef123456
          Storage:
            Key: 987ABC
            Secret: 456deadbeef123456deadbeef123456deadbeef123
    WebRoutes:
      # “default” means route according to method/host/path (e.g., if host is a login shell, route there)
      xyzzy.arvadosapi.com: default
      # “collections” means always route to keep-web
      collections.xyzzy.arvadosapi.com: collections
      # leading * is a wildcard (longest match wins)
      "*--collections.xyzzy.arvadosapi.com": collections
      cloud.curoverse.com: workbench
      workbench.xyzzy.arvadosapi.com: workbench
      "*.xyzzy.arvadosapi.com": default
    InstanceTypes:
      m4.large:
        VCPUs: 2
        RAM: 8000000000
        Scratch: 31000000000
        Price: 0.1
      m4.large-1t:
        # same instance type as m4.large but our scripts attach more scratch
        ProviderType: m4.large
        VCPUs: 2
        RAM: 8000000000
        Scratch: 999000000000
        Price: 0.12
      m4.xlarge:
        VCPUs: 4
        RAM: 16000000000
        Scratch: 78000000000
        Price: 0.2
      m4.8xlarge:
        VCPUs: 40
        RAM: 160000000000
        Scratch: 156000000000
        Price: 2
      m4.16xlarge:
        VCPUs: 64
        RAM: 256000000000
        Scratch: 310000000000
        Price: 3.2
      c4.large:
        VCPUs: 2
        RAM: 3750000000
        Price: 0.1
      c4.8xlarge:
        VCPUs: 36
        RAM: 60000000000
        Price: 1.591
    RemoteClusters:
      xrrrr:
        Host: xrrrr.arvadosapi.com
        Proxy: true        # proxy requests to xrrrr on behalf of our clients
        AuthProvider: true # users authenticated by xrrrr can use our cluster
</code></pre>
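
The @WebRoutes@ comments say that a leading @*@ is a wildcard and that the longest match wins. A minimal sketch of that lookup is shown below, using the hostnames from the example. The function name @route_for_host@ is hypothetical, and two details are assumptions rather than anything stated above: an exact entry takes precedence over any wildcard, and "longest" is measured by pattern length.

```python
from typing import Mapping, Optional

# Routes copied from the WebRoutes section of the example config.
ROUTES = {
    "xyzzy.arvadosapi.com": "default",
    "collections.xyzzy.arvadosapi.com": "collections",
    "*--collections.xyzzy.arvadosapi.com": "collections",
    "cloud.curoverse.com": "workbench",
    "workbench.xyzzy.arvadosapi.com": "workbench",
    "*.xyzzy.arvadosapi.com": "default",
}


def route_for_host(host: str, routes: Mapping[str, str]) -> Optional[str]:
    """Return the route target for `host`, or None if no entry matches."""
    if host in routes:
        return routes[host]  # assumption: an exact entry beats wildcards
    best_pattern = None
    for pattern in routes:
        # A leading "*" matches any prefix; the rest must be a suffix of host.
        if pattern.startswith("*") and host.endswith(pattern[1:]):
            if best_pattern is None or len(pattern) > len(best_pattern):
                best_pattern = pattern  # keep the longest matching pattern
    return routes[best_pattern] if best_pattern is not None else None
```

With these routes, @abc--collections.xyzzy.arvadosapi.com@ matches both wildcard entries, and the longer @*--collections...@ pattern is the one that applies.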