Peter Amstutz, 07/22/2019 07:06 PM
Cluster configuration¶
We are (2019) consolidating configuration from per-microservice yaml/json/ini files into a single cluster configuration document that is used by all components.
- Long term: system nodes automatically keep their configs synchronized (using something like consul).
- Short term: sysadmin uses tools like puppet and terraform to ensure /etc/arvados/config.yml is identical on all system nodes.
- Hosts without config files (e.g., hosts outside the cluster) can retrieve the config document from the API server.
Discovery document¶
Previously, we copied selected config values from the API server config into the API discovery document so clients could see them. When clients can get the configuration document itself, this won't be needed. The discovery document should advertise APIs provided by the server, not cluster configuration.
Secrets¶
Secrets like BlobSigningKey can be given literally in the config file (convenient for dev/test, consul-template, etc.) or indirectly using a secret backend. Anticipated backends:
BlobSigningKey: foobar
⇒ the secret is literally foobar
BlobSigningKey: "vault:foobar"
⇒ the secret can be obtained from vault using the vault key "foobar"
BlobSigningKey: "file:/foobar"
⇒ the secret can be read from the local file /foobar
BlobSigningKey: "env:FOOBAR"
⇒ the secret can be read from the environment variable FOOBAR
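The prefix convention above could be resolved along these lines. This is only a sketch, not Arvados code: the `SecretResolver` type, its fields, and the trailing-newline handling for files are assumptions made for illustration, and the vault backend is left as a stub because no vault client is specified here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// SecretResolver (hypothetical) turns a configured value such as
// "env:FOOBAR" into the secret itself. The lookup functions are fields
// so they can be replaced in tests.
type SecretResolver struct {
	LookupEnv func(name string) (string, bool)
	ReadFile  func(path string) ([]byte, error)
}

// NewSecretResolver returns a resolver backed by the real environment
// and filesystem.
func NewSecretResolver() SecretResolver {
	return SecretResolver{LookupEnv: os.LookupEnv, ReadFile: os.ReadFile}
}

// Resolve interprets the "backend:" prefix of a configured value.
// Values without a recognized prefix are treated as literal secrets.
func (r SecretResolver) Resolve(value string) (string, error) {
	switch {
	case strings.HasPrefix(value, "env:"):
		name := strings.TrimPrefix(value, "env:")
		secret, ok := r.LookupEnv(name)
		if !ok {
			return "", fmt.Errorf("environment variable %q is not set", name)
		}
		return secret, nil
	case strings.HasPrefix(value, "file:"):
		path := strings.TrimPrefix(value, "file:")
		buf, err := r.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("reading secret file %q: %w", path, err)
		}
		// Assumption: a trailing newline is not part of the secret.
		return strings.TrimRight(string(buf), "\n"), nil
	case strings.HasPrefix(value, "vault:"):
		// Stub: a real backend would query a vault client here.
		key := strings.TrimPrefix(value, "vault:")
		return "", fmt.Errorf("vault backend not implemented in this sketch (key %q)", key)
	default:
		return value, nil
	}
}

func main() {
	r := NewSecretResolver()
	for _, v := range []string{"foobar", "env:FOOBAR", "file:/foobar", "vault:foobar"} {
		secret, err := r.Resolve(v)
		fmt.Printf("%q -> %q, err=%v\n", v, secret, err)
	}
}
```

One consequence of this scheme is that a literal secret which happens to start with a backend prefix (for example "env:") cannot be expressed directly; it would have to be supplied through one of the indirect backends.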
Instructions for ops¶
Tentative instructions for switching config file format/location:
- Upgrade Arvados to a version that supports loading all configs from the new cluster-wide config file (maybe 1.4). When services come back up, they will still use your old configuration files, but they will log some deprecation warnings.
- Migrate your configuration to the new config file, one component at a time. For each component:
  - Restart the component.
  - Inspect the deprecation warning that is logged at startup. It will tell you either "old config file is superfluous" or "new config file is incomplete".
  - If your old config file is superfluous, delete it. You're done.
  - Run "arvados-server config-diff". This suggests changes to your new config file which will make your old config file obsolete. (Alternatively, run "arvados-server config-dump". This outputs a new config file that would make your old config file obsolete. Saving this might be easier than applying a diff, but it will reorder keys and lose comments.)
  - Make the suggested changes.
  - Repeat until finished.
- Upgrade to a version that doesn't support old config files at all (maybe 1.5).
Implementation¶
Development strategy for facilitating the above ops instructions:
- Read the new config file into an internal struct, if the new config file exists.
- Copy old config file values into the new config struct.
- Use the new config struct internally (the old config is no longer referenced except in the load-and-copy-to-new-struct step).
- Add a mechanism for showing the effect of the old config file on the resulting config struct (see "--config-diff" above).
- At startup, if the old config has any effect (i.e., some parts haven't been migrated to the new config file by the operator), log a deprecation warning recommending "--config-diff" and RTFM.
- Wait one minor version release cycle.
- Error out if the new config file does not exist.
- Error out if the old config file exists (...and some parts of the old config are not redundant [optional?]).
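The load, copy, and warn steps in the list above might look roughly like the following. This is a simplified sketch under stated assumptions: the config is modelled as a flat map rather than a real struct, the legacy values are assumed to have already been translated to new-style key names, and `Config`, `mergeLegacy`, and `deprecationMessage` are hypothetical names, not existing Arvados functions.

```go
package main

import (
	"fmt"
	"log"
	"sort"
)

// Config stands in for the cluster config struct. A flat map keeps the
// sketch short; keys and values are illustrative only.
type Config map[string]string

// mergeLegacy copies legacy values over a copy of the new config. It
// returns the merged config and the sorted list of keys that the legacy
// file added or changed, i.e. the effect the old config file still has.
func mergeLegacy(newCfg, legacy Config) (Config, []string) {
	merged := Config{}
	for k, v := range newCfg {
		merged[k] = v
	}
	var changed []string
	for k, v := range legacy {
		if cur, ok := merged[k]; !ok || cur != v {
			changed = append(changed, k)
		}
		merged[k] = v
	}
	sort.Strings(changed)
	return merged, changed
}

// deprecationMessage picks one of the two startup messages described in
// the ops instructions.
func deprecationMessage(changed []string) string {
	if len(changed) == 0 {
		return "old config file is superfluous"
	}
	return fmt.Sprintf("new config file is incomplete: old config file still sets %v", changed)
}

func main() {
	newCfg := Config{"Collections.BlobSigningTTL": "336h"}
	legacy := Config{
		"Collections.BlobSigningTTL":     "336h",
		"Collections.DefaultReplication": "2",
	}
	merged, changed := mergeLegacy(newCfg, legacy)
	log.Printf("deprecation warning: %s", deprecationMessage(changed))
	fmt.Println(merged["Collections.DefaultReplication"])
}
```

The list of changed keys is also the raw material for a config-diff style report: an empty list means the old file can be deleted.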
Example/template config file¶
See also Config migration key mapping
(Format not yet frozen!)
Notes:
- Keys are CamelCase, except in special cases like PostgreSQL connection settings, which are passed through to another system without being interpreted by Arvados.
- Arrays and lists are not to be used unless order is truly significant. These cannot be expressed natively in consul, and tend to be troublesome anyway: "what changed?" is harder to answer usefully, significance of duplicate elements is unclear, etc. If a list is used, its key must end with the chars "List". This way the value can be stored as a list (in a JSON/YAML file on disk) or a JSON-encoded string (in a system like Consul), and generically encoded/decoded between the two.
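The "List" suffix rule makes a generic conversion possible without knowing the schema. A minimal sketch, assuming the config has been decoded into a `map[string]interface{}` and handling only top-level keys (a fuller version would recurse into nested maps); `flattenLists` and `expandLists` are hypothetical helper names:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// flattenLists returns a copy of cfg in which every non-string value
// whose key ends in "List" is replaced by its JSON-encoded string form,
// suitable for a store that only holds strings.
func flattenLists(cfg map[string]interface{}) (map[string]interface{}, error) {
	out := make(map[string]interface{}, len(cfg))
	for k, v := range cfg {
		if _, isString := v.(string); strings.HasSuffix(k, "List") && !isString {
			buf, err := json.Marshal(v)
			if err != nil {
				return nil, fmt.Errorf("%s: %w", k, err)
			}
			v = string(buf)
		}
		out[k] = v
	}
	return out, nil
}

// expandLists is the inverse: string values under keys ending in "List"
// are decoded from JSON back into native lists.
func expandLists(cfg map[string]interface{}) (map[string]interface{}, error) {
	out := make(map[string]interface{}, len(cfg))
	for k, v := range cfg {
		if s, ok := v.(string); ok && strings.HasSuffix(k, "List") {
			var list []interface{}
			if err := json.Unmarshal([]byte(s), &list); err != nil {
				return nil, fmt.Errorf("%s: %w", k, err)
			}
			v = list
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	cfg := map[string]interface{}{
		"CrunchRunCommand":       "crunch-run",
		"CrunchRunArgumentsList": []string{"-cgroup-parent-subsystem=memory", "-foo=bar"},
	}
	flat, err := flattenLists(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("flattened: %v\n", flat["CrunchRunArgumentsList"])
	restored, err := expandLists(flat)
	if err != nil {
		panic(err)
	}
	fmt.Printf("restored: %v\n", restored["CrunchRunArgumentsList"])
}
```

Because the decision depends only on the key name, the same two functions work for any list-valued setting that follows the naming rule.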
Clusters:
  xyzzy: # api-server/uuid_prefix, sso/uuid_prefix
    SystemRootToken: # arvados-git-sync.rb/arvados_api_token, keepstore/SystemAuthTokenFile, c-d-s/AuthToken
    ManagementToken: # {arvados-ws,keepstore,keepproxy,keep-balance}/ManagementToken (& others)
    Services:
      RailsAPI:
        InternalURLs:
          "http://zzzzz:8000/": {} # api-server/(protocol,host,port)
        ExternalURL: "https://zzzzz.arvadosapi.com/"
        Insecure: false
      GitHTTP:
        InternalURLs:
          "http://git:9001/": {}
        ExternalURL: "https://git.zzzzz.arvadosapi.com/" # api-server/git_repo_https_base
      Keepstore:
        InternalURLs:
          "http://keep0:25107/": {Unlisted: true}
          "http://keep1:25107/": {Debug: true}
      Controller:
        InternalURLs:
          "http://zzzzz:9004/": {} # controller/NodeProfiles.$cluster.Controller.Listen
        ExternalURL: "https://zzzzz.arvadosapi.com/" # composer/apiEndPoint, workbench2/API_HOST, workbench/arvados_{login,v1}_base, arvados-ws/Client, keepproxy/Client
      Websocket:
        InternalURLs:
          "http://ws:9003/": {} # arvados-ws/Listen
        ExternalURL: "https://ws.zzzzz.arvadosapi.com/" # api-server/websocket_address
      Keepbalance:
        InternalURLs:
          "http://zzzzz:9005": {} # keepbalance/Listen
      GitHTTP:
        InternalURLs:
          "http://zzzzz:9001": {} # arvados-git-httpd/Listen
        ExternalURL: "https://git.zzzzz.arvadosapi.com/" # api-server/git_repo_https_base
      GitSSH:
        ExternalURL: "git@git.zzzzz.arvadosapi.com" # api-server/git_repo_ssh_base
      DispatchCloud:
        InternalURLs:
          "http://zzzzz:9006": {} # a-d-c/NodeProfiles
      SSO:
        ExternalURL: "https://auth.zzzzz.arvadosapi.com/" # api-server/sso_provider_url
      Keepproxy:
        InternalURLs:
          "http://keep:25107/": {} # keepproxy/Listen
        ExternalURL: "https://keep.zzzzz.arvadosapi.com/"
      WebDAV:
        InternalURLs:
          "http://keep:9002/": {} # keep-web/Listen
        ExternalURL: "https://*.collections.zzzzz.arvadosapi.com/" # api-server/keep_web_service_url, workbench/keep_web_url
      WebDAVDownload:
        InternalURLs:
          "http://keep:9002/": {} # keep-web/Listen
        ExternalURL: "https://download.zzzzz.arvadosapi.com/" # keep-web/AttachmentOnlyHost, workbench/keep_web_download_url
      Keepstore:
        InternalURLs:
          "https://keep0:25107/": {} # keepstore/Listen
          "https://keep1:25107/": {} # keepstore/Listen
      Composer:
        ExternalURL: "http://composer.zzzzz.arvadosapi.com/" # workbench/composer_url
      WebShell:
        ExternalURL: "http://webshell.zzzzz.arvadosapi.com/" # workbench/shell_in_a_box_url
      Workbench1:
        InternalURLs:
          "http://workbench:9000": {} # workbench/Nginx.server.listen
        ExternalURL: "http://workbench.zzzzz.arvadosapi.com/" # workbench/Nginx.server.listen, api-server/workbench_address
      Workbench2:
        ExternalURL: "http://workbench2.zzzzz.arvadosapi.com/" # workbench/workbench2_url
    PostgreSQL:
      Connection: # arvados-ws/Postgres, controller/PostgreSQL.Connection
        # All parameters here are passed to the PG client library in a connection string;
        # see https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
        Host: localhost
        Port: 5432
        User: arvados
        Password: s3cr3t
        DBName: arvados_production
        client_encoding: utf8
        fallback_application_name: arvados
      ConnectionPool: # arvados-ws/PostgresPool
    TLS:
      Certificate: # (literal, file, or acme dir) keepstore/TLSCertificateFile
      Key: # (literal, file, or acme dir) keepstore/TLSKeyFile
      Insecure: true # workbench/arvados_insecure_https, api-server/sso_insecure
    Git:
      GitoliteAdminRepo: # arvados-git-sync.rb/gitolite_url
      GitoliteAdminPublicKey: # arvados-git-sync.rb/gitolite_arvados_git_user_key
      GitoliteSyncWorkDir: # arvados-git-sync.rb/gitolite_tmp
      GitCommand: # arv-git-httpd/GitCommand
      GitoliteHome: # arv-git-httpd/GitoliteHome
      Repositories: # api-server/git_repositories_dir (crunch1 only; just assume {GitoliteHome}/repositories?)
    API:
      DisabledAPIs: # api-server/disable_api_methods
      SendTimeout: # arvados-ws/PingTimeout
      WebsocketClientEventQueue: # arvados-ws/ClientEventQueue
      WebsocketServerEventQueue: # arvados-ws/ServerEventQueue
      KeepServiceRequestTimeout: # keepproxy/Timeout
      MaxMemoryBuffers: # keepstore/MaxBuffers
      MaxConcurrentRequests: # keepstore/MaxRequests
      MaxRequestSize: # api-server/max_request_size
      MaxIndexDatabaseRead: # api-server/max_index_database_read
      MaxItemsPerResponse: # api-server/max_items_per_response, keep-balance/CollectionBatchSize, keep-balance/CollectionBuffers
      MaxRequestAmplification: # controller/RequestLimits.MultiClusterRequestConcurrency
      AsyncPermissionsUpdateInterval: # api-server/async_permissions_update_interval
    Users:
      AutoSetupNewUsers: # api-server/auto_setup_new_users
      AutoSetupNewUsersWithVmUUID: # api-server/auto_setup_new_users_with_vm_uuid
      AutoSetupNewUsersWithRepository: # api-server/auto_setup_new_users_with_repository
      AutoSetupUsernameBlacklist: # api-server/auto_setup_name_blacklist
      NewUsersAreActive: # api-server/new_users_are_active
      AutoAdminUserWithEmail: # api-server/auto_admin_user
      AutoAdminFirstUser: # api-server/auto_admin_first_user
      UserProfileNotificationAddress: # api-server/user_profile_notification_address
      AdminNotifierEmailFrom: # api-server/admin_notifier_email_from
      EmailSubjectPrefix: # api-server/email_subject_prefix
      UserNotifierEmailFrom: # api-server/user_notifier_email_from
      NewUserNotificationRecipients: # api-server/new_user_notification_recipients
      NewInactiveUserNotificationRecipients: # api-server/new_inactive_user_notification_recipients
      AnonymousUserToken: # workbench/anonymous_user_token, keep-web/AnonymousTokens
    Login:
      SiteTitle: # sso/site_title
      DefaultLinkTitle: # sso/default_link_title
      DefaultLinkURL: # sso/default_link_url
      AllowAccountRegistration: # sso/allow_account_registration
      RequireEmailConfirmation: # sso/require_email_confirmation
      Google:
        ClientID: # sso/google_oauth2_client_id
        ClientSecret: # sso/google_oauth2_client_secret
      LDAP: # sso/use_ldap
        Title: # sso/use_ldap.title
        Host: # sso/use_ldap.host
        Port: # sso/use_ldap.port
        Method: # sso/use_ldap.method
        Base: # sso/use_ldap.base
        Uid: # sso/use_ldap.uid
        EmailDomain: # sso/use_ldap.email_domain
        BindDN: # sso/use_ldap.BindDN
        Password: # sso/user_ldap.password
      SecretToken: # sso/secret_token
      ProviderAppSecret: # api-server/sso_app_secret
      ProviderAppID: # api-server/sso_app_id
    AuditLogs:
      Enable:
      MaxAge: # api-server/max_audit_log_age
      MaxDeleteBatch: # api-server/max_audit_log_delete_batch
      UnloggedAttributes: # api-server/unlogged_attributes (applies to logs table)
    SystemLogs:
      LogLevel: # keepstore/Debug, keepproxy/Debug, arvados-ws/LogLevel
      Format: # keepstore/LogFormat, arvados-ws/LogFormat
      MaxRequestLogParamsSize: # api-server/max_request_log_params_size
    Collections:
      DefaultReplication: # api-server/default_collection_replication, keepproxy/DefaultReplicas
      DefaultTrashLifetime: # api-server/default_trash_lifetime
      CollectionVersioning: # api-server/collection_versioning
      PreserveVersionIfIdle: # api-server/preserve_version_if_idle
      TrustAllContent: # keep-web/TrustAllContent, workbench/trust_all_content
      TrashSweepInterval: # api-server/trash_sweep_interval
      BlobSigningKey: # api-server/blob_signing_key, keepstore/BlobSigningKeyFile
      BlobSigningTTL: # api-server/blob_signature_ttl, keepstore/BlobSignatureTTL
      BlobSigning: # keepstore/RequireSignatures, api-server/permit_create_collection_with_unsigned_manifest
      BlobTrash: # keepstore/EnableDelete
      BlobTrashLifetime: # keepstore/TrashLifetime
      BlobTrashCheckInterval: # keepstore/TrashCheckInterval
      BlobTrashConcurrency: # keepstore/TrashWorkers, keep-balance/-commit-trash
      BlobDeleteConcurrency: # keepstore/EmptyTrashWorkers
      BlobReplicateConcurrency: # keepstore/PullWorkers, keep-balance/-commit-pulls
      KeepBalanceRunPeriod: 10m # keepbalance/RunPeriod
      WebDAVCache:
        TTL: # keep-web/Cache.TTL
        UUIDTTL: # keep-web/Cache.UUIDTTL
        MaxCollectionEntries: # keep-web/Cache.MaxCollectionEntries
        MaxCollectionBytes: # keep-web/Cache.MaxCollectionBytes
        MaxPermissionEntries: # keep-web/Cache.MaxPermissionEntries
        MaxUUIDEntries: # keep-web/Cache.MaxUUIDEntries
    Containers: # control how Arvados runs user containers
      SupportedDockerImageFormats: # api-server/docker_image_formats
      LogReuseDecisions: # api-server/log_reuse_decisions
      DefaultKeepCacheRAM: # api-server/container_default_keep_cache_ram
      MaxDispatchAttempts: # api-server/max_container_dispatch_attempts
      MaxRetryAttempts: # api-server/container_count_max
      PollInterval: 10s # c-d-s/PollPeriod, a-d-c/Dispatch/PollInterval
      MinRetryPeriod: 30s # c-d-s/MinRetryPeriod (optional? in case ContainerDispatchPollInterval is too short)
      CrunchRunCommand: "crunch-run" # c-d-s/CrunchRunCommand
      CrunchRunArgumentsList: ["-cgroup-parent-subsystem=memory", "-foo=bar"] # c-d-s/CrunchRunCommand
      ReserveExtraRAM: 256MiB # c-d-s/ReserveExtraRAM
      UsePreemptibleInstances: # api-server/preemptible_instances
      MaxComputeVMs: # api-server/max_compute_nodes
      DispatchPrivateKey: # a-d-c/Dispatch/PrivateKey
      StaleLockTimeout: # a-d-c/Dispatch/StaleLockTimeout
      Logging:
        LogBytesPerEvent: # api-server/crunch_log_bytes_per_event
        LogSecondsBetweenEvents: # api-server/crunch_log_seconds_between_events
        LogThrottlePeriod: # api-server/crunch_log_throttle_period
        LogThrottleBytes: # api-server/crunch_log_throttle_bytes
        LogThrottleLines: # api-server/crunch_log_throttle_lines
        LimitLogBytesPerJob: # api-server/crunch_limit_log_bytes_per_job
        LogPartialLineThrottlePeriod: # api-server/crunch_log_partial_line_throttle_period
        LogUpdatePeriod: # api-server/crunch_log_update_period
        LogUpdateSize: # api-server/crunch_log_update_size
        MaxAge: # api-server/clean_container_log_rows_after, api-server/clean_job_log_rows_after
      CloudVMs:
        Enable: # arvados-dispatch-cloud is in use
        BootProbeCommand: # a-d-c/CloudVMs/BootProbeCommand
        ProbeInterval: # a-d-c/Dispatch/ProbeInterval
        MaxProbesPerSecond: # a-d-c/Dispatch/MaxProbesPerSecond
        TimeoutSignal: # a-d-c/Dispatch/TimeoutSignal
        TimeoutTERM: # a-d-c/Dispatch/TimeoutTERM
        MaxCloudOpsPerSecond: # a-d-c/CloudVMs/MaxCloudOpsPerSecond
        SSHPort: # a-d-c/CloudVMs/SSHPort
        SyncInterval: # a-d-c/CloudVMs/SyncInterval
        TimeoutIdle: # a-d-c/CloudVMs/TimeoutIdle
        TimeoutBooting: # a-d-c/CloudVMs/TimeoutBooting
        TimeoutProbe: # a-d-c/CloudVMs/TimeoutProbe
        TimeoutShutdown: # a-d-c/CloudVMs/TimeoutShutdown
        ImageID: # a-d-c/CloudVMs/ImageID
        Driver: Amazon # a-d-c/CloudVMs/Driver
        DriverParameters: # a-d-c/CloudVMs/DriverParameters
          Region: us-east-1
          APITimeout: 20s
          AWSAccessKeyID: abcdef
          AWSSecretAccessKey: abcdefghijklmnopqrstuvwxyz
          ImageID: ami-0a01b48b88d14541e
          SubnetID: subnet-24f5ae62
          SecurityGroups: sg-3ec53e2a
      SLURM:
        Enable: # crunch-dispatch-slurm is in use
        PrioritySpread: 1000 # c-d-s/PrioritySpread
        SbatchArguments: ["-partition=PartitionName"] # c-d-s/SbatchArguments
        KeepServices:
          00000-bi6l4-000000000000000:
            InternalURLs:
              "http://127.0.0.1:25107": {} # c-d-s/KeepServiceURIs
        Managed:
          Enable: # arvados-node-manager is in use
          DNSServerConfDir: # api-server/dns_server_conf_dir
          DNSServerConfTemplate: # api-server/dns_server_conf_template
          DNSServerReloadCommand: # api-server/dns_server_reload_command
          DNSServerUpdateCommand: # api-server/dns_server_update_command
          ComputeNodeDomain: # api-server/compute_node_domain
          ComputeNodeNameservers: # api-server/compute_node_nameservers
          AssignNodeHostname: # api-server/assign_node_hostname
      JobsAPI:
        Enable: # api-server/enable_legacy_jobs_api (crunch1)
        CrunchJobWrapper: # api-server/crunch_job_wrapper (crunch1)
        CrunchJobUser: # api-server/crunch_job_user (crunch1)
        CrunchRefreshTrigger: # api-server/crunch_refresh_trigger (crunch1)
        GitInternalDir: # api-server/git_internal_dir (crunch1)
        ReuseJobIfOutputsDiffer: # api-server/reuse_job_if_outputs_differ
        DefaultDockerImage: # api-server/default_docker_image_for_jobs
    Volumes: # keepstore/Volumes, keep-balance/KeepServiceTypes
      # TODO: some keepstores are closer to specific volumes
      zzzzz-ivpuk-voihjznerfweefq:
        AccessViaHosts: # replaces differing configs on keepstore hosts
          "http://keep0:25107": {ReadOnly: true}
          "http://keep1:25107": {}
          "http://keep2:25107": {ReadOnly: true}
          "http://keep3:25107": {ReadOnly: true}
        StorageClasses: # keepstore/S3Volume.StorageClasses, keepstore/AzureBlobVolume.StorageClasses, keepstore/UnixVolume.StorageClasses
          default: true
          cold: true
        Replication: 2 # keepstore/S3Volume.S3Replication, keepstore/AzureBlobVolume.AzureReplication, keepstore/UnixVolume.DirectoryReplication
        ReadOnly: false # keepstore/S3Volume.ReadOnly, keepstore/AzureBlobVolume.ReadOnly, keepstore/UnixVolume.ReadOnly
        Driver: S3 # keepstore/Volumes[].Type
        DriverParameters:
          AccessKey: # keepstore/S3Volume.AccessKey
          SecretKey: # keepstore/S3Volume.SecretKey
          Endpoint: # keepstore/S3Volume.Endpoint
          Region: # keepstore/S3Volume.Region
          Bucket: # keepstore/S3Volume.Bucket
          LocationConstraint: # keepstore/S3Volume.LocationConstraint
          IndexPageSize: # keepstore/S3Volume.IndexPageSize
          S3Replication:
          ConnectTimeout: # keepstore/S3Volume.ConnectTimeout
          ReadTimeout: # keepstore/S3Volume.ReadTimeout
          RaceWindow: # keepstore/S3Volume.RaceWindow
          ReadOnly: #
          UnsafeDelete: # keepstore/S3Volume.UnsafeDelete
      zzzzz-ivpuk-adbtuyuiivjhbnmb:
        AccessViaHosts: # replaces differing configs on keepstore hosts (TBD: do we need "readonly from these hosts"?)
          "http://keep1:25107": {ReadOnly: false}
        StorageClasses: # keepstore/S3Volume.StorageClasses, keepstore/AzureBlobVolume.StorageClasses, keepstore/UnixVolume.StorageClasses
          default: true
          cold: false
        Replication: 2 # keepstore/S3Volume.S3Replication, keepstore/AzureBlobVolume.AzureReplication, keepstore/UnixVolume.DirectoryReplication
        ReadOnly: false # keepstore/S3Volume.ReadOnly, keepstore/AzureBlobVolume.ReadOnly, keepstore/UnixVolume.ReadOnly
        Driver: Azure # keepstore/Volumes[].Type
        DriverParameters:
          StorageAccountName: # keepstore/AzureBlobVolume.StorageAccountName
          StorageAccountKey: # keepstore/AzureBlobVolume.StorageAccountKeyFile
          StorageBaseURL: # keepstore/AzureBlobVolume.StorageBaseURL
          ContainerName: # keepstore/AzureBlobVolume.ContainerName
          RequestTimeout: # keepstore/AzureBlobVolume.RequestTimeout
      zzzzz-ivpuk-2344guvaiubbae4wa:
        Driver: Filesystem # keepstore/Volumes[].Type
        DriverParameters:
          Root: # keepstore/UnixVolume.Root
          Serialize: # keepstore/UnixVolume.Serialize
          BlockDeviceUUID: # (disable if this is non-empty and does not match the local filesystem device)
    Mail:
      MailchimpAPIKey: # api-server/mailchimp_api_key
      MailchimpListID: # api-server/mailchimp_list_id
      SendUserSetupNotificationEmail: # workbench/send_user_setup_notification_email
      IssueReporterEmailFrom: # workbench/issue_reporter_email_from
      IssueReporterEmailTo: # workbench/issue_reporter_email_to
      SupportEmailAddress: # workbench/support_email_address
      EmailFrom: # workbench/email_from
    RemoteClusters: # api-server/remote_hosts
      xyzzx:
        Host:
        Proxy: false
        Scheme: https
        Insecure: false
        ActivateUsers: false
      "*": # api-server/remote_hosts_via_dns
        ActivateUsers: false
    Workbench:
      Theme: default # workbench/arvados_theme
      ActivationContactLink: # workbench/activation_contact_link
      ArvadosDocsite: # workbench/arvados_docsite
      ArvadosPublicDataDocURL: # workbench/arvados_public_data_doc_url
      ShowUserAgreementInline: # workbench/show_user_agreement_inline
      SecretToken: # workbench/secret_token
      SecretKeyBase: # workbench/secret_key_base
      RepositoryCache: # workbench/repository_cache
      UserProfileFormFields: # workbench/user_profile_form_fields
      UserProfileFormMessage: # workbench/user_profile_form_message
      ApplicationMimetypesWithViewIcon: # workbench/application_mimetypes_with_view_icon
      LogViewerMaxBytes: # workbench/log_viewer_max_bytes
      EnablePublicProjectsPage: # workbench/enable_public_projects_page
      EnableGettingStartedPopup: # workbench/enable_getting_started_popup
      ApiResponseCompression: # workbench/api_response_compression
      APIClientConnectTimeout: # workbench/api_client_connect_timeout
      APIClientReceiveTimeout: # workbench/api_client_receive_timeout
      RunningJobLogRecordsToFetch: # workbench/running_job_log_records_to_fetch
      ShowRecentCollectionsOnDashboard: # workbench/show_recent_collections_on_dashboard
      ShowUserNotifications: # workbench/show_user_notifications
      MultiSiteSearch: # workbench/multi_site_search
      Repositories: # workbench/repositories
      SiteName: # workbench/site_name
      VocabularyURL: # workbench2/VOCABULARY_URL
      FileViewersConfigURL: # workbench2/FILE_VIEWERS_CONFIG_URL
    InstanceTypes:
      x1l:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128GiB
        Scratch: 128GB
        IncludedScratch: 128GB
        AddedScratch: 0
        Price: 1.23
        Preemptible: false
    TODO:
      KeepproxyDisableGet: # keepproxy/DisableGet (retire this feature / use Nginx instead / use a per-token permission instead)
      KeepproxyDisablePut: # keepproxy/DisablePut (retire this feature / use Nginx instead / use a per-token permission instead)
      RailsSessionSecretToken: # api-server/secret_token (should this be generated at runtime from superusertoken?)
      InternalIPNetworks: # Nginx $external_client
Go Configuration Framework Options¶
Viper and go-config seem to be the leading Go config framework contenders given some of our long-term goals (config synchronization), but Viper seems to be the more widely adopted of the two.
spf13/viper: https://github.com/spf13/viper
micro/go-config: https://github.com/micro/go-config (more useful documentation: https://micro.mu/docs/go-config.html)
Both solutions are very similar in terms of reported functionality. Both have watch support, and would allow for merging flags, environment variables, remote key stores (Consul), and our master YAML config. Viper also supports encrypted remote key/value access.