Migrating from arvados-node-manager to arvados-dispatch-cloud » History » Revision 9
Tom Clegg, 02/14/2019 01:33 AM
Migrating from arvados-node-manager to arvados-dispatch-cloud¶
Choose a node¶
The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example it runs on the same node as the API server and controller.
Prepare key pair and worker VM image¶
Generate an SSH private key with no passphrase. Save it in the cluster configuration file (see PrivateKey in the example below).
If you are using Azure, the dispatcher will create a login account and install your public key automatically, so you do not need to save the corresponding public key in an authorized_keys file in the VM image (or anywhere else, for that matter).
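The key generation step could look like the following minimal sketch, assuming OpenSSH's ssh-keygen is installed. The scratch directory, the file name dispatch_key, the comment string, and the 4096-bit size are illustrative choices, not requirements stated here. The -m PEM option requests the traditional PEM encoding, which matches the "BEGIN RSA PRIVATE KEY" block in the example configuration.

```shell
# Generate a passphrase-less RSA key pair in a scratch directory.
# (File name, comment, and key size are assumptions for illustration.)
keydir="$(mktemp -d)"
ssh-keygen -q -t rsa -b 4096 -m PEM -N "" -C "arvados-dispatch-cloud" -f "$keydir/dispatch_key"

# Print the private key with a leading indent so it can be pasted under
# the "PrivateKey: |" entry; adjust the indent to match your config file.
sed 's/^/        /' "$keydir/dispatch_key"
```

After copying the key into the configuration file, delete the scratch directory so the private key is not left behind in a temporary location.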
Prepare a worker VM image. It needs Docker, arv-mount (from the python-arvados-fuse package), and crunch-run. The crunch-run version must be new enough to include commit 2873d55ea (TODO: when merged/published, give minimum package version instead of commit).
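One way to sanity-check an image is to boot a VM from it and confirm the required programs are on the PATH. The helper below is a hypothetical sketch (the function name check_worker_tools is not part of Arvados). It only checks that the commands exist; it does not verify that crunch-run is recent enough.

```shell
# Report any required worker tool that is not on PATH.
# Usage: check_worker_tools [tool ...]
# With no arguments it checks the three tools a worker image needs.
check_worker_tools() {
    if [ "$#" -eq 0 ]; then
        set -- docker arv-mount crunch-run
    fi
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Exit status 0 means every tool was found, 1 means at least one is missing.
    return "$missing"
}
```

Run it on the worker VM, for example as `check_worker_tools && echo "image looks complete"`.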
Update cluster configuration file¶
In /etc/arvados/config.yml, add configuration items for the dispatch service.
Clusters:
  zzzzz:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch"
      SSHPort: "2222"
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "https://zzzzzzzz.blob.core.windows.net/system/Microsoft.Compute/Images/images/zzzzz-compute-osDisk.55555555-5555-5555-5555-555555555555.vhd"
      Driver: azure
      DriverParameters:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # not needed after #14745
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # not needed after #14745 (same value as ClientID)
        ClientSecret: 2WyXt0XFbEtutnf2hp528t6Wk9S5bOHWkRaaWwavKQo=
        secret: 2WyXt0XFbEtutnf2hp528t6Wk9S5bOHWkRaaWwavKQo= # not needed after #14745 (same value as ClientSecret)
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # not needed after #14745
        CloudEnv: AzurePublicCloud
        cloud_environment: AzurePublicCloud # not needed after #14745
        ResourceGroup: zzzzz
        resource_group: zzzzz # not needed after #14745
        Location: centralus
        region: centralus # not needed after #14745 (same value as Location)
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        storage_account: example # not needed after #14745
        BlobContainer: vhds
        blob_container: vhds # not needed after #14745
        DeleteDanglingResourcesAfter: 20
        delete_dangling_resources_after: 20 # not needed after #14745
        AdminUsername: arvados
    Dispatch:
      PrivateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEowIBAAKCAQEAqYm4XsQHm8sBSZFwUX5VeW1OkGsfoNzcGPG2nzzYRhNhClYZ
        0ABHhUk82HkaC/8l6d/jpYTf42HrK42nNQ0r0Yzs7qw8yZMQioK4Yk+kFyVLF78E
        GRG4pGAWXFs6pUchs/lm8fo9zcda4R3XeqgI+NO+nEERXmdRJa1FhI+Za3/S/+CV
        mg+6O00wZz2+vKmDPptGN4MCKmQOCKsMJts7wSZGyVcTtdNv7jjfr6yPAIOIL8X7
        ...
        JIBvlVfcHb1IHMA9YG7ZQjrMRmx2Xj3ce4RVPgUGHh8ra7gvLjd72/Tpf0doNClN
        ti/hAoGBAMW5D3LhU05LXWmOqpeT4VDgqk4MrTBcstVe7KdVjwzHrVHCAmI927vI
        pjpphWzpC9m3x4OsTNf8m+g6H7f3IiQS0aiFNtduXYlcuT5FHS2fSATTzg5PBon9
        1E6BudOve+WyFyBs7hFWAqWFBdWujAl4Qk5Ek09U2ilFEPE7RTgJ
        -----END RSA PRIVATE KEY-----
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token"
    NodeProfiles:
      dispatcher: # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005"
Create the host configuration file /etc/arvados/environment.
ARVADOS_NODE_PROFILE=dispatcher
Stop crunch-dispatch-slurm¶
Stop and disable the crunch-dispatch-slurm service, and uninstall the package to make sure it doesn't start after the next reboot/upgrade.
# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm
Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for arvados-dispatch-cloud to run.
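To watch the remaining SLURM work drain, something like the following sketch may help. It assumes the SLURM squeue client is installed on the host, and the function name slurm_jobs_remaining is hypothetical. It counts every job in the queue, not only Arvados containers, so it is only meaningful on a cluster whose SLURM queue is dedicated to Arvados.

```shell
# Count jobs still in the SLURM queue.
# "squeue -h" prints one line per job with the header line suppressed.
slurm_jobs_remaining() {
    squeue -h | wc -l | tr -d ' '
}
```

For example, `while [ "$(slurm_jobs_remaining)" -gt 0 ]; do sleep 60; done` blocks until the queue is empty.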
Install arvados-dispatch-cloud¶
# apt-get install arvados-dispatch-cloud
Verify the service is running¶
$ token="example-secret-management-token"
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
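For scripted checks, the same request can be reduced to a pass/fail result by looking only at the HTTP status code. This is a minimal sketch: the function name check_dispatch_metrics is hypothetical, and treating HTTP 200 as "running" is an assumption about the endpoint rather than documented behavior.

```shell
# Succeed if the metrics endpoint answers with HTTP 200.
# Usage: check_dispatch_metrics TOKEN [URL]
check_dispatch_metrics() {
    token="$1"
    url="${2:-http://localhost:9005/metrics}"
    # -o /dev/null discards the body; -w prints only the status code.
    # curl reports 000 when it cannot connect at all.
    status="$(curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer $token" "$url")"
    [ "$status" = "200" ]
}
```

A typical call is `if check_dispatch_metrics "example-secret-management-token"; then echo "dispatcher is up"; fi`.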
Verify the service is functional¶
Watch the dispatcher's logs while you run an Arvados container:
# journalctl -ocat -fu arvados-dispatch-cloud