Migrating from arvados-node-manager to crunch-dispatch-cloud¶
Choose a node¶
The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example it runs on the same node as the API server and controller.
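The three connectivity requirements above can be checked from a candidate host before anything is installed. The following is a minimal sketch and not part of Arvados: `check_tcp` and `report` are hypothetical helper names, and the check only confirms that a TCP port accepts connections, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not shipped with Arvados.

# check_tcp HOST PORT: succeed if a TCP connection opens within 5 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
check_tcp() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# report NAME HOST PORT: print one status line for an endpoint and
# return non-zero if the connection could not be opened.
report() {
  local name="$1" host="$2" port="$3"
  if check_tcp "$host" "$port"; then
    echo "ok      $name ($host:$port)"
  else
    echo "FAILED  $name ($host:$port)"
    return 1
  fi
}
```

A run might call `report "Arvados API" zzzzz.example.com 443`, `report "cloud API" management.azure.com 443`, and `report "VM SSH" 10.0.0.5 2222`. The hostnames and address here are placeholders; substitute your own API host, your provider's API endpoint, and a running cloud VM on the SSH port you configure.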
Update cluster configuration file¶
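In /etc/arvados/config.yml, add configuration items for the dispatch service. Replace uuid_prefix with your cluster's UUID prefix. DriverParameters is shown with two sets of key names: use the lower-case names on versions before #14745 and the CamelCase names on versions after it.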
Clusters:
  uuid_prefix:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch"
      SSHPort: "2222"
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "image-12345678"
      Driver: Azure
      DriverParameters:
        # before #14745:
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        cloud_environment: AzurePublicCloud
        resource_group: zzzzz
        region: centralus
        network: zzzzz
        subnet: zzzzz-subnet-private
        storage_account: example
        blob_container: vhds
        delete_dangling_resources_after: 20
        # after #14745:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        ClientSecret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        CloudEnv: AzurePublicCloud
        ResourceGroup: zzzzz
        Location: centralus
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        BlobContainer: vhds
        DeleteDanglingResourcesAfter: 20
    Dispatch:
      PrivateKey: "..."
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token"
    NodeProfiles:
      apiserver:
        # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005"
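After editing the file, a quick text check can catch a section that was left out. This is a rough sketch under stated assumptions: `missing_sections` is a hypothetical helper, the list of section names is taken from the example above, and the check is a plain pattern match that does not parse YAML or verify nesting.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not shipped with Arvados.

# missing_sections FILE: print each expected section name that does not
# appear as a "Name:" key anywhere in FILE. Prints nothing if all are found.
# This is a text match only; it does not validate YAML structure.
missing_sections() {
  local file="$1" key
  for key in CloudVMs Dispatch InstanceTypes ManagementToken NodeProfiles; do
    grep -Eq "^[[:space:]]*${key}:" "$file" || echo "$key"
  done
}
```

Running `missing_sections /etc/arvados/config.yml` prints one line per absent section, so empty output suggests that every section from the example is at least present.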
Create the host configuration file
Stop and disable the crunch-dispatch-slurm service, and uninstall the package to make sure it doesn't start after the next reboot/upgrade.
# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm
Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for crunch-dispatch-cloud to run.
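To watch the already-submitted containers drain from SLURM, something like the following sketch could be used. It assumes that SLURM's `squeue` command is on the PATH and that the SLURM cluster runs only Arvados containers, since it counts every job in the queue; `count_jobs` and `wait_for_drain` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not shipped with Arvados.

# count_jobs: read squeue-style output (one job per line, no header) on
# stdin and print how many non-blank lines it contains.
count_jobs() {
  grep -c '[^[:space:]]' || true
}

# wait_for_drain: poll squeue every 30 seconds until no jobs remain.
# "squeue -h" lists the queue without the header line.
# Assumption: every job in the queue is an Arvados container.
wait_for_drain() {
  local n
  while n=$(squeue -h | count_jobs) && [ "$n" -gt 0 ]; do
    echo "$n SLURM job(s) still queued or running"
    sleep 30
  done
}
```

Calling `wait_for_drain` returns once the queue is empty.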
# apt-get install crunch-dispatch-cloud
Verify the service is running¶
$ token="example-secret-management-token"
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
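Right after installation the service may take a moment to start listening, so a single curl can fail spuriously. The sketch below retries the same request a few times. `wait_for_metrics` is a hypothetical helper, and the retry count and delay are arbitrary defaults rather than values recommended by Arvados.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not shipped with Arvados.

# wait_for_metrics URL TOKEN [TRIES] [DELAY]: request URL with a bearer
# token until it returns a successful HTTP status, or give up after TRIES
# attempts, sleeping DELAY seconds after each failed attempt.
wait_for_metrics() {
  local url="$1" token="$2" tries="${3:-10}" delay="${4:-3}" i
  for i in $(seq 1 "$tries"); do
    # -f treats HTTP errors (such as 401) as failures; the body is discarded.
    if curl -fsS --max-time 5 -o /dev/null \
        -H "Authorization: Bearer $token" "$url" 2>/dev/null; then
      echo "metrics endpoint responded after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "no response from $url after $tries attempt(s)" >&2
  return 1
}
```

With the values from the example above, this would be called as `wait_for_metrics http://localhost:9005/metrics "$token"`.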