Version 2 (Tom Clegg, 02/11/2019 04:05 PM) → Version 3/22 (Tom Clegg, 02/11/2019 07:39 PM)

h1. Migrating from arvados-node-manager to crunch-dispatch-cloud

{{toc}}

h2. Choose a node

The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example, it runs on the same node as the API server and controller.
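
Before settling on a host, it can help to confirm that it reaches all three endpoints. The following is a minimal sketch, assuming @bash@ (for its @/dev/tcp@ redirection) and the coreutils @timeout@ command are available. The function names @can_connect@ and @report@ and the hostnames in the example calls are illustrative placeholders, not part of Arvados. Port 2222 matches the @SSHPort@ used in the example configuration on this page.

```shell
#!/bin/bash
# Sketch: check TCP reachability from the candidate dispatch host.
# Hostnames and addresses in the example calls are placeholders.

# can_connect HOST PORT: succeed if a TCP connection opens within 5 seconds.
can_connect() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report HOST PORT LABEL: print a one-line result for a single endpoint.
report() {
  if can_connect "$1" "$2"; then
    echo "ok:   $3 ($1:$2)"
  else
    echo "FAIL: $3 ($1:$2)"
  fi
}

# Example calls (placeholders; uncomment and edit for your site):
# report zzzzz.example.com 443 "Arvados API"
# report management.azure.com 443 "cloud provider API"
# report 10.0.0.5 2222 "SSH on a cloud VM"
```

A successful TCP connection only shows that the port is reachable. It does not verify credentials or that the expected service is listening.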

h2. Update cluster configuration file

In @/etc/arvados/config.yml@, add configuration items for the dispatch service.

<pre><code class="yaml">
Clusters:
  uuid_prefix:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch"
      SSHPort: "2222"
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "image-12345678"
      Driver: Azure
      DriverParameters:
        # before #14745:
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # same value as ClientID
        secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # same value as ClientSecret
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        cloud_environment: AzurePublicCloud
        resource_group: zzzzz
        region: centralus # same value as Location
        network: zzzzz
        subnet: zzzzz-subnet-private
        storage_account: example
        blob_container: vhds
        delete_dangling_resources_after: 20
        # after #14745:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        ClientSecret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        CloudEnv: AzurePublicCloud
        ResourceGroup: zzzzz
        Location: centralus
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        BlobContainer: vhds
        DeleteDanglingResourcesAfter: 20

    Dispatch:
      PrivateKey: "..."
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token"
    NodeProfiles:
      apiserver: # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005"
</code></pre>

Create the host configuration file @/etc/arvados/environment@.

<pre>
ARVADOS_NODE_PROFILE=apiserver
</pre>
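
The value of @ARVADOS_NODE_PROFILE@ has to match a key under @NodeProfiles@ in the cluster configuration. The sketch below is one way to catch a mismatch early. The helper name @profile_in_config@ is hypothetical, and the check is a plain text match for an indented @<profile>:@ line rather than a real YAML parse, so treat a positive result as a hint, not proof.

```shell
#!/bin/bash
# Sketch: confirm that ARVADOS_NODE_PROFILE names a key that appears in the
# cluster config. This is a text match only; it does not parse YAML.

# profile_in_config ENV_FILE CONFIG_FILE
profile_in_config() {
  local env_file="$1" config_file="$2" profile
  # Take the value of the last ARVADOS_NODE_PROFILE= line in the environment file.
  profile="$(sed -n 's/^ARVADOS_NODE_PROFILE=//p' "$env_file" | tail -n 1)"
  if [ -z "$profile" ]; then
    echo "ARVADOS_NODE_PROFILE is not set in $env_file"
    return 1
  fi
  # Look for an indented "<profile>:" key anywhere in the config file.
  if grep -Eq "^[[:space:]]+${profile}:" "$config_file"; then
    echo "profile '$profile' found in $config_file"
  else
    echo "profile '$profile' not found in $config_file"
    return 1
  fi
}

# Example: profile_in_config /etc/arvados/environment /etc/arvados/config.yml
```

Because the profile name is used directly in a regular expression, this sketch assumes it contains only letters, digits, hyphens, and underscores.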

h2. Stop crunch-dispatch-slurm

Stop and disable the crunch-dispatch-slurm service, then uninstall the package so that it cannot start again after the next reboot or upgrade.

<pre>
# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm
</pre>

Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for crunch-dispatch-cloud to run.

h2. Install crunch-dispatch-cloud

<pre>
# apt-get install crunch-dispatch-cloud
</pre>

h2. Verify the service is running

<pre>
$ token="example-secret-management-token"
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
</pre>
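
The service may need a few seconds after installation before it answers requests, so a single @curl@ call can fail even though nothing is wrong. The sketch below retries the same metrics request a fixed number of times. The helper names @retry@ and @check_metrics@ are illustrative, and the attempt count and delay in the example are arbitrary. The URL and token variable are the ones used in the command above.

```shell
#!/bin/bash
# Sketch: retry a command until it succeeds or the attempts run out.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# check_metrics: succeed only if the metrics endpoint returns an HTTP success status.
# Expects $token to hold the management token.
check_metrics() {
  curl --silent --fail --output /dev/null \
    -H "Authorization: Bearer $token" http://localhost:9005/metrics
}

# Example: token="example-secret-management-token"; retry 10 3 check_metrics
```

With @--fail@, @curl@ exits non-zero on HTTP error statuses such as 401, so a wrong token shows up as a failed check rather than as an error page printed to the terminal.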

h2. Verify the service is functional