
Tom Clegg, 02/11/2019 04:05 PM


Migrating from arvados-node-manager to crunch-dispatch-cloud

Choose a node

The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example, it runs on the same host as the API server and controller.

Update cluster configuration file

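In /etc/arvados/config.yml, add configuration items for the dispatch service. Replace the example values (cluster uuid_prefix, credentials, image ID, instance types, and management token) with your own.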

Clusters:
  uuid_prefix:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch" 
      SSHPort: "2222" 
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "image-12345678" 
      Driver: Azure
      DriverParameters:
        # Versions before #14745 use these keys:
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        cloud_environment: AzurePublicCloud
        resource_group: zzzzz
        region: centralus
        network: zzzzz
        subnet: zzzzz-subnet-private
        storage_account: example
        blob_container: vhds
        delete_dangling_resources_after: 20
        # Versions after #14745 use these keys instead (configure only one set):
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        ClientSecret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        CloudEnv: AzurePublicCloud
        ResourceGroup: zzzzz
        Location: centralus
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        BlobContainer: vhds
        DeleteDanglingResourcesAfter: 20
    Dispatch:
      PrivateKey: "..." 
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token" 
    NodeProfiles:
      apiserver:                       # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005" 
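The two DriverParameters key sets above map one-to-one, so an existing old-style Azure section can be translated mechanically. Here is a minimal Go sketch of that translation, assuming DriverParameters has already been loaded into a map (YAML loading is omitted). The rename table is copied from the example; `renameDriverParameters` is a hypothetical helper, not part of Arvados.

```go
package main

import (
	"fmt"
	"sort"
)

// azureKeyRenames maps the pre-#14745 Azure DriverParameters keys to the
// post-#14745 names, as listed in the example configuration.
var azureKeyRenames = map[string]string{
	"subscription_id":                 "SubscriptionID",
	"key":                             "ClientID",
	"secret":                          "ClientSecret",
	"tenant_id":                       "TenantID",
	"cloud_environment":               "CloudEnv",
	"resource_group":                  "ResourceGroup",
	"region":                          "Location",
	"network":                         "Network",
	"subnet":                          "Subnet",
	"storage_account":                 "StorageAccount",
	"blob_container":                  "BlobContainer",
	"delete_dangling_resources_after": "DeleteDanglingResourcesAfter",
}

// renameDriverParameters returns a copy of params with old-style keys
// replaced by their new names; other keys are copied unchanged. It returns
// an error if the old and new spelling of the same setting are both present,
// because it cannot tell which value is intended.
func renameDriverParameters(params map[string]interface{}) (map[string]interface{}, error) {
	out := make(map[string]interface{}, len(params))
	for k, v := range params {
		newKey := k
		if renamed, ok := azureKeyRenames[k]; ok {
			newKey = renamed
		}
		if _, dup := out[newKey]; dup {
			return nil, fmt.Errorf("both old and new spellings of %s are set", newKey)
		}
		out[newKey] = v
	}
	return out, nil
}

func main() {
	// A fragment of an old-style DriverParameters section (placeholder values).
	old := map[string]interface{}{
		"region":                          "centralus",
		"key":                             "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
		"delete_dangling_resources_after": 20,
	}
	updated, err := renameDriverParameters(old)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	keys := make([]string, 0, len(updated))
	for k := range updated {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // prints [ClientID DeleteDanglingResourcesAfter Location]
}
```

Rejecting mixed spellings, rather than silently picking one, matches the advice to configure only one key set.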

Create the host configuration file /etc/arvados/environment.

ARVADOS_NODE_PROFILE=apiserver
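The ARVADOS_NODE_PROFILE value names the NodeProfiles entry that applies to this host, which in turn determines the dispatcher's Listen address. The Go sketch below illustrates that lookup under simplifying assumptions: the environment file is treated as plain KEY=VALUE lines, and NodeProfiles is represented by a hypothetical in-memory map rather than parsed from config.yml. `parseEnvFile` and `listenAddr` are illustrative names, not Arvados' own configuration loader.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseEnvFile parses simple KEY=VALUE lines. Blank lines and lines starting
// with '#' are skipped. Quoting and escapes are NOT handled.
func parseEnvFile(content string) (map[string]string, error) {
	env := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		kv := strings.SplitN(line, "=", 2)
		if len(kv) == 2 {
			env[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return env, sc.Err()
}

// listenAddr finds the Listen address for service in the profile named by
// ARVADOS_NODE_PROFILE. profiles is a hypothetical stand-in for the
// NodeProfiles section (profile name -> service name -> Listen).
func listenAddr(env map[string]string, profiles map[string]map[string]string, service string) (string, error) {
	name, ok := env["ARVADOS_NODE_PROFILE"]
	if !ok {
		return "", fmt.Errorf("ARVADOS_NODE_PROFILE is not set")
	}
	services, ok := profiles[name]
	if !ok {
		return "", fmt.Errorf("no NodeProfiles entry named %q", name)
	}
	addr, ok := services[service]
	if !ok {
		return "", fmt.Errorf("profile %q does not configure %s", name, service)
	}
	return addr, nil
}

func main() {
	// Contents of /etc/arvados/environment from the example.
	env, err := parseEnvFile("ARVADOS_NODE_PROFILE=apiserver\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	profiles := map[string]map[string]string{
		"apiserver": {"arvados-dispatch-cloud": ":9005"},
	}
	addr, err := listenAddr(env, profiles, "arvados-dispatch-cloud")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("arvados-dispatch-cloud listens on", addr) // prints "arvados-dispatch-cloud listens on :9005"
}
```

If the profile name in the environment file does not match a NodeProfiles key, a lookup like this fails, which is a quick thing to rule out when the service does not listen where expected.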

Stop crunch-dispatch-slurm

Stop and disable the crunch-dispatch-slurm service, and uninstall the package to make sure it doesn't start after the next reboot or upgrade.

# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm

Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for crunch-dispatch-cloud to run.

Install crunch-dispatch-cloud

# apt-get install crunch-dispatch-cloud

Verify the service is running

$ token="example-secret-management-token" 
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics

Verify the service is functional