Migrating from arvados-node-manager to arvados-dispatch-cloud » History » Version 5

Tom Clegg, 02/12/2019 02:39 PM


Migrating from arvados-node-manager to arvados-dispatch-cloud

Choose a node

The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example it runs on the same node as the API server and controller.
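Before settling on a host, it can help to confirm that it reaches all three kinds of endpoint. The following is a minimal sketch, not part of Arvados: the function names are made up for illustration, and it assumes bash (for its /dev/tcp feature) and the coreutils timeout command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Arvados).

# can_connect HOST PORT: succeed if a TCP connection opens within 5 seconds.
can_connect() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# check_endpoints HOST:PORT...: report each endpoint on its own line and
# return non-zero if any of them is unreachable.
check_endpoints() {
  local endpoint host port failed=0
  for endpoint in "$@"; do
    host="${endpoint%:*}"
    port="${endpoint##*:}"
    if can_connect "$host" "$port"; then
      echo "ok   $endpoint"
    else
      echo "FAIL $endpoint"
      failed=1
    fi
  done
  return "$failed"
}
```

On the candidate host this might be called as check_endpoints api.zzzzz.example.com:443 management.azure.com:443 10.0.0.5:2222, where the API hostname and the worker VM address are placeholders, management.azure.com is assumed to be the relevant cloud API endpoint for the Azure public cloud, and 2222 is the SSHPort from the example configuration.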

Prepare key pair and worker VM image

Generate an SSH key pair.

Save the public key in /root/.ssh/authorized_keys in the worker VM image.

Save the private key in the cluster configuration file (see PrivateKey in the example below).
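The key pair steps above could be scripted roughly as follows. This is a sketch under assumptions: the function names and the key file name are illustrative, the key is created without a passphrase on the assumption that the dispatcher must use it unattended, and -m PEM is chosen only so the file header matches the "BEGIN RSA PRIVATE KEY" form shown in the example configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Arvados).

# generate_dispatch_key PREFIX: write the private key to PREFIX and the
# public key to PREFIX.pub, with an empty passphrase.
generate_dispatch_key() {
  local prefix="$1"
  ssh-keygen -t rsa -b 4096 -m PEM -N "" -C "arvados-dispatch-cloud" -f "$prefix"
}

# indent_key_for_yaml FILE: print FILE indented by 8 spaces, which is the
# indentation used under "PrivateKey: |" in the example configuration.
indent_key_for_yaml() {
  sed 's/^/        /' "$1"
}
```

For instance, generate_dispatch_key ./dispatch-key creates both files; the contents of ./dispatch-key.pub go into /root/.ssh/authorized_keys in the worker image, and the output of indent_key_for_yaml ./dispatch-key can be pasted under PrivateKey in the cluster configuration.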

Update cluster configuration file

In /etc/arvados/config.yml, add configuration items for the dispatch service under your cluster's entry (uuid_prefix in the example stands for your cluster ID).

Clusters:
  uuid_prefix:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch" 
      SSHPort: "2222" 
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "image-12345678" 
      Driver: Azure
      DriverParameters:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX        # not needed after #14745
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX                    # not needed after #14745 (same value as ClientID)
        ClientSecret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX         # not needed after #14745 (same value as ClientSecret)
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX              # not needed after #14745
        CloudEnv: AzurePublicCloud
        cloud_environment: AzurePublicCloud                          # not needed after #14745
        ResourceGroup: zzzzz
        resource_group: zzzzz                                        # not needed after #14745
        Location: centralus
        region: centralus                                            # not needed after #14745 (same value as Location)
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        storage_account: example                                     # not needed after #14745
        BlobContainer: vhds
        blob_container: vhds                                         # not needed after #14745
        DeleteDanglingResourcesAfter: 20
        delete_dangling_resources_after: 20                          # not needed after #14745
    Dispatch:
      PrivateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEowIBAAKCAQEAqYm4XsQHm8sBSZFwUX5VeW1OkGsfoNzcGPG2nzzYRhNhClYZ
        0ABHhUk82HkaC/8l6d/jpYTf42HrK42nNQ0r0Yzs7qw8yZMQioK4Yk+kFyVLF78E
        GRG4pGAWXFs6pUchs/lm8fo9zcda4R3XeqgI+NO+nEERXmdRJa1FhI+Za3/S/+CV
        mg+6O00wZz2+vKmDPptGN4MCKmQOCKsMJts7wSZGyVcTtdNv7jjfr6yPAIOIL8X7
        ...
        JIBvlVfcHb1IHMA9YG7ZQjrMRmx2Xj3ce4RVPgUGHh8ra7gvLjd72/Tpf0doNClN
        ti/hAoGBAMW5D3LhU05LXWmOqpeT4VDgqk4MrTBcstVe7KdVjwzHrVHCAmI927vI
        pjpphWzpC9m3x4OsTNf8m+g6H7f3IiQS0aiFNtduXYlcuT5FHS2fSATTzg5PBon9
        1E6BudOve+WyFyBs7hFWAqWFBdWujAl4Qk5Ek09U2ilFEPE7RTgJ
        -----END RSA PRIVATE KEY-----
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token" 
    NodeProfiles:
      apiserver:                       # must match ARVADOS_NODE_PROFILE in /etc/arvados/environment (see below)
        arvados-dispatch-cloud:
          Listen: ":9005" 

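Create the host configuration file /etc/arvados/environment, setting ARVADOS_NODE_PROFILE to the profile name used under NodeProfiles in the cluster configuration.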

ARVADOS_NODE_PROFILE=apiserver
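Because the profile name has to agree between the two files, a quick consistency check can catch typos. The sketch below is a hypothetical helper, not an Arvados tool. It does a plain text match rather than parsing YAML, so it only confirms that a key with the profile's name appears somewhere in the configuration file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Arvados).

# node_profile_defined ENV_FILE CONFIG_FILE: succeed if the profile named by
# ARVADOS_NODE_PROFILE in ENV_FILE appears as an indented "name:" key in
# CONFIG_FILE. This is a text match, not a YAML parse.
node_profile_defined() {
  local env_file="$1" config_file="$2" profile
  # Take the last assignment in the file, if there are several.
  profile="$(sed -n 's/^ARVADOS_NODE_PROFILE=//p' "$env_file" | tail -n 1)"
  [ -n "$profile" ] || return 1
  # Allow trailing whitespace and an optional trailing comment after the key.
  grep -Eq "^[[:space:]]+${profile}:[[:space:]]*(#.*)?$" "$config_file"
}
```

With the files from this page, node_profile_defined /etc/arvados/environment /etc/arvados/config.yml should succeed; a non-zero exit status suggests the two names differ.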

Stop crunch-dispatch-slurm

Stop and disable the crunch-dispatch-slurm service, then uninstall the package so that it cannot start again after the next reboot or upgrade.

# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm

Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for arvados-dispatch-cloud to run.
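To see how many containers are still working their way through SLURM, the job names in the queue can be counted. This sketch assumes that crunch-dispatch-slurm names each SLURM job after its container UUID and that container UUIDs have the form xxxxx-dz642-xxxxxxxxxxxxxxx; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Arvados).

# count_container_jobs: read SLURM job names on stdin, one per line, and
# print how many look like Arvados container UUIDs (assumed naming scheme).
count_container_jobs() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the status 0.
  grep -Ec '^[a-z0-9]{5}-dz642-[a-z0-9]{15}$' || true
}
```

A possible invocation is squeue --noheader --format='%j' | count_container_jobs, which prints 0 once no matching jobs remain in the queue.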

Install arvados-dispatch-cloud

# apt-get install arvados-dispatch-cloud

Verify the service is running

$ token="example-secret-management-token" 
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
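For scripted checks, the same request can be reduced to a pass/fail result based on the HTTP status code. This is a sketch with made-up function names; it treats only a 200 response as healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Arvados).

# metrics_status URL TOKEN: print the HTTP status code returned by URL.
# curl prints 000 when no HTTP response was received at all.
metrics_status() {
  local url="$1" token="$2"
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' \
    -H "Authorization: Bearer ${token}" "$url"
}

# metrics_ok URL TOKEN: succeed only if the endpoint answers with status 200.
metrics_ok() {
  [ "$(metrics_status "$1" "$2")" = "200" ]
}
```

For example, metrics_ok http://localhost:9005/metrics "$token" && echo "dispatcher is answering" uses the port and management token from the example configuration. Any other status suggests the service is not listening or the token was not accepted.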

Verify the service is functional

Watch the dispatcher's logs while you run an Arvados container:

# journalctl -ocat -fu arvados-dispatch-cloud