Ward Vandewege, 02/12/2019 04:34 PM

h1. Migrating from arvados-node-manager to arvados-dispatch-cloud

{{toc}}

h2. Choose a node

The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example it runs on the same node as the API server and controller.

h2. Prepare key pair and worker VM image

Generate an SSH key pair.

Save the public key in @/root/.ssh/authorized_keys@ in the worker VM image.

Save the private key in the cluster configuration file (see @PrivateKey@ in the example below).
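
As a rough sketch of the key pair step, the commands below use OpenSSH's @ssh-keygen@ to create an RSA key pair in PEM format, which matches the @-----BEGIN RSA PRIVATE KEY-----@ header used in the configuration example. The temporary directory, the file name @dispatcher_key@, the key comment, the 4096-bit size, and the empty passphrase are illustrative assumptions, not values prescribed by this page.

```shell
# Illustrative only: the directory and file name are assumptions.
keydir="$(mktemp -d)"
keyfile="$keydir/dispatcher_key"

# -m PEM writes the private key in the traditional "BEGIN RSA PRIVATE KEY" format.
# -N "" sets an empty passphrase (assumed here so the key can be used unattended).
ssh-keygen -q -t rsa -b 4096 -m PEM -N "" -C "arvados-dispatch-cloud" -f "$keyfile"

# Public half: this line goes in /root/.ssh/authorized_keys in the worker VM image.
cat "$keyfile.pub"

# Private half, indented by 8 spaces so it can be pasted under "PrivateKey: |".
sed 's/^/        /' "$keyfile"
```

Once the private key has been copied into the configuration file, the temporary copy in @$keydir@ can be deleted.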

h2. Update cluster configuration file

In @/etc/arvados/config.yml@, add configuration items for the dispatch service.

<pre><code class="yaml">
Clusters:
  uuid_prefix:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch"
      SSHPort: "2222"
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "image-12345678"
      Driver: azure
      DriverParameters:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX        # not needed after #14745
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX                    # not needed after #14745 (same value as ClientID)
        ClientSecret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX         # not needed after #14745 (same value as ClientSecret)
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX              # not needed after #14745
        CloudEnv: AzurePublicCloud
        cloud_environment: AzurePublicCloud                          # not needed after #14745
        ResourceGroup: zzzzz
        resource_group: zzzzz                                        # not needed after #14745
        Location: centralus
        region: centralus                                            # not needed after #14745 (same value as Location)
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        storage_account: example                                     # not needed after #14745
        BlobContainer: vhds
        blob_container: vhds                                         # not needed after #14745
        DeleteDanglingResourcesAfter: 20
        delete_dangling_resources_after: 20                          # not needed after #14745
    Dispatch:
      PrivateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEowIBAAKCAQEAqYm4XsQHm8sBSZFwUX5VeW1OkGsfoNzcGPG2nzzYRhNhClYZ
        0ABHhUk82HkaC/8l6d/jpYTf42HrK42nNQ0r0Yzs7qw8yZMQioK4Yk+kFyVLF78E
        GRG4pGAWXFs6pUchs/lm8fo9zcda4R3XeqgI+NO+nEERXmdRJa1FhI+Za3/S/+CV
        mg+6O00wZz2+vKmDPptGN4MCKmQOCKsMJts7wSZGyVcTtdNv7jjfr6yPAIOIL8X7
        ...
        JIBvlVfcHb1IHMA9YG7ZQjrMRmx2Xj3ce4RVPgUGHh8ra7gvLjd72/Tpf0doNClN
        ti/hAoGBAMW5D3LhU05LXWmOqpeT4VDgqk4MrTBcstVe7KdVjwzHrVHCAmI927vI
        pjpphWzpC9m3x4OsTNf8m+g6H7f3IiQS0aiFNtduXYlcuT5FHS2fSATTzg5PBon9
        1E6BudOve+WyFyBs7hFWAqWFBdWujAl4Qk5Ek09U2ilFEPE7RTgJ
        -----END RSA PRIVATE KEY-----
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token"
    NodeProfiles:
      apiserver:                       # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005"
</code></pre>

Create the host configuration file @/etc/arvados/environment@.

<pre>
ARVADOS_NODE_PROFILE=apiserver
</pre>

h2. Stop crunch-dispatch-slurm

Stop and disable the crunch-dispatch-slurm service, and uninstall the package to make sure it doesn't start after the next reboot/upgrade.

<pre>
# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm
</pre>

Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for arvados-dispatch-cloud to run.
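
One possible way to see whether previously submitted containers are still working through SLURM is to count the matching jobs in @squeue@ output. This is only a sketch: it assumes that crunch-dispatch-slurm named each SLURM job after its container UUID (recognizable by the @-dz642-@ infix), and the helper name @slurm_containers_remaining@ is hypothetical.

```shell
# Hypothetical helper: count queued/running SLURM jobs whose name looks like
# an Arvados container UUID (assumes job names are container UUIDs).
slurm_containers_remaining() {
    # grep -c prints the count but exits non-zero when the count is 0,
    # so "|| true" keeps the helper from reporting a failure in that case.
    squeue --noheader --format='%j' | grep -c -- '-dz642-' || true
}

remaining="$(slurm_containers_remaining)"
if [ "$remaining" -eq 0 ]; then
    echo "SLURM queue is drained"
else
    echo "$remaining container(s) still in the SLURM queue"
fi
```

Rerunning the check periodically shows the count falling as the remaining containers finish.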

h2. Install arvados-dispatch-cloud

<pre>
# apt-get install arvados-dispatch-cloud
</pre>

h2. Verify the service is running

<pre>
$ token="example-secret-management-token"
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
</pre>
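
For scripted checks, the same request can be reduced to a pass/fail result by looking only at the HTTP status code. The sketch below assumes that a 200 response means the service is listening and accepted the management token. The helper name @check_dispatcher_metrics@ is hypothetical, and the default URL is taken from the @Listen: ":9005"@ setting in the example configuration.

```shell
# Hypothetical helper: succeed only if the metrics endpoint answers with HTTP 200.
# Arguments: management token, optional base URL (defaults to the example Listen address).
check_dispatcher_metrics() {
    mgmt_token="$1"
    base_url="${2:-http://localhost:9005}"
    # --write-out '%{http_code}' prints only the status code; the body is discarded.
    status="$(curl --silent --output /dev/null --write-out '%{http_code}' \
        --header "Authorization: Bearer $mgmt_token" "$base_url/metrics")"
    [ "$status" = "200" ]
}

if check_dispatcher_metrics "example-secret-management-token"; then
    echo "arvados-dispatch-cloud is responding"
else
    echo "no valid response from arvados-dispatch-cloud" >&2
fi
```

Any other status, or a connection failure, makes the helper return non-zero, so it can be used directly in monitoring scripts.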

h2. Verify the service is functional

Watch the dispatcher's logs while you run an Arvados container:

<pre>
# journalctl -ocat -fu arvados-dispatch-cloud
</pre>