Migrating from arvados-node-manager to arvados-dispatch-cloud » History » Version 8

Tom Clegg, 02/14/2019 01:24 AM

h1. Migrating from arvados-node-manager to arvados-dispatch-cloud

{{toc}}

h2. Choose a node

The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example it runs on the same node as the API server and controller.
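
Before installing anything, it can help to confirm that the chosen host can reach all three endpoints. The following is a minimal sketch, not part of the Arvados tooling: @can_connect@ and @check_all@ are hypothetical helper names, the host names in the commented usage are placeholders, and @management.azure.com@ is assumed to be the relevant cloud API endpoint for the Azure public cloud. It only tests that a TCP connection opens; it says nothing about credentials.

```shell
#!/usr/bin/env bash

# can_connect HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.
can_connect() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# check_all: read "label host port" triples on stdin and report each one.
# Returns non-zero if any endpoint could not be reached.
check_all() {
  local label host port failed=0
  while read -r label host port; do
    if can_connect "$host" "$port"; then
      echo "ok      $label ($host:$port)"
    else
      echo "FAILED  $label ($host:$port)"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage (placeholder hosts; 2222 matches SSHPort in the example config):
#   check_all <<'EOF'
#   api     zzzzz.example.com     443
#   cloud   management.azure.com  443
#   worker  10.0.0.5              2222
#   EOF
```

A worker VM must already be running for the last check to be meaningful, so that line is only useful once a test instance exists.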

h2. Prepare key pair and worker VM image

Generate an SSH key pair.

Save the public key in @/root/.ssh/authorized_keys@ in the worker VM image.

Save the private key in the cluster configuration file (see @PrivateKey@ in the example below).
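
One way to carry out these steps with OpenSSH is sketched here. It assumes the dispatcher expects an unencrypted key, since the example configuration has no passphrase setting. The scratch directory, file name, and key comment are arbitrary choices, not Arvados requirements.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch location for the new key pair.
keydir="$(mktemp -d)"
keyfile="$keydir/dispatch_key"

# -m PEM requests the traditional PEM format ("BEGIN RSA PRIVATE KEY"),
# -N '' sets an empty passphrase (assumption: the dispatcher cannot prompt for one).
ssh-keygen -q -t rsa -b 2048 -m PEM -N '' -C 'arvados-dispatch-cloud' -f "$keyfile"

# Public half: this line goes into /root/.ssh/authorized_keys in the worker VM image.
cat "$keyfile.pub"

# Private half, indented 8 spaces so it can be pasted under "PrivateKey: |"
# at the nesting depth used in the example configuration.
sed 's/^/        /' "$keyfile"
```

The indentation step matters because YAML block scalars end at the first line that is indented less than the block, so an unindented key would truncate the value.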

h2. Update cluster configuration file

In @/etc/arvados/config.yml@, add configuration items for the dispatch service.

<pre><code class="yaml">
Clusters:
  zzzzz:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch"
      SSHPort: "2222"
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "https://zzzzzzzz.blob.core.windows.net/system/Microsoft.Compute/Images/images/zzzzz-compute-osDisk.55555555-5555-5555-5555-555555555555.vhd"
      Driver: azure
      DriverParameters:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX        # not needed after #14745
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX                    # not needed after #14745 (same value as ClientID)
        ClientSecret: 2WyXt0XFbEtutnf2hp528t6Wk9S5bOHWkRaaWwavKQo=
        secret: 2WyXt0XFbEtutnf2hp528t6Wk9S5bOHWkRaaWwavKQo=         # not needed after #14745 (same value as ClientSecret)
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX              # not needed after #14745
        CloudEnv: AzurePublicCloud
        cloud_environment: AzurePublicCloud                          # not needed after #14745
        ResourceGroup: zzzzz
        resource_group: zzzzz                                        # not needed after #14745
        Location: centralus
        region: centralus                                            # not needed after #14745 (same value as Location)
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        storage_account: example                                     # not needed after #14745
        BlobContainer: vhds
        blob_container: vhds                                         # not needed after #14745
        DeleteDanglingResourcesAfter: 20
        delete_dangling_resources_after: 20                          # not needed after #14745
        AdminUsername: arvados
    Dispatch:
      PrivateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEowIBAAKCAQEAqYm4XsQHm8sBSZFwUX5VeW1OkGsfoNzcGPG2nzzYRhNhClYZ
        0ABHhUk82HkaC/8l6d/jpYTf42HrK42nNQ0r0Yzs7qw8yZMQioK4Yk+kFyVLF78E
        GRG4pGAWXFs6pUchs/lm8fo9zcda4R3XeqgI+NO+nEERXmdRJa1FhI+Za3/S/+CV
        mg+6O00wZz2+vKmDPptGN4MCKmQOCKsMJts7wSZGyVcTtdNv7jjfr6yPAIOIL8X7
        ...
        JIBvlVfcHb1IHMA9YG7ZQjrMRmx2Xj3ce4RVPgUGHh8ra7gvLjd72/Tpf0doNClN
        ti/hAoGBAMW5D3LhU05LXWmOqpeT4VDgqk4MrTBcstVe7KdVjwzHrVHCAmI927vI
        pjpphWzpC9m3x4OsTNf8m+g6H7f3IiQS0aiFNtduXYlcuT5FHS2fSATTzg5PBon9
        1E6BudOve+WyFyBs7hFWAqWFBdWujAl4Qk5Ek09U2ilFEPE7RTgJ
        -----END RSA PRIVATE KEY-----
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token"
    NodeProfiles:
      dispatcher:                       # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005"
</code></pre>

Create the host configuration file @/etc/arvados/environment@.

<pre>
ARVADOS_NODE_PROFILE=dispatcher
</pre>

h2. Stop crunch-dispatch-slurm

Stop and disable the crunch-dispatch-slurm service, and uninstall the package to make sure it doesn't start after the next reboot/upgrade.

<pre>
# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm
</pre>

Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for arvados-dispatch-cloud to run.
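
To see when the containers already handed to SLURM have finished, the queue can be polled until it is empty. This is a sketch under assumptions: @count_jobs@ and @wait_until_empty@ are hypothetical helpers, @DRAIN_POLL_SECONDS@ is a made-up variable name, and the commented usage assumes every job listed by @squeue@ belongs to Arvados. On a shared SLURM cluster the @squeue@ call would need a suitable filter.

```shell
#!/usr/bin/env bash

# count_jobs: print the number of non-empty lines on stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_jobs() {
  grep -c . || true
}

# wait_until_empty CMD [ARGS...]: rerun CMD until it prints no lines.
# Sleeps DRAIN_POLL_SECONDS (default 30) between attempts and gives up
# if CMD itself fails, so a missing squeue is not mistaken for an empty queue.
wait_until_empty() {
  local out n
  while :; do
    out=$("$@") || { echo "command failed: $*" >&2; return 1; }
    n=$(printf '%s\n' "$out" | count_jobs)
    [ "$n" -eq 0 ] && break
    echo "$n job(s) still queued or running"
    sleep "${DRAIN_POLL_SECONDS:-30}"
  done
  echo "queue is empty"
}

# Example usage (squeue --noheader prints one line per job):
#   wait_until_empty squeue --noheader
```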

h2. Install arvados-dispatch-cloud

<pre>
# apt-get install arvados-dispatch-cloud
</pre>

h2. Verify the service is running

<pre>
$ token="example-secret-management-token"
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
</pre>
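
The @curl@ command above prints the metrics on success, but a failure can be harder to interpret. The sketch below separates "not listening" from "token rejected" by looking only at the HTTP status code. @fetch_status@ and @describe_status@ are hypothetical helpers, and the meaning attached to 401 and 403 is an assumption about how the service reports a bad token, not documented behaviour.

```shell
#!/usr/bin/env bash

# fetch_status URL TOKEN: print only the HTTP status code of a GET request.
# curl prints 000 when no HTTP response was received at all.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $2" "$1"
}

# describe_status CODE: turn a status code into a short diagnosis.
describe_status() {
  case "$1" in
    200)     echo "service is up and accepted the token" ;;
    401|403) echo "service is up but rejected the token" ;;  # assumed meaning
    000)     echo "no HTTP response: service not running or not listening here" ;;
    *)       echo "unexpected HTTP status $1" ;;
  esac
}

# Example usage, with the token and port from the example configuration:
#   token="example-secret-management-token"
#   describe_status "$(fetch_status http://localhost:9005/metrics "$token")"
```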

h2. Verify the service is functional

Watch the dispatcher's logs while you run an Arvados container:

<pre>
# journalctl -ocat -fu arvados-dispatch-cloud
</pre>