Story #16552

closed

"arvados-server init" can get TLS certificates from Let's Encrypt

Added by Peter Amstutz over 2 years ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Target version:
Start date: 07/14/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Assuming the host meets the eligibility requirements:
  • publicly routable DNS
  • ports 80 and 443 are reachable for LE validation

Provide a "get TLS certificates from Let's Encrypt" option on the arvados-server init command line or wizard.

Generate keys, obtain certificates, and arrange to renew them automatically as needed.
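
The "renew them automatically as needed" requirement comes down to a periodic check of each certificate's expiry time. The following is a minimal Go sketch of that decision only, not the implementation used by arvados-server; the names needsRenewal, date, and assumedRenewBefore, and the 30-day window, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// assumedRenewBefore is a hypothetical renewal window chosen for this
// sketch; it is not taken from the Arvados source.
const assumedRenewBefore = 30 * 24 * time.Hour

// date is a small helper that builds a UTC midnight timestamp.
func date(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

// needsRenewal reports whether a certificate that expires at notAfter
// should be renewed at time now, i.e. whether now has reached the start
// of the renewal window (notAfter minus renewBefore).
func needsRenewal(notAfter, now time.Time, renewBefore time.Duration) bool {
	return !now.Before(notAfter.Add(-renewBefore))
}

func main() {
	notAfter := date(2022, 10, 13) // example expiry date
	for _, now := range []time.Time{date(2022, 7, 15), date(2022, 9, 20)} {
		fmt.Println(now.Format("2006-01-02"), needsRenewal(notAfter, now, assumedRenewBefore))
	}
	// prints:
	// 2022-07-15 false
	// 2022-09-20 true
}
```

A long-running service would typically evaluate a check like this on a timer and request a new certificate once it returns true.
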


Subtasks 1 (0 open, 1 closed)

Task #16607: Review 16552-autocert | Resolved | Lucas Di Pentima | 07/14/2022

Related issues

Related to Arvados Epics - Story #18337: Easy install via OS package | In Progress | 12/01/2022 | 03/31/2023
Related to Arvados Epics - Story #15941: arvados-boot | In Progress | 07/01/2022 | 03/31/2023

Actions #1

Updated by Peter Amstutz over 2 years ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2020-07-15 to 2020-08-12 Sprint
Actions #3

Updated by Tom Clegg over 2 years ago

  • Assigned To set to Tom Clegg
Actions #4

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2020-08-12 Sprint to 2020-08-26 Sprint
Actions #5

Updated by Tom Clegg over 2 years ago

  • Target version changed from 2020-08-26 Sprint to 2020-09-09 Sprint
Actions #6

Updated by Tom Clegg over 2 years ago

  • Target version changed from 2020-09-09 Sprint to 2020-09-23 Sprint
Actions #7

Updated by Tom Clegg over 2 years ago

  • Target version changed from 2020-09-23 Sprint to 2020-10-07 Sprint
Actions #8

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2020-10-07 Sprint to 2020-10-21 Sprint
Actions #9

Updated by Peter Amstutz over 2 years ago

Actions #10

Updated by Peter Amstutz over 2 years ago

  • Target version deleted (2020-10-21 Sprint)
Actions #12

Updated by Tom Clegg over 1 year ago

  • Subject changed from arvados-server config wizard for single node install to "arvados-server init" can get TLS certificates from Let's Encrypt
Actions #13

Updated by Tom Clegg over 1 year ago

  • Category set to Deployment
  • Description updated (diff)
Actions #14

Updated by Tom Clegg over 1 year ago

Actions #15

Updated by Tom Clegg over 1 year ago

  • Related to Story #18337: Easy install via OS package added
Actions #16

Updated by Tom Clegg 7 months ago

Actions #17

Updated by Tom Clegg 7 months ago

  • Status changed from New to In Progress
Actions #21

Updated by Tom Clegg 7 months ago

16552-autocert @ e12c1fed6336048d6ab854bbfab95eccf7c1b372 -- developer-run-tests: #3230 (wb1 failed)

...with a fix for the race-induced port collision that was causing lib/controller multi-cluster integration tests to be unreliable.

Actions #22

Updated by Tom Clegg 7 months ago

  • Target version set to 2022-07-20
Actions #23

Updated by Lucas Di Pentima 7 months ago

Some comments & questions:

  • I found https://github.com/letsencrypt/pebble, which might help us get the LE autocert code tested. WDYT?
  • What's the "acme" option for? Is it to provide LE certs managed by something else?
  • Do you think it would be clearer to move the config knobs TLS.Automatic & TLS.Staging under a parent key, like TLS.LE.Enable & TLS.LE.Staging?
  • Testing the arvados-easy-package in a DigitalOcean droplet:
    • PostgreSQL wasn't installed, so it crashed right away
    • After installing PostgreSQL, removing the config file created in the first attempt, and re-running "init", I got a PostgreSQL database and an empty nginx service.
    • Checking the syslogs I'm seeing:
      arvados-server[25430]: {"PID":25430,"error":"autocert http-01 challenge handler stopped: listen tcp :80: bind: address already in use","level":"error","msg":"task failed","task":"certificates","time":"2022-07-15T20:17:57.652253479Z"}
    • Stopping nginx & restarting arvados allowed me to get it to work with SSL
    • Keep services seem to not be working because the client tries to connect to 0.0.0.0:9010
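
The "address already in use" log line above is what a Go program gets back when another process (here, nginx) already holds port 80. As a rough illustration of detecting that case up front and reporting it more helpfully, here is a small sketch; listenForChallenge is a hypothetical helper written for this note, not a function from the Arvados code base.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// listenForChallenge opens the TCP listener that an http-01 challenge
// handler would need. If the address is already bound by another
// process, the returned error carries a hint about the likely cause.
func listenForChallenge(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if errors.Is(err, syscall.EADDRINUSE) {
		return nil, fmt.Errorf("%w (is another web server such as nginx already listening on %s?)", err, addr)
	}
	if err != nil {
		return nil, err
	}
	return ln, nil
}

func main() {
	// Binding port 80 normally requires root or an equivalent capability,
	// so an unprivileged run is expected to take the error branch.
	ln, err := listenForChallenge(":80")
	if err != nil {
		fmt.Println("cannot serve http-01 challenges:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```
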
Actions #24

Updated by Tom Clegg 7 months ago

Re testing/pebble: It would be good to test the autocert stuff (start pebble server, boot a test cluster, connect to controller, and check that the cert was issued by pebble?) ... but we can't even listen on port 80 with our non-root testing setup. I'm wary that a test case could end up uselessly testing some test-only code paths and autocert itself. Maybe enabling this on a future dev cluster, and using it to get real certs, is the best way to protect it from regressions?

The -tls=acme option is for when you want something like acmetool to be responsible for obtaining certificates and putting them in /var/lib/acme/. The idea is to let you get real certs, inject them with a bind mount, and test the arvados package/boot stuff in a container without worrying about making the container reachable by LE validation, hitting LE rate limits, etc.

I'm thinking "-tls=auto" could be renamed to "-tls=acme", and the old "-tls=acme" could be replaced with "-tls=/path" meaning load key and cert from "/path/privkey" and "/path/cert". WDYT? (see updated branch)
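
To make the proposed flag semantics concrete, here is a minimal Go sketch of how a "-tls" value could be interpreted under this proposal. The tlsSource type and parseTLSFlag function are hypothetical names invented for this illustration, and treating every other value as an error is an assumption; the real command may accept additional values.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// tlsSource describes where the server would get its certificate.
// This type is hypothetical and exists only for this sketch.
type tlsSource struct {
	ACME     bool   // obtain certificates from an ACME server
	CertFile string // certificate path, set when ACME is false
	KeyFile  string // private key path, set when ACME is false
}

// parseTLSFlag interprets a -tls value as proposed in this note:
// "acme" selects automatic certificates, and an absolute path selects
// the files "<path>/cert" and "<path>/privkey".
func parseTLSFlag(v string) (tlsSource, error) {
	switch {
	case v == "acme":
		return tlsSource{ACME: true}, nil
	case strings.HasPrefix(v, "/"):
		return tlsSource{
			CertFile: filepath.Join(v, "cert"),
			KeyFile:  filepath.Join(v, "privkey"),
		}, nil
	default:
		// Assumption: anything else is rejected in this sketch.
		return tlsSource{}, fmt.Errorf("unsupported -tls value %q", v)
	}
}

func main() {
	for _, v := range []string{"acme", "/var/lib/acme", "bogus"} {
		src, err := parseTLSFlag(v)
		fmt.Printf("%q => %+v, err=%v\n", v, src, err)
	}
}
```
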

Good point about the config keys, too. Since ACME is the real feature (in principle the only Let's Encrypt-specific thing is the built-in Directory URL) I thought this might work:

    TLS:
      Certificate: ""
      Key: ""
      Insecure: false
      ACME:
        # Obtain certificates automatically for ExternalURL domains
        # using an ACME server and http-01 validation.
        #
        # To use Let's Encrypt, specify "LE".  To use the Let's
        # Encrypt staging environment, specify "LE-staging".  To use a
        # different ACME server, specify the full directory URL
        # ("https://...").
        #
        # Note: this feature is not yet implemented in released
        # versions, only in the alpha/prerelease arvados-server-easy
        # package.
        #
        # Implies agreement with the server's terms of service.
        Server: ""

So a production example would be just

    TLS:
      ACME:
        Server: LE
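
For illustration, the Server value described in the config comments could be mapped to an ACME directory URL roughly as follows. This is a sketch only: acmeDirectoryURL is a hypothetical function name, and treating an empty value as "feature disabled" is an assumption based on the empty default. The two URLs are Let's Encrypt's published ACME v2 production and staging directories.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Let's Encrypt ACME v2 directory endpoints.
	leDirectoryURL        = "https://acme-v02.api.letsencrypt.org/directory"
	leStagingDirectoryURL = "https://acme-staging-v02.api.letsencrypt.org/directory"
)

// acmeDirectoryURL maps a TLS.ACME.Server config value to a directory
// URL. An empty result with a nil error means ACME is not enabled
// (an assumption made for this sketch).
func acmeDirectoryURL(server string) (string, error) {
	switch {
	case server == "":
		return "", nil
	case server == "LE":
		return leDirectoryURL, nil
	case server == "LE-staging":
		return leStagingDirectoryURL, nil
	case strings.HasPrefix(server, "https://"):
		// Any other ACME server is given as a full directory URL.
		return server, nil
	default:
		return "", fmt.Errorf("invalid ACME server %q: want \"LE\", \"LE-staging\", or an https:// directory URL", server)
	}
}

func main() {
	for _, s := range []string{"LE", "LE-staging", "https://acme.example/dir", "ftp://nope"} {
		url, err := acmeDirectoryURL(s)
		fmt.Printf("%q => %q, err=%v\n", s, url, err)
	}
}
```

Keeping the shorthand names in one place means the only Let's Encrypt-specific detail is the pair of built-in URLs.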

Some of your potholes are addressed in #17344:
  • postgres not installed => early check for this, and the doc page says to install it first
  • nginx taking port 80 => disable the default debian placeholder site if that appears to be the only nginx config

Keep clients connecting to 0.0.0.0:9010 is one I hadn't noticed, but makes sense. I suppose we need to either use the "external client" flag, or put keepstore on the external interface instead. I've updated #17344 to ask for this fix.

16552-autocert @ 2f0c775a9e1ab8c3abdd94c854326fab771c4b5e -- developer-run-tests: #3235 (wb1 failed)

Actions #25

Updated by Lucas Di Pentima 7 months ago

Tom Clegg wrote in #note-24:

The -tls=acme option is for when you want something like acmetool to be responsible for obtaining certificates and putting them in /var/lib/acme/. The idea is to let you get real certs, inject them with a bind mount, and test the arvados package/boot stuff in a container without worrying about making the container reachable by LE validation, hitting LE rate limits, etc.

I'm thinking "-tls=auto" could be renamed to "-tls=acme", and the old "-tls=acme" could be replaced with "-tls=/path" meaning load key and cert from "/path/privkey" and "/path/cert". WDYT? (see updated branch)

Looks better, thanks!

Good point about the config keys, too. Since ACME is the real feature (in principle the only Let's Encrypt-specific thing is the built-in Directory URL) I thought this might work:

16552-autocert @ 2f0c775a9e1ab8c3abdd94c854326fab771c4b5e -- developer-run-tests: #3235 (wb1 failed)

This LGTM, thanks!

Actions #26

Updated by Tom Clegg 7 months ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|3c87fb14f48b78d30142f12c8cb855dba92c926d.

Actions #27

Updated by Peter Amstutz about 2 months ago

  • Release set to 47