Idea #17667 (closed)

Install docs explains InternalURL / ExternalURL, private networks & split DNS

Added by Peter Amstutz almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Documentation
Target version:
Start date: 12/06/2021
Due date:
Story points: -
Release relationship: Auto

Description

The ExternalURL is not explained on the config reference page; it should probably be added there. The InternalURLs are explained accurately there.

Defaults for InternalURLs should be added to the config reference, using our standard ports for the services. This will also mean we have defaults for them, for the first time!

The private networks/split DNS explanation could go at the bottom of the 'Planning and prerequisites' page in the docs.


Subtasks: 1 (1 open, 0 closed)

Task #17669: Review (In Progress, Tom Clegg, 12/06/2021)

Related issues

Related to Arvados - Feature #18563: Simplify/streamline InternalURLs/ExternalURL situation (New)
Related to Arvados - Idea #16561: Add "Listen" to Services config (Resolved, Tom Clegg, 06/24/2022)
#1

Updated by Peter Amstutz almost 3 years ago

  • Assigned To set to Nico César

#2

Updated by Nico César almost 3 years ago

  • Status changed from New to In Progress

#3

Updated by Ward Vandewege almost 3 years ago

  • Description updated (diff)

#4

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-05-26 sprint to 2021-06-09 sprint

#5

Updated by Peter Amstutz almost 3 years ago

  • Assigned To changed from Nico César to Ward Vandewege

#6

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint

#7

Updated by Ward Vandewege almost 3 years ago

  • Target version changed from 2021-06-23 sprint to 2021-07-07 sprint

#8

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-07-07 sprint to 2021-07-21 sprint

#9

Updated by Ward Vandewege almost 3 years ago

From our documentation:

# In each of the service sections below, the keys under
# InternalURLs are the endpoints where the service should be
# listening, and reachable from other hosts in the cluster.

InternalURLs are used by the Arvados services to determine the IP/port they should listen on. They are also used by the other services in the cluster as the address of the relevant service.

ExternalURL is used by the service to advertise the IP address/port or hostname/port combination it listens at, for access from outside the cluster.

For example:

Services:
  Controller:
    InternalURLs:
      "http://localhost:9004": {}
    ExternalURL: "https://xxxx1.arvadosapi.com"
  RailsAPI:
    InternalURLs:
      "http://localhost:8000": {}
    ExternalURL: "-"
  Keepstore:
    InternalURLs:
      "http://keep0.xxxx1.arvadosapi.com:25107":
        Rendezvous: rendezvousvalu1
      "http://keep1.xxxx1.arvadosapi.com:25107":
        Rendezvous: rendezvousvalu2
  Workbench1:
    ExternalURL: "https://workbench.xxxx1.arvadosapi.com"
  Workbench2:
    ExternalURL: "https://workbench2.xxxx1.arvadosapi.com"
  Keepbalance:
    InternalURLs:
      "http://xxxx1.arvadosapi.com:9005/": {}
  DispatchCloud:
    ExternalURL: "-"
    InternalURLs:
      "http://xxxx1.arvadosapi.com:9006": {}
  Keepproxy:
    InternalURLs:
      "http://localhost:25107": {}
    ExternalURL: "https://keep.xxxx1.arvadosapi.com"
  Websocket:
    InternalURLs:
      "http://127.0.0.1:9003": {}
    ExternalURL: "wss://ws.xxxx1.arvadosapi.com/websocket"
  WebDAV:
    InternalURLs:
      "http://127.0.0.1:9200": {}
    ExternalURL: "https://collections.xxxx1.arvadosapi.com/"
  WebDAVDownload:
    ExternalURL: "https://download.xxxx1.arvadosapi.com/"
  WebShell:
    InternalURLs: {}
    ExternalURL: "https://webshell.xxxx1.arvadosapi.com/"

In this configuration example, arvados-controller listens on localhost port 9004, and it is externally accessible at https://xxxx1.arvadosapi.com. This is possible because there is an nginx reverse proxy that sits in front of the `arvados-controller` process and maps the external hostname to localhost:9004 on the machine that runs arvados-controller.
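To make the proxy arrangement concrete, here is a minimal Go sketch of what the nginx layer does for controller: accept requests on the public side and forward them to the loopback InternalURLs entry. It uses the standard library's httputil.NewSingleHostReverseProxy as a stand-in for nginx purely for illustration; the function name newInternalProxy is hypothetical, TLS termination is left out, and real deployments configure nginx instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newInternalProxy returns a handler that forwards every request it receives
// to the service's InternalURLs entry (e.g. "http://localhost:9004"), which is
// the role the nginx reverse proxy plays for arvados-controller above.
func newInternalProxy(internalURL string) (http.Handler, error) {
	target, err := url.Parse(internalURL)
	if err != nil {
		return nil, fmt.Errorf("parsing InternalURL %q: %w", internalURL, err)
	}
	// Reject values like "localhost:9004" that parse but have no scheme/host.
	if target.Scheme == "" || target.Host == "" {
		return nil, fmt.Errorf("InternalURL %q must include a scheme and host", internalURL)
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}
```

The external side (https://xxxx1.arvadosapi.com) never appears in this handler: it only matters to whatever terminates TLS and routes that hostname to the process, which is why the ExternalURL and the InternalURLs entry can differ.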

The xxxx1.arvadosapi.com hostname must resolve from outside the cluster, but also from inside it. A split DNS setup can be handy here: routing internal traffic to the external IP address of the API server tends to be bad from a security perspective. In a cloud environment it can also become expensive, as traffic to public IP addresses tends to cost money, even when it originates from inside the same cloud provider.

RailsAPI is configured differently: its ExternalURL is set to -, which means it is not applicable (the RailsAPI process is not directly accessible from outside the cluster). Its InternalURLs entry is http://localhost:8000, and arvados-controller uses this hostname:port combination to reach RailsAPI. Because localhost is used in this example, arvados-controller and RailsAPI must run on the same host in this configuration. RailsAPI is a Rails application, and the hostname/port it listens on is actually defined in the nginx/Passenger configuration, so in this case the InternalURLs entry is only used by other components (e.g. arvados-controller) to find RailsAPI.

The Keepstore entry has no ExternalURL defined, which is equivalent to setting it to -. This is because keepstores should not be directly accessible from outside the cluster. The InternalURLs entries are keep0.xxxx1.arvadosapi.com:25107 and keep1.xxxx1.arvadosapi.com:25107, which implies that those two hostnames must resolve to the correct hosts inside the Arvados cluster. The keepstore process on each of those hosts will resolve its hostname to an IP address and listen on that address at port 25107. At the same time, any service inside the Arvados cluster that needs to talk to the keepstores will resolve the keep0.xxxx1.arvadosapi.com (or keep1) hostname to an IP address that must be routable to the host that runs that keepstore process.
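The two rules above (an ExternalURL of - or no ExternalURL at all means "not reachable from outside", and a keepstore listens on the host:port part of its InternalURLs entry) can be sketched in Go as follows. The helper names are hypothetical and are not taken from the Arvados source; filling in the scheme's default port when the URL has none is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// hasExternalURL reports whether an ExternalURL value means the service is
// meant to be reached from outside the cluster. "-" and "" both mean "no".
func hasExternalURL(externalURL string) bool {
	return externalURL != "" && externalURL != "-"
}

// listenHostPort extracts the host:port a service would bind to from one of
// its InternalURLs entries, using the scheme's default port if none is given.
func listenHostPort(internalURL string) (string, error) {
	u, err := url.Parse(internalURL)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in %q", internalURL)
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "http":
			port = "80"
		case "https":
			port = "443"
		default:
			return "", fmt.Errorf("no port and unknown scheme in %q", internalURL)
		}
	}
	// The hostname is left unresolved here; net.Listen resolves it at bind time.
	return net.JoinHostPort(u.Hostname(), port), nil
}
```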

The Workbench1 and Workbench2 entries have no InternalURL defined, because they don't need one. Workbench1 is a Rails application; the IP/port it listens on is configured in the nginx/Passenger configuration. Workbench2 is a set of flat files served by nginx.
The values for ExternalURL must resolve to the correct host from outside the cluster so that clients can access these services. Typically these hostnames will point to an nginx reverse proxy that sits in front of Workbench1, and that can serve the files for Workbench2.

The Keepbalance entry has no ExternalURL because this service should not be accessible from outside the cluster. The InternalURL is set to http://xxxx1.arvadosapi.com:9005/, and is only used to export Prometheus metrics. The keep-balance program will resolve xxxx1.arvadosapi.com to an IP address and listen on that IP address at port 9005. The process that polls those metrics must be able to resolve xxxx1.arvadosapi.com to an IP address local to the Arvados cluster, and contact that IP address at port 9005.

The DispatchCloud entry follows the same logic as Keepbalance.

The Keepproxy entry lists http://localhost:25107 as its InternalURLs entry. The keepproxy process listens on that hostname/port. An nginx reverse proxy sits in front of it and maps the https://keep.xxxx1.arvadosapi.com hostname to localhost:25107. This means that, from outside the cluster, keep.xxxx1.arvadosapi.com must resolve to the correct IP address for this nginx reverse proxy.

The Websocket entry is similar to the Keepproxy entry. Note that the ExternalURL is a WebSocket URL. The ws.xxxx1.arvadosapi.com hostname must resolve from outside the cluster. To make the websocket service accessible from inside (handy from shell nodes, for example), the ws.xxxx1.arvadosapi.com hostname must also resolve correctly from inside. See the note about split DNS above.

The WebDAV entry is similar to the Keepproxy entry. It is used by the keep-web program to provide a WebDAV endpoint.

The WebDAVDownload entry only has an ExternalURL defined. This is an additional endpoint provided by keep-web for file download. The download.xxxx1.arvadosapi.com hostname must resolve from outside the cluster.

The WebShell entry defines an empty map ({}) for InternalURLs; this is the same as omitting the InternalURLs key. It is not needed because the webshell functionality is provided by third-party tools, which are configured separately from Arvados, in the nginx and shellinabox configuration files. The ExternalURL field is set to https://webshell.xxxx1.arvadosapi.com/, which must resolve from outside the cluster.

#10

Updated by Tom Clegg almost 3 years ago

ExternalURL
  • Instructs clients/applications how to connect to Arvados services
  • when used outside the cluster, must resolve to [a load balancer or proxy which calls through to] the host where the service is running
  • when used inside the cluster, should resolve to the internal IP address of the host where the service is running
    • otherwise traffic is slower / more expensive
    • this only matters for controller and websocket (and webdav, if any applications are using it inside the cluster)
    • shouldn't matter for keepproxy because internal clients should always use keepstore directly
  • not needed for certain services (railsapi, keep-balance, arvados-dispatch-cloud) because clients/applications do not connect to them
  • not needed for keepstore because "internal" clients are told to connect to keepstore's internalURLs, and "external" clients are told to connect to keepproxy
InternalURLs
  • Instructs server components how to connect to other server components within the cluster
  • when used inside the cluster, must resolve to the host where the service is running
  • does not need to be resolvable/routable from outside the cluster
  • completely ignored in some cases because we don't (yet) use the config file to control routing -- instead we require the operator to type the same info into an nginx.conf file
  • can be "localhost" (or 0.0.0.0) in some cases
    • railsapi, because it only accepts requests from controller, which runs on the same host
    • workbench1 + 2, because they only accept requests forwarded by the nginx proxy, which runs on the same host
  • keepstore is unusual in that clients/applications that are identified as "internal" are told to connect to the InternalURLs
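The keepstore special case in the last bullet, together with the keepproxy bullets above it, amounts to a simple routing decision for Keep clients. The sketch below assumes the client has already been classified as internal or external, and reads the addresses from a hypothetical struct for illustration; how a real client learns these addresses and its classification is not covered here.

```go
package main

// keepServices holds the parts of the Services config a Keep client cares about.
type keepServices struct {
	KeepstoreInternalURLs []string // keys of Services.Keepstore.InternalURLs
	KeepproxyExternalURL  string   // Services.Keepproxy.ExternalURL
}

// keepEndpoints returns the URLs a Keep client should connect to. Internal
// clients talk to every keepstore directly; external clients only see keepproxy.
func keepEndpoints(cfg keepServices, internalClient bool) []string {
	if internalClient {
		out := make([]string, len(cfg.KeepstoreInternalURLs))
		copy(out, cfg.KeepstoreInternalURLs)
		return out
	}
	return []string{cfg.KeepproxyExternalURL}
}
```

This is also why keepstore needs no ExternalURL and keepproxy's InternalURLs only matter to the proxy in front of it.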
Currently, several services use InternalURLs to figure out the desired listening address at startup, which has a couple of problems that (I think) are fixable like so:
  • If ListenAddress is set, listen on that address
    • If keepstore, get the ARVADOS_SERVICE_INTERNAL_URL env var and choose the corresponding entry from InternalURLs to determine which keepstore volumes to use (if not set, and there's more than one keepstore in the config, then fail)
  • Otherwise, if one of the InternalURLs has a host:port part that is usable as a listening address, listen on that host:port.
    • If ARVADOS_SERVICE_INTERNAL_URL env var is set, choose the corresponding InternalURLs entry. (This makes it possible to run two keepstore services on a single node, e.g., for testing/demo.)
  • Otherwise, don't start the service.
tbd: Still seems like there's a better way to handle keepstore in the kubernetes scenario where the hostname isn't known when creating the config file.
  • perhaps: If ListenAddress is set, and ARVADOS_SERVICE_INTERNAL_URL is not set, and there's more than one keepstore in the config but AccessViaHosts is empty for all volumes, then it doesn't matter which InternalURLs entry we think is "ours" -- so listen on ListenAddress and use all volumes.
tbd:
  • Would it be an error to use ListenAddress without ARVADOS_SERVICE_INTERNAL_URL if InternalURLs has both http and https entries? With this config we wouldn't know whether to listen for plain http or https:
    Keepstore:
      ListenAddress: "0.0.0.0:1234" 
      InternalURLs:
        "http://host1:1234/": {}
        "https://host2:1234/": {}
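The ambiguity in this example can be detected mechanically: if ListenAddress is set, every InternalURLs entry has to agree on the scheme, otherwise the service cannot tell whether to serve plain HTTP or HTTPS. A minimal Go check, with a hypothetical function name:

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// listenScheme returns the single scheme ("http" or "https") shared by all
// InternalURLs entries, or an error if they disagree or the list is empty.
func listenScheme(internalURLs []string) (string, error) {
	schemes := map[string]bool{}
	for _, raw := range internalURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return "", err
		}
		schemes[u.Scheme] = true
	}
	if len(schemes) != 1 {
		found := make([]string, 0, len(schemes))
		for s := range schemes {
			found = append(found, s)
		}
		sort.Strings(found) // deterministic error message
		return "", fmt.Errorf("need exactly one scheme in InternalURLs, found %v", found)
	}
	for s := range schemes {
		return s, nil
	}
	return "", nil // unreachable: len(schemes) == 1
}
```

With the configuration above (http://host1:1234/ and https://host2:1234/) this returns an error, which matches the concern that the service would not know what to listen for.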
    
#11

Updated by Tom Clegg almost 3 years ago

17667-listen-address @ 57f5d50c3315348e3b5be06b5f620532e850566b -- developer-run-tests: #2577
  • If ListenAddress config is set ("addr:port"), bind to that addr:port (and choose http/https scheme indicated by InternalURLs entries, which must all use the same scheme).
  • Otherwise, if ARVADOS_SERVICE_INTERNAL_URL is set in environment, bind to that addr:port.
  • Otherwise, try binding each InternalURLs addr:port until one works.
  • When a service needs to know its own InternalURL (e.g., interpreting AccessViaHosts):
    • If ARVADOS_SERVICE_INTERNAL_URL is set, use that
    • Otherwise, if ListenAddress is set, use that (even if it doesn't match any entry in InternalURLs) -- note this means if you want to use both AccessViaHosts and ListenAddress, you also need to set ARVADOS_SERVICE_INTERNAL_URL.
    • Otherwise, use the InternalURLs entry that worked

This only changes the lib/service module, which is used by controller, dispatchcloud, health, keep-balance, keepstore, and ws. We also need to update arv-git-http, keepproxy, and keep-web to use lib/service so they work the same way.

#12

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-07-21 sprint to 2021-08-04 sprint

#13

Updated by Ward Vandewege almost 3 years ago

Tom Clegg wrote:

17667-listen-address @ 57f5d50c3315348e3b5be06b5f620532e850566b -- developer-run-tests: #2577
  • If ListenAddress config is set ("addr:port"), bind to that addr:port

Great.

(and choose http/https scheme indicated by InternalURLs entries, which must all use the same scheme).

This feels complicated/brittle. Can we be more explicit? If we need to define a scheme, why not put it in ListenAddress? I can easily see a scenario where InternalURL would have a different scheme than what we want the service itself to do (e.g. SSL termination at something that sits in front of the service).

  • Otherwise, if ARVADOS_SERVICE_INTERNAL_URL is set in environment, bind to that addr:port.

This is for backwards compatibility only, right?

  • Otherwise, try binding each InternalURLs addr:port until one works.

This is for backwards compatibility only, right? And only necessary for Keepstore (or any future service where we have multiple instances with unique configuration)?

  • When a service needs to know its own InternalURL (e.g., interpreting AccessViaHosts):

So this is for Keepstores only, then?

  • If ARVADOS_SERVICE_INTERNAL_URL is set, use that
  • Otherwise, if ListenAddress is set, use that (even if it doesn't match any entry in InternalURLs) -- note this means if you want to use both AccessViaHosts and ListenAddress, you also need to set ARVADOS_SERVICE_INTERNAL_URL.
  • Otherwise, use the InternalURLs entry that worked

This feels very complicated. How much of this is because we don't have another way for a keepstore to discover which part of the config to use? Could we solve that problem differently and greatly simplify this logic?

This only changes the lib/service module, which is used by controller, dispatchcloud, health, keep-balance, keepstore, and ws. We also need to update arv-git-http, keepproxy, and keep-web to use lib/service so they work the same way.

Let's talk a bit more about this. Is there a way to simplify things? I'm particularly confused about the utility of InternalURL (in the non-Keepstore case) after we introduce ListenAddress. Can we explicitly list the ways in which that variable would be used for each of our services?

It seems to me that InternalURL would be valuable if split DNS is not an option, but as we have learned with the whole nginx geo thing, it may not be so easy or practical for a service to determine if it needs to use the internal address vs the external address, in which case, ... does InternalURL serve a purpose anymore?

#14

Updated by Tom Clegg over 2 years ago

Ward Vandewege wrote:

Tom Clegg wrote:

(and choose http/https scheme indicated by InternalURLs entries, which must all use the same scheme).

This feels complicated/brittle. Can we be more explicit? If we need to define a scheme, why not put it in ListenAddress? I can easily see a scenario where InternalURL would have a different scheme than what we want the service itself to do (e.g. SSL termination at something that sits in front of the service).

We could call it ListenURL: "http://localhost:1234" ...

  • Otherwise, if ARVADOS_SERVICE_INTERNAL_URL is set in environment, bind to that addr:port.

This is for backwards compatibility only, right?

It makes it possible to run multiple keepstore processes on the same host that access different volumes, which we do in tests.

  • Otherwise, try binding each InternalURLs addr:port until one works.

This is for backwards compatibility only, right? And only necessary for Keepstore (or any future service where we have multiple instances with unique configuration)?

It makes it possible to run keepstore out of the box (at least in setups where vhostnames resolve to interface addrs) without having to arrange custom env vars like "ARVADOS_SERVICE_INTERNAL_URL=http://this-vhost-name:1234" in each host's keepstore startup script.

It also makes it possible for each host to have all the programs installed but only run the ones that are actually used, like arvados-server-easy / arvados-server boot try to do. (If ListenAddress is set for a service, we need to run that service on all hosts because we don't know whether InternalURLs actually get routed to us.)

  • When a service needs to know its own InternalURL (e.g., interpreting AccessViaHosts):

So this is for Keepstores only, then?

Yes.

  • If ARVADOS_SERVICE_INTERNAL_URL is set, use that
  • Otherwise, if ListenAddress is set, use that (even if it doesn't match any entry in InternalURLs) -- note this means if you want to use both AccessViaHosts and ListenAddress, you also need to set ARVADOS_SERVICE_INTERNAL_URL.
  • Otherwise, use the InternalURLs entry that worked

This feels very complicated. How much of this is because we don't have another way for a keepstore to discover which part of the config to use? Could we solve that problem differently and greatly simplify this logic?

Yes, this is 100% about keepstore processes figuring out which AccessViaHosts entries they're supposed to use. (We might need something similar when we want to support running multiple dispatch processes, since they will also need to know which one is "me".)

Another possibility is for each service to try connecting to all the InternalURLs at startup and see which one loops back to itself -- but that could be a bit of a nightmare if there's an additional proxy involved that isn't necessarily configured/running yet while keepstore is starting up.

I guess it's complicated because we're trying to accommodate all possible setups (like multiple keepstores on same host) but also avoid extra configuration tasks in easy cases (one keepstore per host, hostnames resolve to network interface addresses).

Perhaps we should draft the admin-facing docs and see if it still seems complicated. Something like
  • If you have DNS (or /etc/hosts) entries that resolve to your cluster hosts' network interface addresses (e.g., keep0 has IP addr 10.10.10.10 and keep0.zzzzz.example.com resolves to 10.10.10.10 on all cluster hosts), and you aren't trying to insert any gateways/proxies between Arvados components, just put an entry like http://keep0.zzzzz.example.com:25107/ in InternalURLs for each host where you run the service, and you're done.
  • Otherwise, customize your setup:
    • InternalURLs lists all the URLs internal Arvados components can use to connect to instances of the service
    • ListenAddress (or ListenURL) specifies the addr:port the service process should listen on -- typically 0.0.0.0:port, or, if you're inserting your own gateways/proxies in front of the internal Arvados services, perhaps localhost:port or 0.0.0.0:differentport.
    • If you use ListenAddress for keepstore, you must also set the ARVADOS_SERVICE_INTERNAL_URL env var to match one of the InternalURLs entries, so keepstore knows which AccessViaHosts entries to use.
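The last rule is easy to check at startup. A sketch of such a check, with a hypothetical function name; comparing entries as plain strings after trimming a trailing slash is an assumption about how strictly entries should match.

```go
package main

import (
	"fmt"
	"strings"
)

// checkKeepstoreSelf verifies that, when ListenAddress is used, the
// ARVADOS_SERVICE_INTERNAL_URL value names one of the InternalURLs entries,
// so keepstore can tell which AccessViaHosts entries apply to it.
func checkKeepstoreSelf(listenAddress, envURL string, internalURLs []string) error {
	if listenAddress == "" {
		return nil // binding to an InternalURLs entry already identifies "self"
	}
	if envURL == "" {
		return fmt.Errorf("ListenAddress is set, so ARVADOS_SERVICE_INTERNAL_URL must be set too")
	}
	want := strings.TrimSuffix(envURL, "/")
	for _, u := range internalURLs {
		if strings.TrimSuffix(u, "/") == want {
			return nil
		}
	}
	return fmt.Errorf("ARVADOS_SERVICE_INTERNAL_URL %q is not in InternalURLs", envURL)
}
```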

This only changes the lib/service module, which is used by controller, dispatchcloud, health, keep-balance, keepstore, and ws. We also need to update arv-git-http, keepproxy, and keep-web to use lib/service so they work the same way.

Let's talk a bit more about this. Is there a way to simplify things? I'm particularly confused about the utility of InternalURL (in the non-Keepstore case) after we introduce ListenAddress. Can we explicitly list the ways in which that variable would be used for each of our services?

InternalURLs tells Arvados server components how to connect to other Arvados server components:
  • when keepproxy connects to keepstore, it connects to the addresses given in Services.Keepstore.InternalURLs.
  • when controller connects to railsAPI, it connects to the address given in Services.RailsAPI.InternalURLs.
  • when Nginx connects to keep-web, it connects to the address given in Services.WebDAV.InternalURLs (only in arvados-server boot, though -- the install guide doesn't have unified config for Nginx yet, it still instructs the operator to type the same information into an Nginx config file).

It seems to me that InternalURL would be valuable if split DNS is not an option, but as we have learned with the whole nginx geo thing, it may not be so easy or practical for a service to determine if it needs to use the internal address vs the external address, in which case, ... does InternalURL serve a purpose anymore?

There are two different contexts with two different (and related) meanings for external/internal, which is confusing.
  • InternalURLs / ExternalURL -- "internal" refers to internal communication between Arvados services, "external" refers to how clients connect to services
  • Internal client / external client -- "internal" refers to clients that are able to bypass keepproxy and connect directly to keepstore InternalURLs, which we special-case to avoid having to scale keepproxy.
#15

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-08-04 sprint to 2021-08-18 sprint

#16

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-08-18 sprint to 2021-09-01 sprint

#17

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint

#18

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-09-15 sprint to 2021-09-29 sprint

#19

Updated by Peter Amstutz over 2 years ago

  • Release set to 42

#20

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-09-29 sprint to 2021-10-13 sprint

#21

Updated by Peter Amstutz over 2 years ago

  • Release deleted (42)

#22

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-13 sprint to 2021-10-27 sprint

#23

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint

#24

Updated by Tom Clegg over 2 years ago

client | destination | target URL
Controller | RailsAPI | RailsAPI.InternalURLs
Nginx | Controller | Controller.InternalURLs
Nginx | Keepproxy | Keepproxy.InternalURLs
Nginx | Keep-web | WebDAV.InternalURLs
Keep client | Keepstore | Keepstore.InternalURLs
Keep client | Keepproxy | Keepproxy.ExternalURL
API client | Controller | $ARVADOS_API_HOST
Workbench2 | Keep-web | WebDAV.ExternalURL
Workbench2 | Websocket | Websocket.ExternalURL
arv-ws | Websocket | Websocket.ExternalURL
arv-mount | Controller | $ARVADOS_API_HOST
arv-mount | Keepstore | Keepstore.InternalURLs
arv-mount | Keepproxy | Keepproxy.ExternalURL
webdav client | Keep-web | WebDAV.ExternalURL
Keep-balance | Keepstore | Keepstore.InternalURLs
Prometheus | Keep-balance | Keepbalance.InternalURLs
#25

Updated by Ward Vandewege over 2 years ago

  • Target version changed from 2021-11-10 sprint to 2021-11-24 sprint

#26

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-11-24 sprint to 2021-12-08 sprint

#28

Updated by Tom Clegg over 2 years ago

It seems backwards to me to describe InternalURLs as a listening address first. I think it would be better to say first that InternalURLs are destination addresses that clients use to connect to the service, and are also used by the service itself to figure out which address/port to listen on. (Of course this distinction will be more interesting when we decide on a good way to specify a different listening address, whether it's note-11 or something different.)

I think "The ExternalURL value is the URL that the service advertises as its own URL." could be removed -- the following sentence says the relevant thing better.

The first example (keep-balance using ClusterID.example.com:9005 as InternalURL) seems like a weird/unlikely example to lead with. Perhaps only use the more likely 127.0.0.1 and 0.0.0.0 variants, and maybe an example with a specific internal IP address or internal-looking hostname?

The table at the end is pretty hard to read. The distinction between $ARVADOS_API_HOST and ExternalURL is interesting but it might be more of a distraction in this context.

I wonder if this would be more useful if we rearrange it to highlight the differences between the services in context of writing a config file.

service | ExternalURL required? | InternalURLs must be reachable from other cluster nodes? | InternalURLs are currently ignored completely and can be omitted?
railsapi | no | no | no
controller | yes | no (all clients connect to ExternalURL; only Nginx connects to internal URLs) | no
websocket | yes | no (...only Nginx) | no
keepproxy | yes | no (...only Nginx) | no
webdav | yes | no (...only Nginx) | no
workbench1 | yes | no (...only Nginx) | yes
workbench2 | yes | no (...only Nginx) | yes
keepstore | no | yes | no
keep-balance | no | no (only your prometheus server) | no
crunch-dispatch-cloud | no | no (only your prometheus server) | no

When InternalURLs do not need to be reachable from other nodes (i.e., everywhere except keepstore), it is easiest to use loopback addresses as InternalURLs, like http://127.0.0.1:9009.
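The table can be turned into a simple config check. The sketch below encodes only the "ExternalURL required?" column and the keepstore reachability rule, using the table's lowercase labels rather than the exact config keys; the map and function names are hypothetical, and treating "-" as "not set" follows the earlier explanation of ExternalURL.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// externalURLRequired mirrors the "ExternalURL required?" column above.
var externalURLRequired = map[string]bool{
	"railsapi": false, "controller": true, "websocket": true,
	"keepproxy": true, "webdav": true, "workbench1": true,
	"workbench2": true, "keepstore": false, "keep-balance": false,
	"crunch-dispatch-cloud": false,
}

// checkService flags an ExternalURL that is missing where the table requires
// one, and a loopback InternalURL on keepstore, the only service whose
// InternalURLs must be reachable from other cluster nodes.
func checkService(name, externalURL string, internalURLs []string) []string {
	var problems []string
	if externalURLRequired[name] && (externalURL == "" || externalURL == "-") {
		problems = append(problems, fmt.Sprintf("%s: ExternalURL is required", name))
	}
	if name != "keepstore" {
		return problems // loopback InternalURLs are fine for everything else
	}
	for _, raw := range internalURLs {
		u, err := url.Parse(raw)
		if err != nil {
			problems = append(problems, fmt.Sprintf("%s: bad InternalURL %q", name, raw))
			continue
		}
		host := u.Hostname()
		ip := net.ParseIP(host)
		if host == "localhost" || (ip != nil && ip.IsLoopback()) {
			problems = append(problems, fmt.Sprintf("%s: InternalURL %q is loopback", name, raw))
		}
	}
	return problems
}
```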

IMO the main thing that makes this hard to explain/understand is that our "one config file for all components" work hasn't reached Nginx yet -- so we instruct the operator to copy the host/port from "InternalURLs" in the arvados config file into the server_name/listen lines in the Nginx/passenger config file, or vice versa.

From that perspective, the reason Workbench1 doesn't need any InternalURLs is that the only client that connects to Workbench1 (or needs to figure out which port to listen on) is Nginx itself, which ignores the entire arvados config and you therefore need to configure manually anyway. (Unless you're from the future and you're using arvados-server boot of course.)

It feels a bit weird to highlight the difference between workbench1 and controller, which is really just the fourth column in the table above ("doesn't really need to be correct"). But on the other hand it would be weird to tell people to do something that doesn't currently have any effect. Not sure which is worse.

I think it would be better to use "RailsAPI" more consistently rather than "API server", at least when it's in red-monotype.

#29

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

It seems backwards to me to describe InternalURLs as a listening address first. I think it would be better to say first that InternalURLs are destination addresses that clients use to connect to the service, and are also used by the service itself to figure out which address/port to listen on. (Of course this distinction will be more interesting when we decide on a good way to specify a different listening address, whether it's note-11 or something different.)

Sure, reworded.

I think "The ExternalURL value is the URL that the service advertises as its own URL." could be removed -- the following sentence says the relevant thing better.

Done.

The first example (keep-balance using ClusterID.example.com:9005 as InternalURL) seems like a weird/unlikely example to lead with. Perhaps only use the more likely 127.0.0.1 and 0.0.0.0 variants, and maybe an example with a specific internal IP address or internal-looking hostname?

Fair enough. Of course this unlikely example was taken from a real live configuration :) I started with this one as a simple example. I've changed it to an AWS-style internal address.

The table at the end is pretty hard to read. The distinction between $ARVADOS_API_HOST and ExternalURL is interesting but it might be more of a distraction in this context.

OK.

I wonder if this would be more useful if we rearrange it to highlight the differences between the services in context of writing a config file.

service | ExternalURL required? | InternalURLs must be reachable from other cluster nodes? | InternalURLs are currently ignored completely and can be omitted?
railsapi | no | no | no
controller | yes | no (all clients connect to ExternalURL; only Nginx connects to internal URLs) | no
websocket | yes | no (...only Nginx) | no
keepproxy | yes | no (...only Nginx) | no
webdav | yes | no (...only Nginx) | no
workbench1 | yes | no (...only Nginx) | yes
workbench2 | yes | no (...only Nginx) | yes
keepstore | no | yes | no
keep-balance | no | no (only your prometheus server) | no
crunch-dispatch-cloud | no | no (only your prometheus server) | no

When InternalURLs do not need to be reachable from other nodes (i.e., everywhere except keepstore), it is easiest to use loopback addresses as InternalURLs, like http://127.0.0.1:9009.

I've taken this table and used it, albeit with a bunch of modifications. Have a look and tell me what you think?

IMO the main thing that makes this hard to explain/understand is that our "one config file for all components" work hasn't reached Nginx yet -- so we instruct the operator to copy the host/port from "InternalURLs" in the arvados config file into the server_name/listen lines in the Nginx/passenger config file, or vice versa.

I think it's more than just that. Anyway, have a look at the changes I made.

From that perspective, the reason Workbench1 doesn't need any InternalURLs is that the only client that connects to Workbench1 (or needs to figure out which port to listen on) is Nginx itself, which ignores the entire arvados config and you therefore need to configure manually anyway. (Unless you're from the future and you're using arvados-server boot of course.)

OK, I've updated that phrasing, minus the part about 'manual configuration' which seems unrelated. It's just configured in a different configuration file, that has nothing to do with how configuration is done (manual or via configuration management).

It feels a bit weird to highlight the difference between workbench1 and controller, which is really just the fourth column in the table above ("doesn't really need to be correct"). But on the other hand it would be weird to tell people to do something that doesn't currently have any effect. Not sure which is worse.

Not sure I understand what you mean here?

I think it would be better to use "RailsAPI" more consistently rather than "API server", at least when it's in red-monotype.

Sure, done.

Ready for another look at 9d49af75f45c083a2752b58071072f383ca689b5 on branch 17667-doc-improvements

#30

Updated by Tom Clegg over 2 years ago

Ward Vandewege wrote:

I've taken this table and used it, albeit with a bunch of modifications. Have a look and tell me what you think?

Yes, I think this is much better.

It feels a bit weird to highlight the difference between workbench1 and controller, which is really just the fourth column in the table above ("doesn't really need to be correct"). But on the other hand it would be weird to tell people to do something that doesn't currently have any effect. Not sure which is worse.

Not sure I understand what you mean here?

When "InternalURLs required?" is "no" it's because a) no piece of arvados connects to it yet, except health-check, and we don't tell you how to use health-check, and b) no piece of arvados knows how to bring up that service, except arvados-server boot, and that isn't production ready yet.

So while in principle workbench1 InternalURLs are a real thing, in practice it's probably better to leave them blank, rather than fill in values that have no way of being validated, because even if they're correct today they'll go stale over time without anyone noticing, and sooner or later they'll turn into a "how did this ever work" or "why doesn't this affect anything" mystery.

Ready for another look at 9d49af75f45c083a2752b58071072f383ca689b5 on branch 17667-doc-improvements

LGTM, thanks!

#31

Updated by Ward Vandewege over 2 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|2a1062755c5a83e765963c8dbfd223ebd61530cc.

#32

Updated by Ward Vandewege over 2 years ago

  • Related to Feature #18563: Simplify/streamline InternalURLs/ExternalURL situation added
#33

Updated by Ward Vandewege over 2 years ago

LGTM, thanks!

Thanks, merged. I also created #18563 to track improvements.

#34

Updated by Tom Clegg over 2 years ago

  • Related to Idea #16561: Add "Listen" to Services config added
#35

Updated by Lucas Di Pentima over 2 years ago

  • Release set to 48