Story #6647

Choose an approach for better service discovery in deployments

Added by Nico César over 4 years ago. Updated 3 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date: 07/16/2015
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

One of the main pain points is that an admin user has to register each new resource with the API server before it becomes active.

For example, this is how it is done for keepproxy:

~$ prefix=`arv --format=uuid user current | cut -d- -f1`
~$ echo "Site prefix is '$prefix'" 
~$ read -rd $'\000' keepservice <<EOF; arv keep_service create --keep-service "$keepservice" 
{
 "service_host":"keep.$prefix.your.domain",
 "service_port":443,
 "service_ssl_flag":true,
 "service_type":"proxy" 
}
EOF
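The `read -rd $'\000' keepservice <<EOF; arv ...` line packs two commands together and is easy to misread. Below is a minimal offline sketch of the same idiom that prints the payload instead of sending it. The `build_keepservice_json` helper and the example prefix `abc12` are hypothetical and exist only for illustration. Because `read` exits with status 1 when it reaches the end of the here-document without finding a NUL byte, chaining the `arv` call with `&&` instead of `;` would silently skip it.

```shell
#!/usr/bin/env bash
# Offline sketch of the registration step above: same heredoc idiom, but the
# payload is printed instead of being sent with `arv keep_service create`.
# build_keepservice_json is a hypothetical helper, not part of the Arvados CLI.
build_keepservice_json() {
  local prefix="$1" domain="$2" keepservice
  # $'\000' expands to an empty string in bash, so the delimiter is NUL:
  # read consumes the whole heredoc into one variable. It returns 1 when it
  # reaches end of input without finding a NUL, hence "|| true".
  read -rd $'\000' keepservice <<EOF || true
{
 "service_host":"keep.$prefix.$domain",
 "service_port":443,
 "service_ssl_flag":true,
 "service_type":"proxy"
}
EOF
  printf '%s\n' "$keepservice"
}

build_keepservice_json abc12 your.domain
```

The printed JSON is the same string that the original command passes to `arv keep_service create --keep-service`, so checking it this way needs no API server or admin token.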

But this requires a fully functional Arvados Python SDK with my admin user's credentials set up somewhere (and getting that up and running also takes some manual steps).
I think this is one of the bigger problems in coordinating cluster creation: base resources (such as Keep servers, Workbench, and shell nodes) depend on an authenticated user.

I envision several options here:

1) Create a service discovery infrastructure with Consul (https://www.consul.io/). Each base resource authenticates, fetches its configuration, initializes itself, and announces through Consul that it is ready (this is also how the API server gets notified). The API server trusts the resource because it trusts Consul's authentication.
2) Add an extra "enabled" field to the API that is initially false. The record is created with an anonymous token, and then an admin sets it to true from Workbench. Each service would include code to register itself instead of using the arv CLI.
3) Have an ADMIN user with known credentials when the API server is brought up. Then we can puppetize the manual step.
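To make option 1 more concrete, here is a minimal sketch of the "announce readiness" half. It assumes a Consul agent listening on its default HTTP port (8500) on each host and uses the agent's `/v1/agent/service/register` endpoint. The helper names, the TCP health check, and the mapping of keep_service fields onto Consul tags and metadata are hypothetical choices, not an agreed design. The trust relationship between Consul and the API server is not modeled here.

```shell
#!/usr/bin/env bash
# Sketch of option 1 (hypothetical): once a base resource such as keepproxy has
# initialized, it announces itself to the Consul agent on its own host.
# Assumes a local Consul agent on the default HTTP port 8500.

# Print a Consul agent service definition for one base resource.
# Arguments: name, host, port, Arvados service_type, ssl flag (true/false).
consul_service_payload() {
  local name="$1" host="$2" port="$3" type="$4" ssl="$5"
  cat <<EOF
{
  "Name": "$name",
  "Address": "$host",
  "Port": $port,
  "Tags": ["$type"],
  "Meta": {"service_ssl_flag": "$ssl"},
  "Check": {"TCP": "$host:$port", "Interval": "10s"}
}
EOF
}

# Register the service with the local agent. Consul then runs the TCP check
# and reports the instance as passing once the port accepts connections.
register_with_consul() {
  consul_service_payload "$@" |
    curl --fail --silent --show-error \
      --request PUT --data-binary @- \
      "http://127.0.0.1:8500/v1/agent/service/register"
}
```

A keepproxy host could call `register_with_consul keepproxy keep.abc12.your.domain 443 proxy true` at the end of its startup. The API server could then query `GET /v1/health/service/keepproxy?passing` on its own agent to list only instances whose check is passing, which avoids any manual `arv keep_service create` step.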

My thoughts: option 3 is the easiest to implement, touches very little of our codebase, and is pretty straightforward, but it looks like a band-aid; maybe it is good as an initial approach. Option 2 requires minimal codebase changes, but it only works once Workbench is already running. Option 1 is my favorite because of its scalability potential, but it introduces a new infrastructure element and requires trusting Consul's authentication. In the long term it will pay dividends, at least for Curoverse; I don't know how it will work out for people installing their own clusters, and I don't want to make the learning curve steeper for them.

History

#1 Updated by Tom Morris 3 months ago

  • Target version set to To Be Groomed
