[API] Support multiple load-balanced API server nodes
Background: the Rails API server application becomes a performance bottleneck under heavy load, e.g., when hundreds of containers/nodes are running. There are some ways to mitigate this (use a bigger/faster machine, adjust logging configs, move PostgreSQL to a different machine), but it would be much better if the operator could add API server nodes to increase capacity. However, some parts of the code base assume there is only one API server.
In this issue, we remove those barriers, so a site admin can safely add and remove additional API servers and route traffic to them with a load balancer.
(However, multi-API-server installations are not expected to support crunch1 jobs.)

Known/suspected issues:
- Job validation code assumes git repositories are stored in the local filesystem (todo: confirm this only affects crunch1)
- Audit log cleanup code uses flock() to avoid wastefully running concurrent cleanup threads (todo: confirm concurrent cleanup threads are harmless, and/or use a database lock instead)
- Sample DNS update scripts (triggered by "node ping") assume the API host is the DNS server (todo: offer a sample DNS update strategy suitable for multiple nodes).
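For the audit log cleanup item, the underlying problem is that `flock()` on a local file only excludes other processes on the same host, so each API server would still run its own cleanup. One possible replacement is a PostgreSQL session-level advisory lock (`pg_try_advisory_lock` / `pg_advisory_unlock`), which every API server sees because they share the database. The sketch below is hypothetical and not the existing Arvados code: the class name `ClusterWideTryLock` and the integer lock key are illustrative assumptions, and `conn` is assumed to be any object with `select_value(sql)`, such as `ActiveRecord::Base.connection`.

```ruby
# Hypothetical sketch: run a task only if no other API server is already
# running it, using a PostgreSQL advisory lock instead of a local flock().
class ClusterWideTryLock
  # conn: object responding to #select_value(sql), e.g. ActiveRecord::Base.connection
  # key:  an arbitrary integer that all API servers agree on (assumption)
  def initialize(conn, key)
    @conn = conn
    @key = Integer(key) # Integer() guards against SQL injection via the key
  end

  # Yields and returns the block's value if the lock was acquired;
  # returns nil without yielding if another session already holds it.
  def run
    acquired = truthy?(@conn.select_value("SELECT pg_try_advisory_lock(#{@key})"))
    return nil unless acquired
    begin
      yield
    ensure
      # Session-level advisory locks must be released on the same connection.
      @conn.select_value("SELECT pg_advisory_unlock(#{@key})")
    end
  end

  private

  # Depending on the adapter version, a boolean may come back as true/false
  # or as the string "t"/"f".
  def truthy?(value)
    value == true || value == "t"
  end
end
```

A cleanup thread would then call something like `ClusterWideTryLock.new(ActiveRecord::Base.connection, 4242).run { tidy_logs }` (where `tidy_logs` stands in for the real cleanup method) and simply skip the round when `run` returns nil. Note that a session-level advisory lock is reentrant within one database session, so this only excludes *other* sessions, which is exactly the multi-server case.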
#1 Updated by Lucas Di Pentima 7 months ago
Some observations/questions:

Potential issues:
- RequestIDs: Is this solved by a smart load balancer?
- Multiple trashed-object sweep processes (one on every API server)
  - Is this a configuration thing?
  - Should that code be split into a separate service?
- What about the logs table being read from several API instances?
- It's OK to waste some work because every API server runs its own trash/log sweeping thread
- Load balancer must be configured to route all node ping requests to a single API server which is also the DNS server
- All API servers are shut down while any API server is being upgraded
- API servers are not aware of anything like "my ID" or what other API servers are running