Bug #22414
Closed
MaxConcurrentRailsRequests default is too small
Description
MaxConcurrentRailsRequests is currently set to 8 in the config defaults.
Until the more general solution in #21287 is addressed, this limit is small enough that loading a process page causes timeouts: the concurrent requests to fetch log files fill up the request queue, which prevents keep-web from calling back to the API server to validate the token.
The default should be doubled, to at least 16.
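For a site that wants the higher limit before picking up a new default, a minimal sketch of the override in the cluster config file follows. It assumes the key lives under the API section, as it does in the config defaults. ClusterID is a placeholder, not a real cluster.

```yaml
Clusters:
  ClusterID:        # placeholder: substitute your own cluster ID
    API:
      # The default was 8 when this bug was filed; 16 is the value proposed above.
      MaxConcurrentRailsRequests: 16
```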
Updated by Peter Amstutz about 1 month ago
- Related to Bug #21287: Binning and throttling incoming and outgoing requests added
Updated by Peter Amstutz about 1 month ago
- Target version set to Development 2025-01-22
Updated by Tom Clegg 13 days ago
22414-max-rails-requests @ 0896fc10c355653067d840cdbcb3d9938dde7b82 -- developer-run-tests: #4595
retry remainder run-tests-remainder: #4868
Updated by Brett Smith 12 days ago
Peter wants this, the branch does it, lgtm I guess.
But like, I don't know, wouldn't it be cool to reconfigure tordo and do a little scale testing before we make the change in main? Even accepting that the change will address the issue at hand, it would be good to try a few other things, like running a workflow, and make sure there are no obvious bad consequences. I'm especially concerned about this given that we just merged #22349, which completely changes our RailsAPI deployment, and while I'm convinced it's reasonably mature, we don't have a lot of real-world experience with how it scales yet.
Updated by Brett Smith 8 days ago
Interestingly, tordo already has this set to 16. Still, I started a new workflow tordo-xvhdp-hr5370pocazfy67 and watched the logs for it the whole time. It's not done but it's past the most parallel part of the workflow, and that ran without issue, so I'm confident enough to say this is good to merge. Thanks.
Updated by Brett Smith 8 days ago
You expressed interest in a snapshot of resource usage. Right now, with the workflow still running but less parallel, systemctl status arvados-railsapi reports:
Tasks: 52 (limit: 9222)
Memory: 802.3M
CPU: 54min 9.848s [over 20 hours of clock time]
A selection from systemd-cgtop shows it at the top of Arvados services, but basically where I would expect it to be given it's Rails and not Go:
system.slice/arvados-railsapi.service 801.5M
system.slice/keep-balance.service 619.5M
system.slice/arvados-controller.service 436.2M
system.slice/arvados-dispatch-cloud.service 166.6M
So I don't see any immediate cause for concern with this setting.
Updated by Tom Clegg 6 days ago
- Status changed from New to Resolved
Applied in changeset arvados|4b4f89a7eff6bf35799e9ef6d3857ee43fd4e035.