Bug #20223
closedExcessive memory usage with huge containers
Updated by Peter Amstutz almost 2 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz almost 2 years ago
- Description updated (diff)
20223-container-bloat @ 1ba4c76f70f0f3d68e01fc548b0b7de0d43723ca
We were fixated on update_priority
but it looks like the thing that we were missing was that handle_completed
can also load full container + container request records.
In the end the solution seems to be to simply optimize the queries for memory usage. I removed the use of includes
and replaced it with either loading records one at a time or using pluck
to grab the one field that we actually cared about (avoiding most of the ActiveRecord machinery entirely).
Even with all of that it looks like this operation still took 43 seconds to complete, but the good news is that I didn't see massive bloat in the ruby memory footprint.
Also, when all registered workflows are refreshed so that the workflow record is the wrapper workflow only, this should be much quicker, as the main overhead here is deserializing the workflow JSON content kept in the "mounts" field.
Updated by Lucas Di Pentima almost 2 years ago
Changes LGTM.
I think I found another case where using .pluck()
could be beneficial:
model/container_request.rb:L355
-- Theold_container
var will only be used to get thelog
field.
Updated by Peter Amstutz almost 2 years ago
- Subject changed from Excessive time updating container state for parent container of workflow process tree to Excessive memory usage with huge containers
Updated by Peter Amstutz almost 2 years ago
I following up on this, it looks like trash_sweep
and used_by
are also problems.
Updated by Peter Amstutz almost 2 years ago
20223-trash-sweep @ effb6e13d6a4d024243138cacfeb582efba5e24a
Updated by Lucas Di Pentima almost 2 years ago
- I think we can use
pluck(:uuid)
on collections_controller instead ofin_batches()
saving a lot of memory in the process. - All the
in_batches(of: 15)
calls makes me want to have a config knob that we can tune. Not sure if this would be a blocker for this to get merged, though.
Updated by Peter Amstutz almost 2 years ago
20223-trash-sweep @ af90b8e123e227fe2c35902e3a80f49b15e6d067
Make used_by use pluck
(and select
in one case) instead of loading in batches, which should be way faster and more efficient.
Updated by Lucas Di Pentima almost 2 years ago
I think we can even use pluck()
in the multi-column case. I read that pluck()
supported only 1 arg on rails 3 but since rails 4 it can support multiple args. https://apidock.com/rails/v5.2.3/ActiveRecord/Calculations/pluck
Updated by Peter Amstutz almost 2 years ago
Re-ran the failing job
developer-run-tests-apps-workbench-integration: #3819 /console
I think I'm happy with using select()
here instead of pluck, but good to know if we need to optimize other column loads.
Updated by Lucas Di Pentima almost 2 years ago
Yes, I think in the case of collect
vs pluck
, using the latter also saves CPU cycles.
Either way, this LGTM.
Updated by Peter Amstutz almost 2 years ago
- Status changed from In Progress to Resolved