Bug #20561
closed
Log when files from input are being propagated to output in crunch-run finalization
Description
"Maximum container memory rss usage"
then nothing for almost 2 hours, then finishes up with
"copying /file.txt (200000 bytes)"
"maximum keepstore memory rss"
...
Completed
On further investigation:
The output collection has ~4400 files. Except for the one file that was reported as being copied, it looks like these are staged to an intermediate collection, made to appear in the output directory, and then propagated to the output collection.
So it seems to be doing something that iterates over each of the 4400 files. It only needs to take 1.5s to process each file for that to add up to nearly two hours (4400 × 1.5s = 6600s, about 1 hour 50 minutes).
The input consists of an array of 4400 files, each pulled from a different collection, so I think what is happening is that it is sequentially fetching 4400 collections with manifest text.
Things to do:
- Log that this is happening (print out each file being added)
- We don't actually need these files in the output at all. We should support a regex filter on what gets collected for the output collection, and not upload or propagate files that the user doesn't want. There's actually a really old ticket for this! #9964
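
A rough sketch of how the two items above could fit together, written in Python purely for illustration (crunch-run itself is written in Go, and this is not its actual code). The function name `select_output_files`, the `include` parameter, and the log wording are all hypothetical. The idea is that every staged file is either logged as it is propagated or logged as skipped by the filter, so the log never goes silent during finalization.

```python
import logging
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("output-propagation")


def select_output_files(
    files: Iterable[Tuple[str, int]],
    include: Optional[str] = None,
) -> Iterator[Tuple[str, int]]:
    """Yield the (path, size) pairs that should reach the output collection.

    `include` is an optional regex (an assumed option, not an existing one).
    When it is set, only paths it matches are propagated. Every file produces
    one log line either way.
    """
    pattern = re.compile(include) if include else None
    for path, size in files:
        if pattern is not None and not pattern.search(path):
            logger.info("skipping %s (does not match output filter)", path)
            continue
        logger.info("propagating %s (%d bytes)", path, size)
        yield path, size


# Example: one real output file plus two files staged from the inputs.
staged = [
    ("/file.txt", 200000),
    ("/inputs/a.fastq", 10),
    ("/inputs/b.fastq", 20),
]
kept = list(select_output_files(staged, include=r"^/[^/]+\.txt$"))
```

With the filter shown, only `/file.txt` is kept, so the staged input files would never need to be fetched or propagated. With no filter, all three files pass through, each with its own log line.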
Related issues