File splits
General approach.
For each file segment (generally one segment per block):
- Fetch the assigned block.
- Determine the offset of the first record in the assigned block. (If the start is ambiguous, check the end of the previous block to see whether a record is split across the block boundary.)
- Seek ahead to find the last record in the assigned block and determine where it ends (which may be in the next block).
- Generate a collection representing a subsection of the original file, starting at the offset of the first record and extending through the end of the last record.
- If required, insert a header segment at the beginning of the new file.
- Feed the new collection to the target program via SDK or arv-mount.
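The boundary-finding steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation. It assumes records are terminated by a single-byte delimiter (a newline here), that a record belongs to the block in which it begins, and that the file is available as a seekable binary stream. The function names `record_start_at_or_after`, `record_range`, and `read_segment` are hypothetical, and building the collection and calling the SDK or arv-mount are deliberately left out.

```python
import io


def record_start_at_or_after(f, pos, size, delim=b"\n"):
    """Return the offset of the first record that begins at or after pos.

    Assumes a single-byte record delimiter (hypothetical simplification).
    """
    if pos <= 0:
        return 0
    if pos >= size:
        return size
    # Start one byte early, i.e. at the last byte of the previous block.
    # If that byte is the delimiter, a record begins exactly at pos.
    f.seek(pos - 1)
    while True:
        chunk = f.read(65536)
        if not chunk:
            # No delimiter before end of file: no further record starts.
            return size
        i = chunk.find(delim)
        if i >= 0:
            chunk_start = f.tell() - len(chunk)
            return chunk_start + i + 1


def record_range(f, block_start, block_size, size):
    """Return (start, end) covering every record that begins in the block.

    The end may lie in a later block when the last record spills over.
    An empty range (start == end) means no record begins in this block.
    """
    start = record_start_at_or_after(f, block_start, size)
    end = record_start_at_or_after(f, block_start + block_size, size)
    return start, end


def read_segment(f, start, end, header=b""):
    """Return the bytes of one segment, with an optional header prepended."""
    f.seek(start)
    return header + f.read(end - start)
```

Because the end of one block's range is computed the same way as the start of the next block's range, the ranges tile the file with no gaps or overlaps under these assumptions. A block in which no record begins (for example, the middle of one very long record) yields an empty range and can be skipped.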
This should be possible either as a dedicated split step or as a parallelization wrapper that runs before the real program.
Updated by Peter Amstutz almost 10 years ago · 1 revisions