Re: File divide to feed parallel
From: Ole Tange
Subject: Re: File divide to feed parallel
Date: Thu, 27 Mar 2014 09:51:06 +0100
On Wed, Mar 26, 2014 at 9:32 PM, David <dgpickett@aol.com> wrote:
> ETL programs like Ab Initio know how to tell parallel processes to split up
> big files and process each part separately, even when the files are
> linefeed-delimited (they all agree to search up (or down) for the dividing
> linefeed closest to N bytes into the file). Does anyone know of a utility
> that can split a file this way (without reading it sequentially)? Is this in
> GNU parallel?
GNU Parallel will do that, except that it reads the input sequentially.
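For reference, the boundary search described above is easy to sketch in
shell. This is a minimal illustration, assuming GNU coreutils; FILE and N
are hypothetical placeholders, and it is not how GNU Parallel is
implemented internally:

  FILE=bigfile.txt           # hypothetical input file
  N=$((10 * 1024 * 1024))    # desired split point: 10 MB into the file

  # Bytes from offset N up to and including the next linefeed.
  # On a regular file GNU tail seeks to the offset instead of
  # streaming from the start, so this probe is cheap.
  SKIP=$(tail -c +$((N + 1)) "$FILE" | head -n 1 | wc -c)
  END=$((N + SKIP))          # chunk 1 ends exactly on a record boundary

  head -c "$END" "$FILE" | wc          # process the first chunk
  tail -c +$((END + 1)) "$FILE" | wc   # the rest starts on a fresh record

Repeating the probe at 2N, 3N, ... yields N-byte chunks that never split a
record.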
> It'd be nice to be able to take a list of mixed-size files and divide them
> by size into N chunks of approximately equal lines, estimated using byte
> sizes and with an algorithm for searching for the record delimiter
> (linefeed) such that no records are lost. Sort of a mixed input leveller
> for parallel loads. If it is part of parallel, then parallel can launch
> processing for each chunk and combine the chunks.
That is what --pipe does (except it reads sequentially):
cat files* | parallel --pipe --block 10m wc
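If the records are delimited by something other than a linefeed, --pipe can
be told what a record looks like with --recend (a linefeed is the default,
spelled out here for clarity), and --keep-order makes the chunk outputs come
back in input order. A small variation of the above:

  cat files* | parallel --pipe --block 10m --recend '\n' --keep-order wc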
/Ole