
Re: File divide to feed parallel


From: Ole Tange
Subject: Re: File divide to feed parallel
Date: Thu, 27 Mar 2014 09:51:06 +0100

On Wed, Mar 26, 2014 at 9:32 PM, David <dgpickett@aol.com> wrote:
> ETL programs like Ab Initio know how to tell parallel processes to split up
> big files and process each part separately, even when the files are linefeed
> delimited (they all agree to search up (or down) for the dividing linefeed
> closest to N bytes into the file).  Does anyone know of a utility that can
> split a file this way (without reading it sequentially)?  Is this in GNU
> parallel?

GNU Parallel will do that, except that it reads the input sequentially.
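
The seek-then-scan trick described in the question can be sketched with
standard shell tools.  This is only an illustration; the file name big.txt
and the 1 GB offset are made-up examples:

  offset=1000000000
  # bytes from $offset up to and including the next linefeed:
  rest=$(tail -c +$((offset + 1)) big.txt | head -n 1 | wc -c)
  # first byte of the next complete record:
  split_at=$((offset + rest))

Each worker could then be handed a byte range computed this way and read
only its own slice of the file.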

> It'd be nice to be able to take a list of mixed-size files and divide them
> by size into N chunks of approximately equal lines, estimated using byte
> sizes and with an algorithm for searching for the record delimiter
> (linefeed) such that no records are lost.  Sort of a mixed-input leveller
> for parallel loads.  If it is part of parallel, then parallel can launch
> processing for each chunk and combine the chunks.

That is what --pipe does (except it reads sequentially):

  cat files* | parallel --pipe --block 10m wc
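
Combining the per-chunk results afterwards is ordinary post-processing of
the jobs' output.  A small made-up example that totals the line counts
across all chunks:

  cat files* | parallel --pipe --block 10m wc -l | awk '{s+=$1} END {print s}'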

/Ole


