
File divide to feed parallel


From: David
Subject: File divide to feed parallel
Date: Wed, 26 Mar 2014 16:32:38 -0400 (EDT)

ETL programs like Ab Initio can have parallel processes split a big file and process each part separately, even when the file is linefeed-delimited: every process agrees to search up (or down) from a byte offset N for the nearest linefeed and treats that as the dividing point.  Does anyone know of a utility that can split a file this way, without reading it sequentially?  Is this in GNU parallel?
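
To make the idea concrete, here is a minimal Python sketch of that kind of split. It is only an illustration of the technique described above, not how Ab Initio or GNU parallel actually does it. The function name `newline_aligned_ranges` and the `probe` read size are made up for this example. It seeks to each approximate cut point and scans forward (the "search down" variant) for the next linefeed, so only a small region near each cut is read.

```python
import os

def newline_aligned_ranges(path, n_chunks, probe=64 * 1024):
    """Return up to n_chunks (start, end) byte ranges that break only after a linefeed.

    Hypothetical helper: only a small window after each target offset is read,
    so the file is never scanned sequentially.
    """
    size = os.path.getsize(path)
    if size == 0:
        return []
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            target = size * i // n_chunks      # approximate cut point in bytes
            if target <= cuts[-1]:
                continue                       # an earlier cut already passed this point
            f.seek(target)
            pos = target
            while True:                        # scan forward for the next linefeed
                block = f.read(probe)
                if not block:
                    pos = size                 # no linefeed before end of file
                    break
                idx = block.find(b"\n")
                if idx != -1:
                    pos += idx + 1             # cut just past the linefeed
                    break
                pos += len(block)
            if pos >= size:
                break                          # the rest of the file is one chunk
            cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))
```

Each worker could then open the file, seek to its `start`, and read `end - start` bytes. Fewer than `n_chunks` ranges may come back when lines are long relative to the file size.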

It'd be nice to be able to take a list of mixed-size files and divide them by size into N chunks of approximately equal line counts, estimated from byte sizes, with an algorithm that searches for the record delimiter (linefeed) so that no records are lost.  Sort of a mixed-input leveller for parallel loads.  If it were part of parallel, then parallel could launch the processing for each chunk and combine the chunks afterwards.
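
A rough Python sketch of such a leveller follows, purely as an assumption about how it might work. The names `level_files` and `next_line_start` are hypothetical. It uses byte size as the estimate of line count, fills each chunk up to total/N bytes, and when a file has to be divided it moves the cut forward to just past the next linefeed so that no record is cut in two.

```python
import os

def next_line_start(f, offset, size, probe=64 * 1024):
    """Return the offset just past the first linefeed at or after offset, or size."""
    f.seek(offset)
    pos = offset
    while True:
        block = f.read(probe)
        if not block:
            return size
        idx = block.find(b"\n")
        if idx != -1:
            return pos + idx + 1
        pos += len(block)

def level_files(paths, n_chunks):
    """Group (path, start, end) byte ranges into n_chunks lists of similar total size."""
    sizes = [os.path.getsize(p) for p in paths]
    total = sum(sizes)
    chunks = [[] for _ in range(n_chunks)]
    if total == 0:
        return chunks
    quota = total / n_chunks                   # target bytes per chunk
    current, filled = 0, 0                     # chunk being filled, bytes in it so far
    for path, size in zip(paths, sizes):
        start = 0
        with open(path, "rb") as f:
            while start < size:
                room = quota - filled
                if current == n_chunks - 1 or size - start <= room:
                    end = size                 # the rest of this file fits here
                else:
                    # divide the file at the first line boundary past the quota
                    end = next_line_start(f, start + int(room), size)
                chunks[current].append((path, start, end))
                filled += end - start
                start = end
                if filled >= quota and current < n_chunks - 1:
                    current += 1               # chunk is full, move to the next one
                    filled = 0
    return chunks
```

Each chunk is a list of (path, start, end) pieces that one worker could read in order. Chunks come out slightly over quota because cuts only move forward, so the last chunk tends to be the smallest.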

