Design of --header when using --pipe


From: Ole Tange
Subject: Design of --header when using --pipe
Date: Wed, 23 Nov 2011 21:46:17 +0100

I have seen others ask for it, and now I have had use for it myself:
a way to repeat a header for each block when using --pipe.

If you are processing a big CSV file whose first line holds the column
names, you want that line repeated at the start of each block passed
to a parallel process.
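
To make the idea concrete, here is a minimal sketch in Python. It is
only an illustration of the behaviour described above, not how GNU
parallel does or would implement it. The function name
blocks_with_header and the lines_per_block parameter are hypothetical,
and blocks are cut by line count here purely to keep the example short.

```python
import io

def blocks_with_header(stream, lines_per_block=2):
    """Yield blocks of text, each starting with a copy of the first line."""
    header = stream.readline()      # assumption: the header is exactly one line
    block = []
    for line in stream:
        block.append(line)
        if len(block) == lines_per_block:
            yield header + "".join(block)
            block = []
    if block:                       # last, possibly shorter, block
        yield header + "".join(block)

csv_text = "name,age\nann,31\nbob,45\ncid,27\n"
blocks = list(blocks_with_header(io.StringIO(csv_text)))
for b in blocks:
    print(repr(b))
```

Every block starts with the name,age line, so each parallel process
sees a small but complete CSV file.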

The simple fix is just to assume that the header is a single line. But
I think we can do better than that.

I would like to at least be able to process these 4 types of headers:

* The CSV header: a single line. Maybe extended to a given number of
lines, which can be 1?

* A header of multiple lines, each prefixed with a special character:

% header1
% header2
data
data

* A header separated from the body by a delimiter, e.g. \n\n in emails:

From root@alpha.tange.dk Mon Apr 23 10:20:38 2007
Return-path: <root@alpha.tange.dk>
From: Anacron <root@alpha.tange.dk>
To: root@alpha.tange.dk
Subject: Anacron job 'cron.daily' on alpha
Message-Id: <E1Hftmo-0001Jn-D5@localhost>
Date: Mon, 23 Apr 2007 10:20:38 +0200

data
data

* A fixed-length header measured in bytes, so --pipe can process
binary data with a fixed block length.

This header is 25 bytes.
This data is taking up 33 bytes.
This data is 33 bytes in length.
Thirty three bytes used for this
Space for this: 33 bytes incl \n
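
The four header types above could each be handled by a small function
that splits the input into (header, body). The following Python is
only a sketch under my own assumptions: all four function names and
their parameters are hypothetical and do not correspond to any
existing GNU parallel option. It works on bytes so the fixed-length
case also covers binary data.

```python
def split_by_line_count(data: bytes, n: int = 1):
    """Header is the first n lines (the CSV case when n == 1)."""
    pos = 0
    for _ in range(n):
        nl = data.find(b"\n", pos)
        if nl == -1:                # fewer than n lines: all header
            return data, b""
        pos = nl + 1
    return data[:pos], data[pos:]

def split_by_prefix(data: bytes, prefix: bytes = b"%"):
    """Header is the leading run of lines that start with prefix."""
    pos = 0
    while data.startswith(prefix, pos):
        nl = data.find(b"\n", pos)
        if nl == -1:                # unterminated last header line
            return data, b""
        pos = nl + 1
    return data[:pos], data[pos:]

def split_by_separator(data: bytes, sep: bytes = b"\n\n"):
    """Header ends at the first sep; sep is kept as part of the header."""
    idx = data.find(sep)
    if idx == -1:                   # no separator: treat everything as body
        return b"", data
    end = idx + len(sep)
    return data[:end], data[end:]

def split_by_byte_count(data: bytes, n: int):
    """Header is exactly the first n bytes."""
    return data[:n], data[n:]

# One small input per header type, taken from the examples above.
csv = split_by_line_count(b"a,b\n1,2\n3,4\n")
pct = split_by_prefix(b"% header1\n% header2\ndata\ndata\n")
mail = split_by_separator(b"To: root\nSubject: x\n\ndata\ndata\n")
fixed = split_by_byte_count(
    b"This header is 25 bytes.\nThis data is taking up 33 bytes.\n", 25)
```

Once the header is isolated by any of these, repeating it is the same
for all four types: prepend the header bytes to every block cut from
the body.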

Do you have other data files with headers that would require different
treatment?


/Ole


