coreutils

Re: csplit - split by content of field


From: Pádraig Brady
Subject: Re: csplit - split by content of field
Date: Wed, 06 Feb 2013 22:38:36 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 02/06/2013 10:09 PM, Assaf Gordon wrote:
Hello,

Attached is a patch that gives 'csplit' the ability to split files by content of
a field.
A typical usage is:

     ## the "@1" pattern means "start a new file when field 1 changes"
     $ printf "A\nA\nB\nB\nB\nC\n" | csplit - @1 {*}
     $ wc -l xx*
     2 xx00
     3 xx01
     1 xx02
     6 total
     $ head xx*
     ==> xx00 <==
     A
     A

     ==> xx01 <==
     B
     B
     B

     ==> xx02 <==
     C



This is just a proof of concept, and the pattern specification can be changed (I think 
"@N" doesn't conflict with any existing pattern).

The same can probably be achieved using other programs (awk comes to mind), but 
it won't be as simple and clean (with all of csplit's output features).
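
For comparison, the awk route mentioned above could look roughly like the
sketch below. It is only an illustration, not part of the patch: the xxNN
output names simply mimic csplit's defaults, the variable names are made up,
and it works in a scratch directory so nothing existing gets overwritten.

```shell
# Work in a throwaway directory so the xx* files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Start a new output file whenever field 1 differs from the previous line's.
printf 'A\nA\nB\nB\nB\nC\n' |
awk '
  NR == 1 || $1 != prev {
      if (out != "") close(out)        # avoid running out of open files
      out = sprintf("xx%02d", n++)     # xx00, xx01, ... like csplit defaults
  }
  { print > out; prev = $1 }
'

wc -l xx*
```

This covers the splitting itself, but none of csplit's other output features
(prefix, suffix format, eliding empty files) come for free.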

Let me know if you're willing to consider such an addition.

Yes, such a feature is useful, though maybe in conjunction with uniq:
http://lists.gnu.org/archive/html/coreutils/2011-03/msg00000.html

So basically the proposal there is to support --suppress-matched
so that you could then do:

uniq -w1 --unique=separated --all-repeated=separated |
csplit --suppress-matched - '/^$/' '{*}'

The caveat with that though is that uniq would
benefit from better field selection, which is
also on the TODO list.
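
As a rough sketch of how that pipeline could look in practice: the example
below assumes a coreutils new enough to have both 'csplit --suppress-matched'
and 'uniq --group' (8.22 or later, as far as I know), and uses --group in
place of the option spelling proposed above. Note that -w1 compares the first
character, not a field, which is exactly the caveat just mentioned, and that
input containing empty lines would confuse the split.

```shell
# Work in a throwaway directory so the xx* files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# uniq --group emits every line and separates groups with an empty line;
# csplit then splits on those empty lines and drops them from the output.
printf 'A\nA\nB\nB\nB\nC\n' |
  uniq -w1 --group |
  csplit -s --suppress-matched - '/^$/' '{*}'

wc -l xx*
```

With the sample input this should leave three files, one per run of identical
first characters.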

cheers,
Pádraig.


