Re: RFE: uniq --sequential
From: Pádraig Brady
Subject: Re: RFE: uniq --sequential
Date: Wed, 10 Jun 2015 22:32:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 10/06/15 22:04, Daiki Ueno wrote:
> Hello,
>
> I occasionally have to deal with sequential numbers which are largely
> contiguous, but contain gaps (e.g., Unicode code points).
>
> To detect gaps, I usually write a shell-script loop, which is not
> trivial. So, I thought that it would be handy if this is supported by
> coreutils, like this:
>
> $ { seq 1 10; seq 12 22; seq 26 34; } | uniq --sequential
> 1
> 12
> 26
>
> or, a more practical use-case:
>
> $ wc -l UnicodeData.txt
> 27268 UnicodeData.txt
> $ cut -f1 -d';' UnicodeData.txt | sed 's/^/0x/' | uniq --sequential | wc -l
> 612
>
> where contiguous numbers are treated as duplicates. I'm attaching a
> patch which implements this.
Thanks for the suggestion and especially the patch.
This is related to the merging of sort --key functionality into uniq
in the next major version of coreutils. That will give numeric comparison
functionality to uniq. Then this functionality could be added with
a --sequential[=interval] or maybe a --min-separation=2 option.
It seems like it could be quite useful with the --group option also.
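
Until something like that exists, the behaviour in the example above can be
approximated with awk. This is only a rough sketch, not the attached patch:
seq_starts is a hypothetical helper name, and its optional argument is just
one guess at what an interval for --sequential might mean (the exact
difference between neighbouring values that continues a run). It handles
decimal input only, so the 0x-prefixed UnicodeData.txt case would need
something like gawk's strtonum().

```shell
# Hypothetical helper: print the first number of each run in which
# consecutive values differ by exactly INTERVAL (default 1).
seq_starts() {
    awk -v step="${1:-1}" '
        NR == 1 || $1 - prev != step { print $1 }   # a new run starts here
        { prev = $1 }                               # remember the last value seen
    '
}

# Same input as the example in the original mail; prints 1, 12 and 26.
{ seq 1 10; seq 12 22; seq 26 34; } | seq_starts
```

Calling it as "seq_starts 2" would treat values that are exactly 2 apart as
one run, which is not the same thing as a --min-separation option.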
Thanks!
Pádraig.