Re: Feature request: testline(tl) (RFC)
From: Pádraig Brady
Subject: Re: Feature request: testline(tl) (RFC)
Date: Tue, 09 Dec 2014 22:38:51 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 09/12/14 22:20, V.Krishn wrote:
>
> Hi,
>
> I was reading about Bloom filters
> and came upon this example:
>
> http://troydhanson.github.io/misc/bloom.html
> ------
> The bf test program
>
> The program bf.c implements a Bloom filter. It can be used like,
>
> ./bf -n 16 members.txt test.txt
>
> Where the lines of members.txt are the true set members and the lines of
> test.txt will be tested for membership. Varying n shows how the error rate
> increases with smaller values of n.
> ------
>
> Source: https://github.com/troydhanson/misc
> code:
> https://raw.githubusercontent.com/troydhanson/misc/master/compression/bloom/bf.c
>
> REQUEST:
> Wondering if a simple implementation to test lines could be added to coreutils.
> Features:
> 1. report if some lines are missing (option to print)
> 2. option to print found lines
> 3. option to print missing lines
> 4. ... more logic possible ...
>
> -------------
> Presently, I can achieve the same with a simple shell script that calls grep on
> each line, or by using `comm`.
> But I believe that a method using a Bloom filter should be faster and result in a
> unique and useful tool.
>
> Please ignore this, or point me to it, if a similar utility already exists.
>
Maybe we should keep the existing interfaces of grep, uniq, comm etc.
and use a bloom filter _internally_ if appropriate.
cheers,
Pádraig.
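
For reference, a minimal sketch of how the three requested features can
be approximated with the existing grep and comm interfaces. The file
names members.txt and test.txt are borrowed from the bf example above,
the sample contents are made up for illustration, and nothing here uses
a Bloom filter; it only shows the exact-match behaviour that any
internal optimisation would have to preserve.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical sample data: set members and lines to test.
printf '%s\n' apple banana cherry > members.txt
printf '%s\n' banana durian apple > test.txt

# Feature 2: lines of test.txt found in members.txt.
# -F fixed strings, -x whole-line match, -f read patterns from a file.
found=$(grep -Fxf members.txt test.txt)

# Feature 3: lines of test.txt missing from members.txt (-v inverts).
missing=$(grep -Fxvf members.txt test.txt)

# Feature 1: report whether any tested line is missing.
if [ -n "$missing" ]; then
    status="some lines missing"
else
    status="all lines found"
fi

# The same with comm, which needs both inputs sorted in the same locale.
LC_ALL=C sort members.txt > members.sorted
LC_ALL=C sort test.txt > test.sorted
common_sorted=$(LC_ALL=C comm -12 members.sorted test.sorted)   # in both files
missing_sorted=$(LC_ALL=C comm -13 members.sorted test.sorted)  # only in test.txt

printf 'found:\n%s\nmissing:\n%s\n%s\n' "$found" "$missing" "$status"
```

The grep form keeps the order of test.txt and needs no sorting, while
the comm form emits its results in sorted order.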