coreutils
Re: Feature request: testline(tl) (RFC)


From: V.Krishn
Subject: Re: Feature request: testline(tl) (RFC)
Date: Wed, 10 Dec 2014 04:48:19 +0530
User-agent: KMail/1.13.7 (Linux/3.9.6-64; KDE/4.8.4; x86_64; ; )

> On 09/12/14 22:20, V.Krishn wrote:
> > Hi,
> > 
> > Was reading about bloom filter,
> > and came upon this example,
> > 
> > http://troydhanson.github.io/misc/bloom.html
> > ------
> > The bf test program
> > 
> > The program bf.c implements a Bloom filter. It can be used like,
> > 
> > ./bf -n 16 members.txt test.txt
> > 
> > Where the lines of members.txt are the true set members and the lines of
> > test.txt will be tested for membership. Varying n shows how the error
> > rate increases with smaller values of n.
> > ------
> > 
> > Source: https://github.com/troydhanson/misc
> > code:
> > https://raw.githubusercontent.com/troydhanson/misc/master/compression/bloom/bf.c
> > 
> > REQUEST:
> > Wondering if a simple implementation to test lines could be added to
> > coreutils.
> > Features:
> > 1. report if some lines are missing (option to print)
> > 2. option to print found lines
> > 3. option to print missing lines
> > 4. ....more logic possible...
> > 
> > -------------
> > Presently, I can achieve the same with a simple shell script that calls
> > grep on each line, or by using `comm`.
> > But I believe a method using a bloom filter should be faster and result
> > in a unique and useful tool.
> > 
> > Please ignore or guide if any similar util already exists.
> 
> Maybe we should keep the existing interfaces of grep, uniq, comm etc.
> and use a bloom filter _internally_ if appropriate.

Such internal use should be an explicit option in tools like grep, uniq, comm,
etc., and not enabled by default,
e.g. comm --use-bloom <bloom options>
Reason: hash-based methods have their own pros and cons (a Bloom filter can
report false positives), so enabling one by default would be a surprise to
users.

-- 
Regards.
V.Krishn


