bug-findutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: signature method in locate


From: James Youngman
Subject: Re: signature method in locate
Date: Mon, 11 May 2009 23:08:21 +0100

On Mon, May 11, 2009 at 10:42 AM, Ondrej Bilka <address@hidden> wrote:
> On Mon, May 11, 2009 at 10:39:11AM +0200, James Youngman wrote:
>> On Mon, May 11, 2009 at 8:14 AM, Ondrej Bilka <address@hidden> wrote:
>> > Hello,
>> > Was signature method discused here? search gives only pgp.
>> > Its that you give to each letter bit create signature.
>> > When searching you first create signature from largest continuous part not 
>> > containing /. Then you can compare signature of each directory and 
>> > filename.
>> > There is risk it will be slowdown because you need to read more.
>>
>> I'm sorry, I did not understand what you were trying to express.
>> Would you try again with an example?
> Sure. Say you have files aaa ,baba, abcd and aaccee. Signatures are
> 10000, 11000, 11110 and 10101

I don't understand the mapping you are implying here.

> If we want find *ba*, it coresponds to 11000

But above, you indicated that 11000 represented baba.   I don't
understand what mapping or calculation you are trying to describe.

> and by comparing signatures we know that we can match baba or abcd but not 
> aaa and aaccee.
> It has problem that if path is long signature would consist from all 1

Are you suggesting some kind of Bloom filter?

> and we dont get any information. But we can split path to parts separated by 
> slashes and do it partwise. If pattern is a?bcd  ? can be / so we seek bcd.
> We know that /ab/bc/cd cant contain pattern because signatures 
> 11000,01100,00110 dont contain 01110.

As far as I can tell though, we've exchanged one variety of substring
matching for another.    But perhaps I have misunderstood something
about what you are trying to describe.  Are you talking about using
something like the Rabin-Karp method?

> Signature would be 4 bytes long by hash widechar%32.

Well max(x%32) == 11111 binary, so I guess I have some idea why all
your examples have 5 binary digits.   But I still have no idea what
you're really trying to describe, I'm afraid.

James.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]