[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: signature method in locate
From: |
James Youngman |
Subject: |
Re: signature method in locate |
Date: |
Mon, 11 May 2009 23:08:21 +0100 |
On Mon, May 11, 2009 at 10:42 AM, Ondrej Bilka <address@hidden> wrote:
> On Mon, May 11, 2009 at 10:39:11AM +0200, James Youngman wrote:
>> On Mon, May 11, 2009 at 8:14 AM, Ondrej Bilka <address@hidden> wrote:
>> > Hello,
>> > Was signature method discused here? search gives only pgp.
>> > Its that you give to each letter bit create signature.
>> > When searching you first create signature from largest continuous part not
>> > containing /. Then you can compare signature of each directory and
>> > filename.
>> > There is risk it will be slowdown because you need to read more.
>>
>> I'm sorry, I did not understand what you were trying to express.
>> Would you try again with an example?
> Sure. Say you have files aaa ,baba, abcd and aaccee. Signatures are
> 10000, 11000, 11110 and 10101
I don't understand the mapping you are implying here.
> If we want find *ba*, it coresponds to 11000
But above, you indicated that 11000 represented baba. I don't
understand what mapping or calculation you are trying to describe.
> and by comparing signatures we know that we can match baba or abcd but not
> aaa and aaccee.
> It has problem that if path is long signature would consist from all 1
Are you suggesting some kind of Bloom filter?
> and we dont get any information. But we can split path to parts separated by
> slashes and do it partwise. If pattern is a?bcd ? can be / so we seek bcd.
> We know that /ab/bc/cd cant contain pattern because signatures
> 11000,01100,00110 dont contain 01110.
As far as I can tell though, we've exchanged one variety of substring
matching for another. But perhaps I have misunderstood something
about what you are trying to describe. Are you talking about using
something like the Rabin-Karp method?
> Signature would be 4 bytes long by hash widechar%32.
Well max(x%32) == 11111 binary, so I guess I have some idea why all
your examples have 5 binary digits. But I still have no idea what
you're really trying to describe, I'm afraid.
James.