Re: File search progress: database review and question on triggers

From: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 12:04:49 +0200

Hi Ricardo,

See my recent email with the new SQLite benchmark: I can now generate
the whole database in under 30 seconds, so about the same order of
magnitude as a naive text database (which holds less data).

SQLite pattern search queries are extremely fast (<0.1s) and cover all
examples named so far:

- exact basename match
- partial path match
- pattern match (e.g. "/include/%foo%")
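
To make the three cases concrete, here is a minimal sketch using Python's
built-in sqlite3 module.  The table layout, column names and sample rows
are made up for illustration; they are not the schema of the actual
database.

```python
import sqlite3

# Hypothetical schema: one row per file, storing the full path and its basename.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (package TEXT, path TEXT, basename TEXT)")
conn.execute("CREATE INDEX files_basename ON files (basename)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("foo", "/include/libfoo.h", "libfoo.h"),
        ("foo", "/lib/libfoo.so", "libfoo.so"),
        ("bar", "/include/bar/bar.h", "bar.h"),
    ],
)


def search(where, arg):
    """Return the sorted paths matching a WHERE clause with one parameter."""
    sql = f"SELECT path FROM files WHERE {where} ORDER BY path"
    return [row[0] for row in conn.execute(sql, (arg,))]


exact = search("basename = ?", "libfoo.so")        # exact basename match
partial = search("path LIKE ?", "%/bar/%")         # partial path match
pattern = search("path LIKE ?", "/include/%foo%")  # pattern match
```

Only the exact basename query can use the index here; a LIKE pattern with
a leading "%" generally forces a scan of the table.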

The only thing that's missing is regexp support.  I'll see what we can do.
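
One possible route, sketched here in Python rather than in what we would
actually ship: SQLite parses the REGEXP operator but does not implement
it by default, so the application has to register a function named
"regexp".  The table and sample paths below are hypothetical.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backing function for "value REGEXP pattern"; SQLite passes the pattern first."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
# Register the two-argument function that the REGEXP operator calls.
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table and sample data, for illustration only.
conn.execute("CREATE TABLE files (path TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("/include/libfoo.h",), ("/lib/libfoo.so",), ("/include/bar.h",)],
)

matches = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM files WHERE path REGEXP ? ORDER BY path",
        ("^/include/.*foo.*",),
    )
]
```

A user-defined function like this is evaluated for every row, so I would
expect it to be slower than the LIKE queries; I have not benchmarked it.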

Considering this, I find that leveraging SQLite is very attractive at
this point:

- Fast (fastest?).
- Low on memory.
- No need to come up with our own data format.
- Less work, no need to reinvent the wheel.

>> - It does not cover the case where I don't know the basename, e.g. if I'm
>>   looking for a FOO header file my query would look like "/include/.*foo.*".
> I think this is a rather rare use case, which in my opinion doesn’t
> justify forgoing the use of a smart data structure.

I don't find it so rare, actually.  I've used filesearch (on other OSes)
to find

- Emacs packages where I didn't know the exact name of the .el.
- TeXlive packages (every time you get a font or a .sty error)
- Shared libraries (.so of which I didn't know the exact name)
- include files as above
- Your favourite programming language package...

> It would *still* be possible to use the prefix tree for this kind of
> search, but it would certainly not be optimal. (E.g. by searching up
> to the first wildcard, and then searching each resulting sub-tree up
> to the first non-wildcard, and resuming the search in all remaining
> sub-trees.)

But it's not a wildcard, it's a regexp.  Can we run full-path regexp queries
over a trie?
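
The only approach I can think of offhand is a partial one: descend the
trie along the literal prefix of the regexp, then test every path in the
remaining sub-tree against the full regexp.  The sketch below is my own
illustration in Python (the Trie class and both helpers are hypothetical,
not existing code), and it degrades to a full scan as soon as the regexp
does not start with literal characters.

```python
import re


class Trie:
    """Character-level prefix tree over full file paths."""

    def __init__(self):
        self.children = {}
        self.terminal = False

    def insert(self, path):
        node = self
        for ch in path:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def walk(self, prefix=""):
        """Yield every stored path below this node, in sorted order."""
        if self.terminal:
            yield prefix
        for ch, child in sorted(self.children.items()):
            yield from child.walk(prefix + ch)


def literal_prefix(pattern):
    """Conservative guess at the literal text every match must start with."""
    if "|" in pattern:
        return ""  # alternation: the branches may not share a prefix
    prefix = re.match(r"[A-Za-z0-9_/-]*", pattern).group(0)
    rest = pattern[len(prefix):]
    if prefix and rest[:1] in ("*", "?", "{"):
        prefix = prefix[:-1]  # the last character is optional or repeated
    return prefix


def regexp_search(trie, pattern):
    """Return stored paths matching the regexp, anchored at the path start."""
    rx = re.compile(pattern)
    prefix = literal_prefix(pattern)
    node = trie
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    # The trie only prunes by prefix; the regexp does the real filtering.
    return [path for path in node.walk(prefix) if rx.match(path)]


trie = Trie()
for p in ["/include/libfoo.h", "/include/bar.h", "/lib/libfoo.so"]:
    trie.insert(p)

pruned = regexp_search(trie, "/include/.*foo.*")
full_scan = regexp_search(trie, ".*foo.*")
```

So the trie helps with "/include/.*foo.*" but does nothing for ".*foo.*",
which has to visit every path.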

>> I believe it's important that the search be as general as possible.
> Search should be cheap and fast.

While speed is very important, I would give exactness higher priority.  A search
which gives unreliable results (e.g. returns nothing while there exists
a result) is a search that would deter many users from using it, in my


Pierre Neidhardt