Re: File search progress: database review and question on triggers

From: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 15:53:15 +0200

Arun Isaac writes:

>> - Or do you think SQLite patterns (using "%") would do for now?  As
>>   Mathieu pointed out, it's an unfortunate inconsistency with the rest of
>>   Guix.  But maybe regexp support can be added in a second stage.
> The inconsistency is unfortunate. Personally, I am in favor of dropping
> regexp support everywhere in Guix, and only having literal string
> search. But that is backward incompatible, and may be controversial.
> In this specific case of file search, we could use the sqlite like
> patterns, but not expose them to the user. For example, if the search
> query is "<query>", we search for the LIKE pattern "%<query>%". I think
> this addresses how users normally search for files. I don't think
> regexps add much value.

I agree.
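
A minimal sketch of that idea, assuming Python's sqlite3 module and a
hypothetical `files` table with a single `path` column (neither is taken
from the actual file search code).  The user's query is treated as a
literal string: `%` and `_` are escaped before the query is wrapped in
`%...%`, so the LIKE syntax is never exposed.

```python
import sqlite3


def like_pattern(query: str) -> str:
    """Wrap a literal query as %query%, escaping LIKE metacharacters."""
    escaped = (query.replace("\\", "\\\\")
                    .replace("%", "\\%")
                    .replace("_", "\\_"))
    return f"%{escaped}%"


def search_files(conn: sqlite3.Connection, query: str) -> list:
    """Return every path containing the literal query, sorted by path."""
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        (like_pattern(query),),
    )
    return [row[0] for row in rows]


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT)")
conn.executemany(
    "INSERT INTO files (path) VALUES (?)",
    [("/bin/foo",), ("/lib/libfoo.so",),
     ("/share/doc/bar_baz",), ("/share/doc/barxbaz",)],
)

print(search_files(conn, "foo"))      # substring match anywhere in the path
print(search_files(conn, "bar_baz"))  # "_" is matched literally here
```

Note that "foo" also finds "libfoo.so" here, since the pattern is
unanchored on both sides.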

> Full text search may not be relevant to file search. Full text search is
> more suited for natural language search involving such things as
> stemming algorithms.

Yes, but full text search brings us a few niceties here:

- Wildcards using the `*` character.  This fixes the unfamiliarity of `%`.

- "Automatic word permutations" (maybe not the right term).  "foo bar" and
  "bar foo" both match the same results!  

- Case insensitive, diacritic insensitive (e.g. "e" matches "É").

- Logic: we can do "(OR foo bar) AND (OR qux quuz)".

- Relevance ranking: results can be sorted by relevance, another problem
  we don't have to fix ourselves ;)

All the above is arguably more powerful and easier to use than regexp.
But even if no user ever bothers with the logic operators, the default
behaviour "just works" in the fashion of a search engine.
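
To make the list above concrete, here is a rough sketch using Python's
sqlite3 module.  It assumes an SQLite build with the FTS5 extension and
its default tokenizer, plus a hypothetical `files` virtual table with
made-up paths; the real schema may well differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: FTS5 is compiled into the SQLite library in use.
conn.execute("CREATE VIRTUAL TABLE files USING fts5(path)")
conn.executemany(
    "INSERT INTO files (path) VALUES (?)",
    [("/share/foo/bar.txt",), ("/share/bar/foo.png",),
     ("/lib/libfoo.so",), ("/bin/foobar",), ("/share/doc/Émile.txt",)],
)


def search(query: str) -> list:
    """Run an FTS5 MATCH query; best-ranked results come first."""
    rows = conn.execute(
        "SELECT path FROM files WHERE files MATCH ? ORDER BY rank, path",
        (query,),
    )
    return [row[0] for row in rows]


# Word order does not matter: both queries match the same two paths.
print(sorted(search("foo bar")))
print(sorted(search("bar foo")))

# "*" is a prefix wildcard on a token: matches "foo" and "foobar".
print(sorted(search("foo*")))

# Case and diacritics are folded by the default tokenizer.
print(search("emile"))

# Boolean operators with grouping.
print(sorted(search("(foo OR lib) AND (txt OR so)")))
```

The default tokenizer splits paths on "/" and ".", which is why
individual path components behave like words in these queries.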

The main thing I don't know how to do is suffix matches (like "%foo").
With FTS, looking up "foo*" won't match "libfoo", which is problematic.
Any clue how to fix that?

Pierre Neidhardt
