guix-devel

Re: File search progress: database review and question on triggers


From: Arun Isaac
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 20:44:08 +0530

> Yes, but full text search brings us a few niceties here:

These are nice features, but I don't know if all of them are useful for
file search. Normally, with Arch's pkgfile, I search for some missing
header file, shared library, etc. Usually, I know the exact filename I
am looking for, or I know some prefix or suffix of the exact filename.
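
For comparison, those three kinds of lookup (exact name, known prefix, known suffix) need no full text machinery at all. Here is a minimal Python sketch, assuming a flat list of installed file paths. The `find_files` helper and the sample paths are hypothetical, not pkgfile's or Guix's actual code:

```python
import os

def find_files(paths, pattern, mode="exact"):
    """Return the paths whose basename matches `pattern`.

    `mode` selects an exact, prefix, or suffix match (hypothetical helper).
    """
    matchers = {
        "exact": lambda name: name == pattern,
        "prefix": lambda name: name.startswith(pattern),
        "suffix": lambda name: name.endswith(pattern),
    }
    if mode not in matchers:
        raise ValueError(f"unknown mode: {mode!r}")
    match = matchers[mode]
    return [p for p in paths if match(os.path.basename(p))]

# Hypothetical index of installed files.
index = [
    "include/zlib.h",
    "include/png.h",
    "lib/libz.so",
    "lib/libpng16.so.16",
]

print(find_files(index, "zlib.h"))            # exact filename
print(find_files(index, "libpng", "prefix"))  # known prefix
print(find_files(index, ".h", "suffix"))      # known suffix
```

A linear scan like this is only meant to pin down the semantics; a real index would need something faster.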

> - Case insensitive, diacritic insensitive (e.g. "e" matches "É").

Case insensitivity is quite useful. Most filenames are in lower case,
but there is always that one odd filename out there.

But filenames usually don't have diacritics. So, I'm not sure if
diacritic insensitivity is useful.
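
For reference, both kinds of insensitivity from the quoted list can be approximated by folding names before comparing them. This is a minimal Python sketch using the standard `unicodedata` module. `fold` and `same_name` are hypothetical helpers, not part of any Guix code:

```python
import unicodedata

def fold(name):
    """Return a case- and diacritic-insensitive key for `name`."""
    # NFKD decomposes "É" into "E" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, then casefold for caseless comparison.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def same_name(query, filename):
    """True if `query` and `filename` are equal after folding."""
    return fold(query) == fold(filename)

print(same_name("xlib.h", "Xlib.h"))  # differs only in case
print(same_name("e", "É"))            # differs in case and diacritic
```

The case folding step is the part that matters for filenames. The diacritic step costs little, but as noted above it will rarely change a result.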

> All the above is arguably more powerful and easier to use than regexp.
> But even if no user ever bothers with the logic operators, the default
> behaviour "just works" in the fashion of a search engine.
>
> The main thing I don't know how to do is suffix matches (like "%foo").
> With FTS, looking up "foo*" won't match "libfoo", which is problematic.
> Any clue how to fix that?

This is handled by stemming. We'll need a custom stemmer that normalizes
libfoo to foo. Xapian has a nice page on stemming. See
https://xapian.org/docs/stemming.html
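
A minimal sketch of such a custom stemmer, assuming a toy in-memory index rather than Xapian or SQLite FTS. Each filename token is indexed both as-is and under a stem with any leading "lib" removed, so a prefix query for "foo" also finds "libfoo.so". The `stem`, `build_index` and `prefix_search` names and the strip-"lib" rule are hypothetical illustrations, not one of Xapian's stemmers:

```python
import re

def stem(term):
    """Hypothetical rule: casefold, then strip a leading "lib" ("libfoo" -> "foo")."""
    term = term.casefold()
    if term.startswith("lib") and len(term) > 3:
        return term[3:]
    return term

def build_index(filenames):
    """Map each indexed term to the set of filenames that contain it."""
    index = {}
    for name in filenames:
        # Split on non-alphanumeric characters, like a simple tokenizer.
        for token in re.split(r"[^0-9A-Za-z]+", name):
            if not token:
                continue
            # Index both the raw token and its stem.
            for term in {token.casefold(), stem(token)}:
                index.setdefault(term, set()).add(name)
    return index

def prefix_search(index, prefix):
    """Emulate an FTS "prefix*" query against the indexed terms."""
    prefix = prefix.casefold()
    hits = set()
    for term, names in index.items():
        if term.startswith(prefix):
            hits |= names
    return sorted(hits)

idx = build_index(["libfoo.so", "foobar.h", "bar.h"])
print(prefix_search(idx, "foo"))  # finds "libfoo.so" through its stem
```

The rule is deliberately naive: it also reduces "library" to "rary". Because the raw token is indexed too, no matches are lost, but spurious ones can appear, so a real stemmer would need a more careful list of prefixes and exceptions.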

Cheers!


