guix-devel

Re: File search progress: database review and question on triggers


From: Pierre Neidhardt
Date: Mon, 24 Aug 2020 10:29:55 +0200

Hi Arun,

thanks for your feedback!

350 seconds does not seem too bad.  Let's keep in mind that this
operation will mostly be used by the build farm / substitute server, not
by the user.

> - Maybe, we shouldn't index hidden files, particularly all the .xxx-real
>   files created by our wrap phases.

I thought about this, but for now I don't think it's a good idea:

- Hidden files are rather rare and won't take up much space in the
  database.
- What if a user actually searches for a hidden file?
- It's easy to exclude hidden files from the search results.
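Excluding hidden files at query time could be as simple as filtering on the file name.  Here is a minimal Python sketch, assuming results come back as absolute file names.  `is_hidden` and `drop_hidden` are hypothetical helpers, the store paths are made up, and only the last path component is checked (files inside hidden directories are kept):

```python
from pathlib import PurePosixPath


def is_hidden(path: str) -> bool:
    # A file counts as hidden when its own name starts with a dot,
    # e.g. the ".git-real" wrappers produced by wrap phases.
    return PurePosixPath(path).name.startswith(".")


def drop_hidden(results: list) -> list:
    # Keep the original order, dropping only hidden files.
    return [path for path in results if not is_hidden(path)]


results = [
    "/gnu/store/xxx-git-2.28.0/bin/git",
    "/gnu/store/xxx-git-2.28.0/bin/.git-real",
]
print(drop_hidden(results))  # ['/gnu/store/xxx-git-2.28.0/bin/git']
```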

> - You should use SQL prepared statements with sqlite-prepare,
>   sqlite-bind, etc. That would correctly handle escaping special
>   characters in the search string. Currently, searching for
>   "transmission-gtk", "libm.so", etc. errors out.

Thanks for pointing this out, I'll look into it.
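For reference, the same idea in Python's sqlite3 module, whose parameterized execute() prepares the statement and binds the values; in Guile the corresponding steps are the sqlite-prepare and sqlite-bind procedures mentioned above.  The `files(path)` table is a hypothetical schema, and this sketch uses LIKE rather than FTS.  Note that for an FTS MATCH, binding only removes the SQL-level escaping problem: the bound string is still parsed as FTS query syntax and needs its own quoting.

```python
import sqlite3


def search_files(conn: sqlite3.Connection, needle: str) -> list:
    # Escape LIKE's own wildcards ('%' and '_') so they match literally;
    # the backslash is the escape character declared in the query below.
    escaped = (needle.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    # The '?' placeholder is filled in by SQLite's bind step, so quotes,
    # dots and dashes in `needle` are never parsed as SQL.
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        ("%" + escaped + "%",),
    )
    return [path for (path,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("/gnu/store/xxx-glibc/lib/libm.so",),
     ("/gnu/store/yyy-transmission/bin/transmission-gtk",)],
)
print(search_files(conn, "libm.so"))
print(search_files(conn, "transmission-gtk"))
print(search_files(conn, "it's"))  # no error, just no match
```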

> - Searching for "git perl5" works as expected, but searching for "git
>   perl" returns no results. I think this is due to the tokenizer used by
>   the full text search indexer. The tokenizer sees the word "perl5" as
>   one indivisible token and does not realize that "perl" is a prefix of
>   "perl5". Unfortunately, I think this is a fundamental problem with FTS
>   -- one that can only be fixed by using simple LIKE patterns. FTS is
>   meant for natural language search where this kind of thing would be
>   normal.

Indeed, but I think "git perl*" would have worked.  We can always
pre-process the query to append a star to every term.
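A minimal sketch of that pre-processing step in Python, assuming the index is an SQLite FTS5 table (FTS4's prefix syntax differs slightly).  Each term is quoted as an FTS5 string so that characters like '-' or '.' are not read as operators, then marked as a prefix; `to_fts5_prefix_query` is a hypothetical helper name:

```python
def to_fts5_prefix_query(user_input: str) -> str:
    # Quote each whitespace-separated term as an FTS5 string (doubling any
    # embedded double quote), then mark it as a prefix with '*'.  FTS5
    # combines juxtaposed phrases with an implicit AND.
    terms = user_input.split()
    return " ".join('"' + term.replace('"', '""') + '" *' for term in terms)


print(to_fts5_prefix_query("git perl"))  # "git" * "perl" *
```

An empty or all-whitespace input yields an empty string, which is not a valid MATCH expression, so the caller should skip the query in that case.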

At the moment, the only downside I see with FTS is that it seems to be
impossible to match words whose beginning we don't know: a prefix query
such as "perl*" works, but "*erl5" does not.

Anyway, switching from FTS to LIKE patterns is rather easy.  In fact, I
could implement both behind a tiny abstraction so that we can choose
which one we want in the end.
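One possible shape for such an abstraction, sketched in Python: each backend turns the search terms into an SQL string plus bound parameters, and the caller picks a backend by name.  The `files(path)` table, the FTS5 virtual table `files_fts`, and all function names are hypothetical; only the LIKE backend is exercised here, since it needs no FTS support:

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# A backend turns search terms into (SQL text, bound parameters).
Query = Tuple[str, tuple]


def like_query(terms: List[str]) -> Query:
    # Substring match on the whole path: one LIKE clause per term, all ANDed.
    # This also finds words whose beginning we do not know, e.g. "erl5".
    def pattern(term: str) -> str:
        escaped = (term.replace("\\", "\\\\")
                       .replace("%", "\\%")
                       .replace("_", "\\_"))
        return "%" + escaped + "%"
    where = " AND ".join("path LIKE ? ESCAPE '\\'" for _ in terms)
    return (f"SELECT path FROM files WHERE {where} ORDER BY path",
            tuple(pattern(t) for t in terms))


def fts_query(terms: List[str]) -> Query:
    # Prefix match through an FTS5 index; cannot match inside a token.
    match = " ".join('"' + t.replace('"', '""') + '" *' for t in terms)
    return ("SELECT path FROM files_fts WHERE files_fts MATCH ? ORDER BY path",
            (match,))


BACKENDS: Dict[str, Callable[[List[str]], Query]] = {
    "like": like_query,
    "fts": fts_query,
}


def search(conn: sqlite3.Connection, user_input: str,
           backend: str = "like") -> List[str]:
    terms = user_input.split()
    if not terms:  # an empty query is not a valid MATCH expression
        return []
    sql, params = BACKENDS[backend](terms)
    return [path for (path,) in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT)")
conn.executemany("INSERT INTO files VALUES (?)", [
    ("/gnu/store/xxx-git-2.28.0/share/perl5/Git.pm",),
    ("/gnu/store/xxx-git-2.28.0/bin/git",),
    ("/gnu/store/zzz-perl-5.30.2/bin/perl",),
])
print(search(conn, "git perl"))
print(search(conn, "erl5"))
```

Keeping both backends behind the same (SQL, parameters) interface makes switching a one-argument change.  The trade-off is that a LIKE pattern with a leading '%' cannot use an ordinary index, so that backend scans the whole table.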

> - I guess you are only indexing local packages now, but will include all
>   packages later by some means.

Indeed, I want the substitute server / build farm to expose the database
for syncing.  I'd need some help to get started.  Anyone?

-- 
Pierre Neidhardt
https://ambrevar.xyz/
