Re: Inverted index to accelerate guix package search

From: Arun Isaac
Subject: Re: Inverted index to accelerate guix package search
Date: Sat, 18 Jan 2020 00:59:05 +0530

> What is not clear to me right now in both implementations are.
> 1.
> How to update the index.
> Give a look at the "pull" code and the ~/.cache/guix folder.

We don't "update" the index; every guix pull creates it anew.
Currently, generate-package-cache in gnu/packages.scm does this. It is
called by package-cache-file in guix/channels.scm, which is a channel
profile hook listed under %channel-profile-hooks.
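
To illustrate "create it anew", here is a rough sketch in Python rather
than Guile. It is not the Guix code: build_index, the postings table and
the shape of the packages argument are all made up for the example. The
index is built in a temporary file and then swapped into place, so there
is never anything to update.

```python
import os
import sqlite3
import tempfile


def build_index(packages, path):
    """Create the index at `path` from scratch, replacing any old one.

    `packages` maps a package name to its description; both the
    argument shape and the `postings` table are made up for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Build in a temporary file next to the target so that a reader
    # never sees a half-written index.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    db = sqlite3.connect(tmp)
    try:
        with db:  # one transaction for the whole build
            db.execute(
                "CREATE TABLE postings ("
                " term TEXT NOT NULL,"
                " package TEXT NOT NULL,"
                " PRIMARY KEY (term, package))"
            )
            for name, description in packages.items():
                terms = set(f"{name} {description}".lower().split())
                db.executemany(
                    "INSERT INTO postings VALUES (?, ?)",
                    [(term, name) for term in terms],
                )
    finally:
        db.close()
    # Swap the new index into place; the old file is simply discarded.
    os.replace(tmp, path)
```

Because the target path is an argument, a function shaped like this can
be pointed at a scratch directory and exercised without touching the
real cache.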

What I am unclear about is how to test my sqlite index-building code
without actually pushing to master and running a guix pull. I will go
through the various tests in Guix and see if I can figure something
out, but any pointers would be much appreciated.

> 2.
> How to deal with regexp.
> It is more or less clear to me how to deal with using the trigram keys
> but I do not know with SQLite; I have not thought about yet.

I think it is not possible to search using regular expressions in sqlite
unless some external module is loaded. See

I think we should remove regex support altogether. I don't think a good
search interface should expect the user to provide regexes for
search. Regex support will certainly be a lot less useful if and when we
have xapian. However, just to keep backward compatibility, we can fall
back to a brute-force fold-packages search for regexes. As Ludo pointed
out, we can't remove the brute-force code anyway, since we need to
support cases where the cache is not authoritative.
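
The fallback could look roughly like the sketch below, again in Python
rather than Guile. Everything in it is hypothetical: the sample
PACKAGES, the postings table, and the crude looks_like_regex test. Plain
terms go through the index; anything containing a regex metacharacter
is matched against every package, which stands in for the
fold-packages scan.

```python
import re
import sqlite3

# Made-up sample data: package name -> description.
PACKAGES = {
    "hello": "GNU greeting program",
    "guile": "Scheme implementation for extensions",
    "sqlite": "embedded SQL database engine",
}

REGEX_METACHARACTERS = set(".^$*+?{}[]\\|()")


def make_index(packages):
    """Build a tiny in-memory inverted index: one row per (term, package)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE postings (term TEXT, package TEXT)")
    for name, description in packages.items():
        for term in set(f"{name} {description}".lower().split()):
            db.execute("INSERT INTO postings VALUES (?, ?)", (term, name))
    return db


def looks_like_regex(query):
    """Crude heuristic: any regex metacharacter means 'treat as regex'."""
    return any(char in REGEX_METACHARACTERS for char in query)


def search(db, packages, query):
    if looks_like_regex(query):
        # Brute force: scan every package, as the current search does.
        pattern = re.compile(query, re.IGNORECASE)
        return sorted(
            name
            for name, description in packages.items()
            if pattern.search(name) or pattern.search(description)
        )
    # Plain term: a single lookup in the index.
    rows = db.execute(
        "SELECT DISTINCT package FROM postings WHERE term = ?",
        (query.lower(),),
    )
    return sorted(row[0] for row in rows)
```

With this shape, the same brute-force branch could also serve whenever
the index cannot be trusted.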

> If you want to implement it, go ahead. :-)

Yes, I'll give it a shot. :-) I have some other commitments over the
weekend, but hopefully I'll have something by Monday night.

> Otherwise, I will try to finish next week what I started yesterday
> evening using VHash. :-)

About sqlite versus an inverted index using vhashes, I don't know if it
is possible to serialize a vhash to disk. Even if that were possible,
we would have to load the entire vhash-based inverted index into memory
for every invocation of guix search, and that could hurt
performance. Something like guile-gdbm could have helped, but that's
another story.
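
By contrast, with the index in sqlite a search is just a query, and the
database decides what to read from disk. A minimal sketch in Python,
assuming a hypothetical postings table keyed on (term, package); the
function names are mine, not anything in Guix. It returns the packages
that contain every search term.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index; the primary key doubles as the lookup index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " term TEXT NOT NULL,"
        " package TEXT NOT NULL,"
        " PRIMARY KEY (term, package))"
    )
    return db


def add_package(db, name, description):
    terms = set(f"{name} {description}".lower().split())
    db.executemany(
        "INSERT OR IGNORE INTO postings VALUES (?, ?)",
        [(term, name) for term in terms],
    )


def search_all(db, terms):
    """Return the packages whose text contains every term in `terms`."""
    terms = sorted({term.lower() for term in terms})
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    # Each (term, package) pair is unique, so a package matching all
    # terms appears exactly len(terms) times in the filtered rows.
    rows = db.execute(
        f"SELECT package FROM postings WHERE term IN ({placeholders})"
        " GROUP BY package HAVING COUNT(*) = ? ORDER BY package",
        (*terms, len(terms)),
    )
    return [row[0] for row in rows]
```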

Also, I now agree with your earlier assessment that we should delegate
all this to sqlite. :-) That guix already uses sqlite for other things
is all the more reason.

> (note that to avoid duplicate , the file sets.scm can be relevant)

I didn't know about sets.scm when I wrote my first proof-of-concept
inverted index script. That is why I reinvented the set using hash
tables. I don't know how hash tables differ from VHashes, or which is
better.
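
For what it's worth, the "set out of a hash table" trick is small. Here
is the idea sketched in Python rather than Guile, with made-up names: the
members are the keys of an inner hash table and the values are ignored,
so a package is recorded at most once per term.

```python
from collections import defaultdict


def build_inverted_index(packages):
    """Map each term to a hash table whose keys are the matching packages.

    `packages` maps a package name to its description (made-up shape).
    """
    index = defaultdict(dict)
    for name, description in packages.items():
        for term in f"{name} {description}".lower().split():
            # Using the hash table as a set: inserting the same key
            # twice is harmless, so duplicates never accumulate.
            index[term][name] = True
    return index


def lookup(index, term):
    """Return the packages recorded for `term`, sorted for stable output."""
    return sorted(index.get(term.lower(), {}))
```

A proper set abstraction does the same job without the dummy values.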

Cheers! :-)
