
Re: Inverted index to accelerate guix package search

From: zimoun
Subject: Re: Inverted index to accelerate guix package search
Date: Thu, 16 Jan 2020 20:53:27 +0100

Hi Arun,

On Thu, 16 Jan 2020 at 20:11, Arun Isaac <address@hidden> wrote:
> zimoun <address@hidden> writes:
> > About (1), let's implement something experimental and time it, to
> > compare apples with apples. :-)
> > I mean, I am working on it.
> I am interested in working on an experimental sqlite based
> implementation too. So, do let me know your plans so we don't duplicate
> too much work. :-)

Why not go directly for SQLite? But some details are still unclear to
me, and they are easier to see with a Guile hash table or a VHash.

What is not clear to me right now, in both implementations, is:

How to update the index.
Have a look at the "pull" code and the ~/.cache/guix directory.

How to deal with regexps.
It is more or less clear to me how to handle them using the trigram
keys, but I do not know how with SQLite; I have not thought about it yet.

Basically, the current search works like this: the CLI arguments are
transformed into 'patterns', then into 'regexps', which is basically a
list of terms; then 'find-packages-by-description' is called:

--8<---------------cut here---------------start------------->8---
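(let* ((patterns (filter-map (match-lambda
                              (('query 'search rx) rx)
                              (_                   #f))
                             opts))  ; OPTS: the parsed command-line options
       (regexps  (map (cut make-regexp* <> regexp/icase) patterns))
       (matches  (find-packages-by-description regexps)))
  ;; MATCHES is a list of (PACKAGE . SCORE) pairs, displayed afterwards.
  matches)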
--8<---------------cut here---------------end--------------->8---

'find-packages-by-description' applies 'fold-packages', where each term
of the regexps list is looked up in each package (name, synopsis,
description, location, etc.) to compute its relevance:

--8<---------------cut here---------------start------------->8---
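(let ((matches (fold-packages (lambda (package result)
                                (if (package-superseded package)
                                    result ; skip superseded packages
                                    (match (package-relevance package regexps)
                                      ((? zero?) result) ; no match: skip it
                                      (score             ; keep (PACKAGE . SCORE)
                                       (cons (cons package score) result)))))
                              '())))
  ;; MATCHES is then sorted by relevance score.
  matches)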
--8<---------------cut here---------------end--------------->8---

Therefore, this call to 'fold-packages', which visits every package on
each search, is what needs to be replaced.
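To make the idea concrete, here is a minimal sketch of an index that could stand in for the linear scan, assuming an in-memory VHash keyed by trigrams. Everything in it is hypothetical: 'string-trigrams', 'build-index', 'index-ref' and 'index-candidates' are illustrative names, and a plain list of (NAME . TEXT) pairs stands in for real package objects. It only handles literal terms; a general regexp would first need its literal parts extracted.

```scheme
(use-modules (ice-9 vlist)     ; vlist-null, vhash-cons, vhash-fold*
             (srfi srfi-1))    ; fold, reduce, delete-duplicates, lset-intersection

;; Distinct, lower-cased 3-character substrings of STR.
(define (string-trigrams str)
  (let* ((s (string-downcase str))
         (n (string-length s)))
    (delete-duplicates
     (let loop ((i 0) (acc '()))
       (if (> (+ i 3) n)
           (reverse acc)
           (loop (+ i 1) (cons (substring s i (+ i 3)) acc)))))))

;; Hypothetical inverted index: each trigram of an entry's TEXT is
;; associated with the entry's NAME.  ENTRIES is a list of (NAME . TEXT).
(define (build-index entries)
  (fold (lambda (entry idx)
          (fold (lambda (trigram idx)
                  (vhash-cons trigram (car entry) idx))
                idx
                (string-trigrams (cdr entry))))
        vlist-null
        entries))

;; All names recorded under TRIGRAM.
(define (index-ref idx trigram)
  (vhash-fold* cons '() trigram idx))

;; Sorted names indexed under every trigram of TERM.  Return #f when TERM
;; is shorter than 3 characters: the index cannot help, a full scan is needed.
(define (index-candidates idx term)
  (let ((trigrams (string-trigrams term)))
    (and (pair? trigrams)
         (sort (delete-duplicates
                (reduce (lambda (names acc)
                          (lset-intersection string=? acc names))
                        '()
                        (map (lambda (t) (index-ref idx t)) trigrams)))
               string<?))))

(define package-index
  (build-index '(("emacs" . "Emacs, the extensible text editor")
                 ("vim"   . "Vim text editor")
                 ("guile" . "Guile Scheme interpreter"))))

(index-candidates package-index "editor")   ; => ("emacs" "vim")
(index-candidates package-index "extext")   ; => ("emacs")
```

The second lookup is a false positive: every trigram of "extext" occurs somewhere in the Emacs text, but the word itself does not, so the candidates still have to be checked against the regexps afterwards.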

Using the trigram keys, I see more or less how to output the list of
packages whose keys match one term from the list of regexps. Some
false positives will then be filtered out by 'package-relevance'.
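Whatever index produces the candidates, the list may contain duplicates (one hit per search term) and false positives. A minimal sketch of that filtering step follows, with a plain regexp match standing in for 'package-relevance' (which computes a weighted score rather than a yes/no answer). 'verify-candidates' and the sample data are hypothetical, and requiring every regexp to match is an assumption of this sketch.

```scheme
(use-modules (srfi srfi-1))    ; filter, every, delete-duplicates
;; make-regexp, regexp-exec and regexp/icase are core Guile procedures.

;; CANDIDATES is a list of (NAME . TEXT) pairs.  Keep each candidate once,
;; and only when every regexp in REGEXPS matches its TEXT.
(define (verify-candidates candidates regexps)
  (filter (lambda (candidate)
            (every (lambda (rx) (regexp-exec rx (cdr candidate)))
                   regexps))
          (delete-duplicates candidates)))

(define sample-regexps
  (map (lambda (term) (make-regexp term regexp/icase))
       '("text" "edit")))

(define sample-candidates
  '(("emacs" . "Emacs, the extensible text editor")
    ("vim"   . "Vim text editor")
    ("emacs" . "Emacs, the extensible text editor")   ; duplicate hit
    ("guile" . "Guile Scheme interpreter")))

(map car (verify-candidates sample-candidates sample-regexps))
;; => ("emacs" "vim")
```

Deduplicating with 'delete-duplicates' is quadratic; for large candidate lists a hash-based set would be the better choice.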

With SQLite, I do not know yet. I need to read a bit about SQL queries. :-)

If you want to implement it, go ahead. :-)
Otherwise, I will try to finish next week what I started yesterday
evening using a VHash. :-)
(Note that to avoid duplicates, the file sets.scm may be relevant.)

All the best,
