[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Inverted index to accelerate guix package search
From: |
zimoun |
Subject: |
Re: Inverted index to accelerate guix package search |
Date: |
Mon, 13 Jan 2020 18:54:18 +0100 |
Hi Arun,
For sure, it is a good direction... but it is not that simple. :-)
On Mon, 13 Jan 2020 at 16:08, Arun Isaac <address@hidden> wrote:
> > Those measures don't seem precise enough to draw a good conclusion.
> > Could you increase the sample size (or maybe just loop?) so that all
> > times reach over a second or so?
>
> Indeed, my bad! Here are better timing results. I have repeated each of
> the searches a 1000 times, and I'm getting around 90x speedup! :-)
>
> Building index
> clock utime stime cutime cstime gctime
> 9.76 12.50 0.18 0.00 0.00 6.16
>
> Brute force search (repeated 1000 times)
> clock utime stime cutime cstime gctime
> 195.95 264.13 0.90 0.00 0.00 142.62
`fold-packages` traverses all the packages so yes it is "slow".
> Inverted index search (repeated 1000 times)
> clock utime stime cutime cstime gctime
> 2.18 2.48 0.00 0.00 0.00 0.60
Yes, looking up into an hash table is really really faster. :-)
However, the core issue about "search" is not 'find quickly an item'
but 'find the relevant item'.
For example, Bloom filter [1] or B-tree could be interesting data
structure if we are talking about fast retrieval. Note that this part
should even be delegated to SQLite, IMHO.
[1] https://en.wikipedia.org/wiki/Bloom_filter
[2] https://en.wikipedia.org/wiki/B-tree
Well, the issue is 'scoring' a query. For example, try:
guix search youtube | recsel -p name,relevance
--8<---------------cut here---------------start------------->8---
name: youtube-dl-gui
relevance: 15
name: youtube-viewer
relevance: 11
name: youtube-dl
relevance: 11
name: mps-youtube
relevance: 11
name: emacs-youtube-dl
relevance: 11
--8<---------------cut here---------------end--------------->8---
Why the package youtube-dl-gui is more relevant? Because the function
that computes the relevance score not enough "smart". The function
guix/ui.scm(relevance) is too simple: the score is the number of
ocurences of the pattern (weighted by the field where the pattern
matches). Therefore, if you compare "guix show" the packages
"youtube-dl-gui" and "youtube-dl", you will see that the term
"youtube" (case-insentive) appears 5 times for youtube-dl-gui against
only 3 times for youtube-dl. Does the difference (5 vs 3) provide more
information? I do not think so... The issue is: 1. the description is
not well-written (for the current scoring system) and 2. the scoring
function is too simple.
I have tried to described such issues/views here [3] [4] and since I
have not read so much about recommender systems: state-of-the-art, NLP
tools in Guile, etc.
[3] https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html
[4] https://lists.gnu.org/archive/html/guix-devel/2019-12/msg00160.html
Last, I have not read the 2 links you pointed but I think you have a
bug in your code. The retrieval "game" using the index returns the
package "r-mgcv" which does not contain the term "game" in its
description. Well, the query "game" using the brute force does not
return this very package.
Then, in the function
"guix/scripts/package.scm(find-packages-by-description)", you could
try to replace the `fold-packages' by something using the inverted
index. (Note that the name `find-packages-by-descripton' is odd
because it uses synopsis, name, etc. too.).
Hope that helps.
Chhers,
simon
- Inverted index to accelerate guix package search, Arun Isaac, 2020/01/12
- Re: Inverted index to accelerate guix package search, Pierre Neidhardt, 2020/01/13
- Re: Inverted index to accelerate guix package search, Arun Isaac, 2020/01/13
- Re: Inverted index to accelerate guix package search,
zimoun <=
- Re: Inverted index to accelerate guix package search, Bengt Richter, 2020/01/13
- Re: Inverted index to accelerate guix package search, Pierre Neidhardt, 2020/01/14
- Re: Inverted index to accelerate guix package search, Giovanni Biscuolo, 2020/01/14
- Re: Inverted index to accelerate guix package search, zimoun, 2020/01/14
- Re: Inverted index to accelerate guix package search, Pierre Neidhardt, 2020/01/14
- Re: Inverted index to accelerate guix package search, zimoun, 2020/01/14
- Re: Inverted index to accelerate guix package search, Pierre Neidhardt, 2020/01/15
- Re: Inverted index to accelerate guix package search, zimoun, 2020/01/15
- Re: Inverted index to accelerate guix package search, Giovanni Biscuolo, 2020/01/15
- Re: Inverted index to accelerate guix package search, zimoun, 2020/01/15