Re: Inverted index to accelerate guix package search
From: Arun Isaac
Subject: Re: Inverted index to accelerate guix package search
Date: Fri, 17 Jan 2020 00:36:37 +0530
Pierre Neidhardt <address@hidden> writes:
> By the way, what about using Xapian in Guix?
I looked up Xapian's features at https://xapian.org/features and they
are quite impressive. I was introduced to Xapian through notmuch, but
notmuch does not use Xapian to the fullest, so I ended up
underestimating its value. The following features might be of
particular importance.
- Relevance feedback - given one or more documents, Xapian can suggest
  the most relevant index terms to expand a query, suggest related
  documents, categorise documents, etc.
- Phrase and proximity searching - users can search for words occurring
  in an exact phrase or within a specified number of words, either in a
  specified order, or in any order.
- Supports stemming of search terms (e.g. a search for "football" would
  match documents which mention "footballs" or "footballer").
I think these features would really help Pierre's work on improving
search and discoverability in Guix. If we plan to have a "Software
Center"-like interface at some point in the future, Xapian's search
could come in handy.
Not directly related to Guix, but I also wonder if info manuals would be
a lot more useful if they had good full-text search using Xapian.
For the time being, since we don't have Xapian bindings, I think we
should settle for SQLite's full-text search capabilities.
https://www.sqlite.org/fts5.html
I have attached a short proof-of-concept script for an SQLite-based
search. The speedup is around 200x, and populating the database takes
only around 2.5 seconds. Here is a sample run.
  Sqlite database populated in 2.5516340732574463 seconds
  Brute force search took 0.11850595474243164 seconds
  Sqlite search took 5.459785461425781e-4 seconds
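The attached script is not reproduced here, so the following is only a rough sketch of the same idea, written in Python rather than Guile. It assumes the local SQLite build has the FTS5 extension enabled. The `PACKAGES` list, the table layout, and the function names are all hypothetical stand-ins and are not taken from sqlite-search.scm. It builds an FTS5 virtual table and compares an indexed `MATCH` query with a brute-force substring scan.

```python
import sqlite3
import time

# Hypothetical sample data; a real run would index every Guix package.
PACKAGES = [
    ("guile", "Scheme implementation intended especially for extensions"),
    ("notmuch", "Thread-based email index, search, and tagging"),
    ("xapian", "Search engine library"),
    ("sqlite", "Self-contained SQL database engine"),
    ("emacs", "Extensible and customizable text editor"),
]


def build_index(packages):
    """Create an in-memory FTS5 table and load (name, description) rows."""
    conn = sqlite3.connect(":memory:")
    # Requires an SQLite build with FTS5 compiled in (an assumption).
    conn.execute("CREATE VIRTUAL TABLE packages USING fts5(name, description)")
    conn.executemany(
        "INSERT INTO packages (name, description) VALUES (?, ?)", packages
    )
    conn.commit()
    return conn


def fts_search(conn, query):
    """Return package names matching the FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT name FROM packages WHERE packages MATCH ? ORDER BY rank",
        (query,),
    )
    return [name for (name,) in rows]


def brute_force_search(packages, term):
    """Scan every package and do a case-insensitive substring test."""
    term = term.lower()
    return [
        name
        for name, description in packages
        if term in name.lower() or term in description.lower()
    ]


if __name__ == "__main__":
    start = time.perf_counter()
    conn = build_index(PACKAGES)
    print("Index populated in", time.perf_counter() - start, "seconds")

    start = time.perf_counter()
    print(brute_force_search(PACKAGES, "search"))
    print("Brute force search took", time.perf_counter() - start, "seconds")

    start = time.perf_counter()
    print(fts_search(conn, "search"))
    print("FTS5 search took", time.perf_counter() - start, "seconds")
```

The two searches are not strictly equivalent: FTS5 matches whole tokens, while the brute-force scan matches substrings, so "edit" would find "editor" only in the latter. With a handful of rows the timings mean little; the gap only shows up at the scale of the full package collection.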
Attachment: sqlite-search.scm (text document)
Attachment: signature.asc (PGP signature)