[bug#39258] [PATCH v2 0/3] Xapian for Guix package search
From: Ludovic Courtès
Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
Date: Sun, 08 Mar 2020 12:33:44 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hi,
Arun Isaac <address@hidden> wrote:
>>> It turns out that most of the time is spent in printing and texinfo
>>> rendering of the search results.
>
> Also, when we put all package metadata into the Xapian index, we don't
> have to look up any of the package variables in (gnu packages *) during
> `guix search` time. This also contributes substantially to the speedup.
Yup.
>> In general, pre-rendering doesn’t seem practical to me: the output of
>> ‘guix search’ is locale-dependent (it speaks the user’s language) and
>
> Note that we already need to index package synopses and descriptions in
> all languages. I still haven't implemented this, though.
Oh, right. Tricky!
>> adjusts to the terminal width (well, this is temporarily broken on
>> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).
>
> This could be accomplished even with pre-rendering. Xapian provides
> "slots" to store arbitrary strings with a document. Instead of storing
> the pre-rendered document as a whole, we could store pre-rendered fields
> in separate slots. Then, during `guix search` time, we can assemble the
> result from these pre-rendered fields.
I’m not sure I understand. The index wouldn’t store pre-rendered
strings for every possible terminal width, right?
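
To illustrate the width concern, here is a minimal sketch in Python (not Guix code, which is written in Guile). It assumes a made-up description string and two arbitrary widths, and only shows that filling the same text at different widths gives different line breaks, so a string rendered ahead of time would match one width only.

```python
import textwrap

# Hypothetical package description, made up for this illustration.
description = (
    "GNU Hello prints the message \"Hello, world!\" and then exits. "
    "It serves as an example of standard GNU coding practices."
)

def render(text, width):
    """Fill TEXT to WIDTH columns, as a renderer would do at display time."""
    return textwrap.fill(text, width=width)

narrow = render(description, 40)
wide = render(description, 80)

# The same source text yields different line breaks at each width, so a
# single pre-rendered string cannot serve every terminal.
print(narrow)
print()
print(wide)
```

Under these assumptions, an index could store the unfilled text per field and leave the final filling to search time, which is one possible reading of the "slots" proposal quoted above.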
>> Also, if the 12K+ descriptions need to be rendered at the time the user
>> runs ‘guix pull’, the experience may not be great, because it could take
>> a bit of time.
>
> This is a problem, but I would see it as a necessary "compilation"
> step. :-P In fact, this whole patchset speeds up `guix search` by doing
> part of the work of `guix search` ahead of time. So, some such cost is
> unavoidable.
Yeah. I think we need to take the whole user experience into account,
not just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a
fairly common operation. Conversely, ‘guix search’ takes roughly
between 0.5 and 2 seconds and is an uncommon operation on a “slow path”
(in the sense that when you’re searching for software, you’ll probably
have to spend more than a couple of seconds to find what you’re looking
for.)
>> What I like about the recutils format in this context is that it’s both
>> human- and machine-readable. The examples in the manual show how it can
>> be useful to select the information displayed or to refine the search
>> (info "(guix) Invoking guix package").
>
> Xapian's query language is much more natural (as in natural language)
> than the regexp based techniques we need to use with recutils. I have
> hardly ever used the regexp based search and I suspect many others
> haven't either. Also, refining the search query should be easier to do
> with Xapian. We could even use Xapian's query expansion feature to
> suggest improved queries to the user.
I’m not sufficiently familiar with Xapian’s query language. The
examples I had in mind were:
  guix search malloc | recsel -p name,version,relevance

  guix search | recsel -p name -e 'license ~ "LGPL 3"'

  guix search crypto library | \
    recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis

It’s not so much about regexps as it is about selecting individual
fields.
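
As a rough illustration of what field selection on record-style output buys, here is a small Python sketch. It is not how recsel is implemented; the sample records, the `parse_records` and `select` helpers, and the field values are all made up, and only single-line fields are handled.

```python
import re

# Made-up sample in the recutils style that `guix search` prints:
# "field: value" lines, with records separated by blank lines.
SAMPLE = """\
name: libfoo
version: 1.2
license: LGPL 3+
synopsis: Example crypto library

name: python-bar
version: 0.9
license: Expat
synopsis: Example Python bindings
"""

def parse_records(text):
    """Parse recutils-style text into a list of dicts (single-line fields only)."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            field, _, value = line.partition(":")
            record[field.strip()] = value.strip()
        records.append(record)
    return records

def select(records, fields, predicate=lambda record: True):
    """Keep records matching PREDICATE and project them onto FIELDS."""
    return [{f: r[f] for f in fields} for r in records if predicate(r)]

records = parse_records(SAMPLE)

# Roughly analogous to: recsel -p name -e 'license ~ "LGPL 3"'
lgpl = select(records, ["name"], lambda r: re.search("LGPL 3", r["license"]))

# Roughly analogous to:
#   recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
not_bindings = select(
    records,
    ["name", "synopsis"],
    lambda r: not re.search(r"^(ghc|perl|python|ruby)", r["name"]),
)

print(lgpl)
print(not_bindings)
```

The point of the sketch is that the filter and the projection both work on named fields, which is only possible when the output keeps its record structure.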
>> Were you able to measure the cost of rendering specifically?
>
> generate-package-search-index takes around 50 seconds. If I modify
> generate-package-search-index to not pre-render but simply store the
> package description alone, it takes around 20 seconds. That gives us a
> rough idea of the cost of pre-rendering.
To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/
>> I think we should look at a profile of ‘package->recutils’, there’s
>> probably room for improvement there.
>
> On quick inspection, most of the time in package->recutils is spent in
> texinfo rendering the description. Unless we use the simplified search
> results format as discussed above, we cannot avoid it.
What I meant was that we could use (statprof) to see whether/how Texinfo
rendering/parsing can be optimized.
Thanks,
Ludo’.