guix-patches

From: zimoun
Subject: [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings.
Date: Sun, 14 Jun 2020 21:14:34 +0200

Dear Arun,

Here, I am speaking only about the first patch: the cut-off.

TL;DR:
 1. I was wrong about the bottleneck.
 2. The queries were not the right ones to show a clear effect
    -- on my machine.

On Sat, 13 Jun 2020 at 22:51, Arun Isaac <arunisaac@systemreboot.net> wrote:

> Yes, I did read your earlier mail. And, I tried again, this time with
> patch 1 alone. It certainly makes a difference on my machine. It is
> clear from the code logic that it should make a difference on your
> machine as well, at least for longer queries. But, somehow it isn't and
> I do not understand why. :-(

Well, I spent some hours* doing some stats (Student's t-test).  Roughly
speaking, on my machine the run-to-run spread (standard deviation,
stddev) hides the effect -- depending on the query -- and that is why,
I guess, I do not always see the improvement.

*Ah, my whole Sunday in fact. ;-)


I compared different conditions for the query "game strategy":

 - cold    vs warm
 - xterm   vs shell in Emacs (my config vs -q)
 - no pipe vs pipe

I ran each experiment 10 times in a row.  The conclusion is: on average
-- on my machine -- the cut-off improves the timing.  But when
considering only 3 repeats in a row, the improvement is sometimes not
obvious from the mean, because the tails of the two distributions
overlap a bit on my machine, so it is partly bad luck.  And it is
``worse'' depending on which commit your patch is rebased on: a357849
(old) vs e782756.

The t-test captures this variation, even with only 3 repeats, but I did
not do it in my previous email and only compared the raw means.  Sorry
about that.
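The email does not show how the t-test was computed.  As a sketch only, assuming two lists of wall-clock timings (seconds) from repeated runs of the same query, one with and one without the patch, a pooled two-sample Student's t statistic can be computed with Python's standard library:

```python
import math
import statistics


def student_t(a, b):
    """Pooled two-sample Student's t statistic and its degrees of freedom.

    `a` and `b` are timings (seconds) from repeated runs, e.g. before and
    after the patch.  A positive t means mean(a) > mean(b), i.e. `a` slower.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    # Pool the two sample variances (n - 1 denominators), assuming equal spread.
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference of the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2
```

For example, with 3 runs per build the statistic has 4 degrees of freedom, and the two-sided 99% critical value from a t table is about 4.604, so |t| must exceed that before the difference counts as significant at that level.  If the two builds show clearly different spreads, Welch's variant (which does not pool the variances) is the safer choice.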

Moreover, printing increases the stddev, so the results fluctuate more
inside Emacs than in xterm, and piping helps in this case.

Piping does not change the final result -- hopefully. :-)  It adds some
extra time, but on average the outcome is the same.

About cold vs warm cache, I notice that the improvement is not the same
(on average).  Considering the raw time, there is a difference of about
10% (with "good" confidence); it could be worth understanding why.


Well, with that in mind, I ran the stats on other queries, and the
conclusion for my machine is that *the patch improves* the timing on
average for typical usage.  Which is really cool! :-)


I was definitely wrong about the bottleneck, and this could be one.
One way to get an idea is to use "statprof", but it is hard for me to
read the results (I believe Guile master has a fix improving the
'anon #addr' entries, but I do not really know more).

--8<---------------cut here---------------start------------->8---
$ /tmp/v5-1/bin/guix repl
scheme@(guix-user)> ,use(guix scripts search)
scheme@(guix-user)> ,pr (guix-search "game" "strategy")
%     cumulative   self             
time   seconds     seconds  procedure
 17.81      0.29      0.27  anon #xe40178
 12.33      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
 12.33      0.18      0.18  anon #xe3c770
  5.48      0.08      0.08  ice-9/boot-9.scm:1396:0:symbol-append
  4.11      1.57      0.06  guix/memoization.scm:100:0
  4.11      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.74      0.55      0.04  guix/ui.scm:1511:12
  2.74      0.33      0.04  ice-9/regex.scm:170:0:fold-matches
  2.74      0.04      0.04  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  2.74      0.04      0.04  texinfo/string-utils.scm:98:5
  2.74      0.04      0.04  ice-9/vlist.scm:539:0:vhash-assq
  1.37     69.81      0.02  ice-9/threads.scm:388:4
[...]
---
Sample count: 73
Total time: 1.490955132 seconds (0.387756476 seconds in GC)
--8<---------------cut here---------------end--------------->8---

To compare with the default:

--8<---------------cut here---------------start------------->8---
%     cumulative   self
time   seconds     seconds  procedure
 24.47      0.49      0.46  anon #x1d89178
 21.28      0.40      0.40  anon #x1d85770
  9.57      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
  3.19      4.71      0.06  ice-9/boot-9.scm:1673:4:with-exception-handler
  3.19      1.64      0.06  guix/memoization.scm:100:0
  3.19      0.06      0.06  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  3.19      0.06      0.06  anon #x1d84c78
  3.19      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.13      1.01      0.04  guix/ui.scm:1511:12
  2.13      0.08      0.04  ice-9/boot-9.scm:1396:0:symbol-append
  2.13      0.04      0.04  anon #x1d83248
  1.06      0.30      0.02  anon #x7f057e6c90e8
[...]
--8<---------------cut here---------------end--------------->8---

So clearly the patch has an effect!  If someone knows what these are:

 - ice-9/boot-9.scm:2201:0:%load-announce
 - ice-9/boot-9.scm:1396:0:symbol-append

and where they come from, it could help. :-)

Well, I would be interested to know which part of the time is spent in
the regex engine and which in the string search. :-)  This links to the
discussion about KMP and other algorithms.
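Since the discussion mentions KMP, here is a minimal Python sketch of Knuth-Morris-Pratt substring search, for illustration only.  It is not taken from Guix or Guile, and this email does not say which algorithm the literal string matching ends up using.

```python
def kmp_table(pattern):
    """Failure function: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next character can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = kmp_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, reuse the already-matched prefix instead of rescanning.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

The failure table lets the scan continue without moving backwards in the text, so one search costs O(n + m) character comparisons for a text of length n and a pattern of length m.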


> Here are more fresh results. Could you try for longer queries like
> "strategy game caesar" and without the output being piped to recsel,
> grep, etc.? For simplicity, let's talk only about warm cache results.
>
> |----------------------------------+--------+-------|
> | query                            | before | after |
> |----------------------------------+--------+-------|
> | guix search strategy game        |   2.58 |  1.96 |
> | guix search strategy game caesar |   2.95 |  1.76 |
> |----------------------------------+--------+-------|

At first, I was confused about why adding one more term returns
faster.  This is because the query "caesar" returns only one package,
so the query "strategy game caesar" cuts off all the other packages
before searching the terms "game" and then "strategy".  I mean

   guix search julius

should take about as long as

   guix search strategy game caesar

It does, on average, on my machine.
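The cut-off can be illustrated with a minimal Python sketch.  It is not the Guix code (which is Scheme), and it assumes a package matches only if every term occurs as a case-insensitive substring of its text.  Terms are applied left to right, each one only to the packages that survived the previous terms, and a counter records how many term-against-package checks are done.  The package names and texts are made up, and the order in which Guix itself applies the terms is not modelled.

```python
def search_with_cutoff(packages, terms):
    """Return (names matching every term, number of substring checks).

    `packages` maps a package name to its searchable text.  Each term is
    tested only against the packages that matched all previous terms, so
    a selective term early on reduces the work left for the other terms.
    """
    candidates = list(packages.items())
    checks = 0
    for term in terms:
        needle = term.lower()
        survivors = []
        for name, text in candidates:
            checks += 1
            if needle in text.lower():
                survivors.append((name, text))
        candidates = survivors
        if not candidates:
            break  # nothing left, the remaining terms cannot match anything
    return [name for name, _ in candidates], checks


# Hypothetical package database, made up for illustration.
EXAMPLE_PACKAGES = {
    "city-builder": "City-building strategy game set in Caesar's Rome",
    "empire": "Turn-based empire strategy game",
    "chess-board": "Graphical board for the chess game",
    "hello": "Prints a friendly greeting",
}

if __name__ == "__main__":
    for query in (["caesar", "strategy", "game"],
                  ["strategy", "game", "caesar"]):
        print(query, search_with_cutoff(EXAMPLE_PACKAGES, query))
```

Both orders return the same package, but applying "caesar" first needs 6 checks instead of 8, because it leaves a single candidate for "strategy" and "game".  With thousands of packages instead of four, this is the kind of saving the cut-off brings.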

Secondly, I was confused because the timing of the query "caesar
strategy game" is almost the same as that of "strategy game caesar"
(2.8% +/- 2.5% with 99.0% confidence; 10 repeats).  Well, that is
because in one case the term "caesar" is applied to 15 packages, and in
the other case the terms "strategy" and "game" are applied to 1
package.  Add some stddev and not enough repeats (nor proper stats),
and the confusion is complete: my earlier conclusion was wrong.
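The email does not say how "2.8% +/- 2.5% with 99.0% confidence" was obtained.  One plausible way, sketched here as an assumption, is a pooled two-sample t confidence interval on the difference of mean timings, scaled by the mean of the reference query to express it in percent.  The critical value has to come from a t table; for 10 + 10 repeats (18 degrees of freedom) the two-sided 99% value is about 2.878.

```python
import math
import statistics


def relative_diff_ci(a, b, t_crit):
    """Relative difference of mean(a) vs mean(b) and its confidence
    half-width, both in percent of mean(b).

    `t_crit` is the two-sided critical value for len(a) + len(b) - 2
    degrees of freedom at the chosen confidence level.  Dividing by
    mean(b) treats it as exact, so the percentage is an approximation.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    # Pooled variance, as in Student's two-sample t-test.
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return 100 * (m1 - m2) / m2, 100 * t_crit * se / m2
```

For instance, `relative_diff_ci(timings_a, timings_b, 2.878)` with 10 timings per query returns the relative difference and its half-width, both in percent; `timings_a` and `timings_b` are placeholders for your own measurements.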


That said, the effect of the cut-off is clear (on my machine, even with
one shot) with the queries:

  - game strategy the
  - the game strategy


Thank you,
simon