bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time


From: Eli Zaretskii
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 14:48:06 +0300

> Date: Mon, 27 Sep 2021 09:23:28 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es
> 
> > To get back to the issue at hand: we are talking (or at least I was 
> > talking) about scalability of an algorithm, not about some particular 
> > implementation of the algorithm.
> 
> Are you now again shifting the discussion to something else, a theoretical 
> comparison between various algorithms?

That was the original intent, so there's no change in subject on my
side.  Apologies if that wasn't clear enough.

> > Ripgrep is a multithreaded program, whereas idutils is single-threaded. 
> > So for a fair comparison of scalability of these two main ideas: 
> > file-based search vs DB search, you need at the very least to limit 
> > ripgrep to a single thread.  And then you need to run each program on 
> > code bases of various sizes, preferably those which differ by orders of 
> > magnitude or close to that, and see their O(n) behavior.  And exclude 
> > from your comparison command-line options that require IDUtils to access 
> > the files in addition to the DB.  That would be at least an 
> > approximation to comparing apples to apples.
> 
> You're asking me to disable everything that makes ripgrep a modern tool, 
> and to disable everything that makes idutils an outdated tool, to make the 
> outdated tool shine in comparison?  Interesting viewpoint.

That's the _only_ valid viewpoint when talking about scalability of
algorithms.  Aspects not relevant to the algorithm being examined
should not affect the comparison.  Comparing a multithreaded
implementation with a single-threaded one is like comparing programs
running on two very different computers, from the performance POV.

> > IDUtils is an example of the latter, and it beats many utilities that 
> > search the files, including ripgrep, as long as it doesn't need to 
> > access the files themselves.  But even if it doesn't always beat them 
> > (which you didn't yet demonstrate), it just means the ideas of its 
> > design should be taken further and/or implemented better, that's all.
> 
> I provided you with many numbers and comparisons, which IMO demonstrate 
> what they were meant to demonstrate.  A tool which finds matches for a 
> regexp in a O(100 MB) code base in O(10 ms), and in a O(1 GB) code base in 
> O(100 ms), is clearly good enough in practice.

"Good enough in practice" is not the issue at hand.  No one in their
right mind will argue that ripgrep's performance is not good enough in
practice, for the tree sizes that we are talking about.

But the scalability you measured is approximately O(n), as expected
from a tool that accesses the files while searching.  IDUtils' gid
basically runs in O(1) time.  I've just run it on a tree with 17MB of
code in 400 files (ID database size 54KB) and on a tree with 1.4GB of
code in 40,000 files (DB size 13.5MB), and I get the same time for the
same search: 15ms.  No algorithm that actually accesses most or all of
the files in the tree can scale like that.

And if the above still doesn't convince you, then please forgive me,
but I'll bow out of this.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]