bug-gnu-emacs

bug#48841: fido-mode is slower than ido-mode with similar settings


From: João Távora
Subject: bug#48841: fido-mode is slower than ido-mode with similar settings
Date: Fri, 11 Jun 2021 18:09:04 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 07.06.2021 11:52, João Távora wrote:
>
>>     Maybe moving all of them to parameters and return values (making it a
>>     static function and having the caller manage state) would help, I
>>     haven't tried that exactly.
>> Normally, in those adventures you end up with the same allocations
>> somewhere else, and uglier code. But you can try.
>
> I gave it a try, with little success (patch attached, for
> posterity). Since there are no multiple-value returns in Elisp, I had
> to define a container for the three values.

And if there were multiple values, you can bet the container for them
wouldn't be free ;-)
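(To be concrete: as far as I can tell, cl-lib simulates multiple values
with plain lists, so the container is consed up anyway.  A sketch, with
a made-up function name:)

```elisp
(require 'cl-lib)

(defun my/score-triple ()
  ;; `cl-values' is just `list' under the hood: one fresh allocation
  ;; per return, same as an explicit container.
  (cl-values 10 2 'success))

(cl-multiple-value-bind (num den flag) (my/score-triple)
  (list num den flag))   ; destructures the returned list
```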

> The performance is basically the same, which seems to indicate that
> either Elisp has to allocate very little for a compiled lambda, or
> it's optimized out (which would also make sense: the only thing
> necessary for it is a new container for the current scope).

Which lambda are we talking about?  Is it `update-score-and-face`?  If
so, I would guess that the capture of `score-denominator` is what takes
space, and that space is no bigger than another variable in that let
scope.
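(Sketch of what I mean, with a stand-in variable name.  Under lexical
binding, a lambda that captures a let-bound variable allocates a
closure whose environment holds just that slot; in Emacs 28's
interpreter it even prints as a `(closure ...)` form where the captured
cell is visible:)

```elisp
;; -*- lexical-binding: t -*-
;; The returned lambda captures DENOM; the closure's environment is
;; one cons cell per captured variable, nothing more.
(let ((denom 0))
  (lambda (delta)
    (setq denom (+ denom delta))))
```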

>> Though given C/C++, a known processor and the right application,
>> this will make a world of difference, and will yield truly "weird"
>> results (which aren't weird at all after you understand the
>> logic). Like, for example, a vector being much better at sorted
>> insertion than a linked list. (!) Look it up. Bjarne Stroustrup has
>> one of those talks.
> When you have to do some work, better memory locality can indeed
> change a lot. But in this case we have an already computed value
> vs. something the code still needs to compute, however fast that is.

But `length` of a string, in any sane string implementation, _is_
accessing "an already computed value", which likely lives right beside
the data.  In Emacs, it seems to be two pointers (8 bytes each) apart
from the data.  On a system with 64-byte L1/L2/L3 cache lines, that
still theoretically makes up to 52 bytes come in "for free" after you
read the length.  But, to be honest, I tried a bit and haven't come up
with benchmarks to help confirm -- or help dispel -- this theory.
Maybe you can distill your "weird" experiment down to a code snippet?
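(One crude starting point, if anyone wants to try: if `length` just
reads a stored field, it should cost the same for a tiny string and a
huge one.  Untested sketch:)

```elisp
(require 'benchmark)

;; If string length is a stored field, these two timings should be
;; roughly equal despite the million-fold size difference.
(let ((short (make-string 10 ?a))
      (long  (make-string 10000000 ?a)))
  (list (benchmark-run 1000000 (length short))
        (benchmark-run 1000000 (length long))))
```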

> Accessing function arguments must currently be much faster than
> looking up a variable defined in the current scope with 'let'.

In a compiled CL system, I would expect the former to use the stack and
the latter the heap, but it wouldn't make any difference when reading
the variable's value, I think.  But Elisp is byte-compiled, not
natively compiled (except for that new thing, which I haven't tried),
and I don't understand how the byte-compiler chooses byte-codes, so all
bets are off.
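(Though one can at least peek: under lexical binding I'd expect the
byte-compiler to put both the argument and the let-bound local in stack
slots -- stack-ref ops rather than varref lookups -- which would mean
reading either costs the same.  Worth checking with something like:)

```elisp
;; -*- lexical-binding: t -*-
;; Inspect the byte-ops: both X (an argument) and Y (a `let' local)
;; should be accessed via stack slots, not dynamic variable lookups.
(disassemble
 (byte-compile (lambda (x)
                 (let ((y (1+ x)))
                   (+ x y)))))
```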

> Anyway, looking at what else could be removed, now that the extra
> allocation in 'match-data' is gone, what really speeds it up 2x-11x
> (depending on whether GC kicks in, but more often it doesn't) is
> commenting out the line:
>
>   (setq str (copy-sequence str))
>
> So if it were possible to rearrange completion-pcm--hilit-commonality
> not to have to modify the strings (probably removing the function
> altogether?), that would improve the potential performance of c-a-p-f
> quite a bit, for fido-mode and other frontends (depending on how much
> overhead the other layers add).

Very interesting.  I don't know what the problem is with modifying the
string itself.  Is it because we want to protect its 'face' property?
Maybe, but what's the harm in changing it?  Maybe Stefan knows.
Stefan, are you reading this far?

If we do want to protect the shared 'face' property -- and only 'face'
-- then we could very well add some other face-related property that
the frontend could read "just in time", before it itself makes a copy
of the string to display to the user.
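(A very rough sketch of what I mean -- all names here are made up.  The
backend records the match positions in a non-'face' property, without
copying; the frontend copies only at display time and face-propertizes
its private copy:)

```elisp
(defun my/record-hilit (str positions)
  ;; Backend side: no copy, just note where the matches are in a
  ;; property the display code knows to look for.
  (put-text-property 0 (length str) 'my/hilit-positions positions str)
  str)

(defun my/frontend-display (str)
  ;; Frontend side: copy just before display, and apply 'face' only
  ;; to the private copy, leaving the shared string untouched.
  (let ((copy (copy-sequence str)))
    (dolist (pos (get-text-property 0 'my/hilit-positions copy))
      (put-text-property pos (1+ pos)
                         'face 'completions-common-part copy))
    copy))
```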

This technique appears to be slightly simpler than using the hash-table
indirection you propose (we would need something like that if, for some
reason, we absolutely could not touch the string's property list).

> Anyway, these are musings for the much-discussed future iteration of
> the API. With the current version, and tied by backward compatibility,

Maybe I'm missing something, but I don't see why my above idea requires
changing _that_ much of the API (a bit would change, yes).  It's a
matter of letting frontends opt out of the current readily-available
face-propertized completions and opt into a display-time facility that
does this propertization.

But if the speedup is big, I'd revisit the rationale for requiring
those copies in the first place.  In my (very brief) testing, removing
them doesn't hurt a bit.

> Looking forward to your analysis of fido-vertical-mode's performance
> improvement over the "normal" one.

Will take a look now.

João