Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings


From: Mattias Engdegård
Subject: Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
Date: Sat, 8 Oct 2022 18:49:11 +0200

On 7 Oct 2022, at 21:25, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> +      /* Two arbitrary multibyte strings: we cannot use memcmp because
>> +         the encoding for raw bytes would sort those between U+007F and U+0080
>> +         which isn't where we want them.
>> +         Instead, we skip the longest common prefix and look at
>> +         what follows.  */
> 
> I don't think I understand this; please elaborate.  Didn't you say
> that we never need to look beyond the first unequal byte?  Then why
> does the order of raw bytes matter here?

The comment explains why memcmp cannot be used to compare arbitrary multibyte 
strings, and it means exactly what it says: because of how we encode raw 
bytes, a bytewise comparison would not produce the same order that 
string-lessp has used in the past. That's all.
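
To make the ordering problem concrete, here is a minimal sketch. It assumes 
the Emacs-internal representation (not spelled out in this thread) in which a 
raw byte B in 0x80..0xFF is the character 0x3FFF00 + B and is stored as a 
two-byte sequence with lead byte 0xC0 or 0xC1. All names prefixed `sketch_` 
are hypothetical and are not taken from the Emacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed internal encoding: raw byte 0x80 is stored as C0 80,
   whereas the character U+0080 is stored as C2 80.  */
static const unsigned char sketch_ascii_7f[] = { 0x7F };
static const unsigned char sketch_raw_80[] = { 0xC0, 0x80 };
static const unsigned char sketch_u0080[] = { 0xC2, 0x80 };

/* Decode the character starting at P.  Only 1- and 2-byte sequences
   are handled, which is enough for this demonstration.  */
static int
sketch_decode (const unsigned char *p)
{
  if (p[0] < 0x80)
    return p[0];
  assert (p[0] >= 0xC0 && p[0] <= 0xDF);
  int c = ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
  /* Lead bytes C0 and C1 are assumed to mark raw bytes, which map to
     the character codes 0x3FFF80..0x3FFFFF.  */
  return p[0] < 0xC2 ? (c | 0x3FFF80) : c;
}

/* Order that a plain memcmp over the first N bytes would give.  */
static int
sketch_bytewise_less (const unsigned char *a, const unsigned char *b,
                      size_t n)
{
  return memcmp (a, b, n) < 0;
}

/* Order by character code of the first character.  */
static int
sketch_charwise_less (const unsigned char *a, const unsigned char *b)
{
  return sketch_decode (a) < sketch_decode (b);
}
```

Under these assumptions, `sketch_bytewise_less (sketch_raw_80, sketch_u0080, 2)` 
is true because 0xC0 < 0xC2, while `sketch_charwise_less` puts U+0080 first 
because the raw byte decodes to a much larger character code. Bytewise, the 
raw byte lands between U+007F and U+0080, as the comment says.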

> Are you sure about the alignment?

Actually I had asked someone about that before and received the answer that 
string data alignment was guaranteed, and a semi-thorough reading of the code 
seemed to confirm this -- normal allocation ensures alignment via struct sdata 
(q.v.) and while AUTO_STRING does not, it only makes unibyte strings which do 
not concern us in the code path in question.

Of course I was wrong! String data from purespace can be unaligned even for 
multibyte. Thanks for making me take another look. (Of course, angry SPARC 
users would have let me know eventually.)

Rather than attempting to find and plug all cases where unaligned strings are 
produced, this part of the optimisation has now been restricted, by means of 
an architecture whitelist, to platforms where unaligned accesses are safe.
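
As an illustration only, a word-at-a-time common-prefix scan gated by such a 
whitelist could look like the sketch below. The macro name, the function 
name, and the list of architectures are hypothetical and are not the ones in 
the commit. The sketch also loads words through memcpy, so it stays 
well-defined C regardless of alignment; the whitelist then only decides 
whether the word loop is worth running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical whitelist: an illustrative selection of architectures
   where unaligned word loads are generally cheap.  */
#if defined __x86_64__ || defined __i386__ || defined __aarch64__
# define SKETCH_FAST_UNALIGNED 1
#else
# define SKETCH_FAST_UNALIGNED 0
#endif

/* Return the length of the longest common prefix of the N-byte
   buffers A and B.  */
static size_t
sketch_common_prefix (const unsigned char *a, const unsigned char *b,
                      size_t n)
{
  assert (a != NULL && b != NULL);
  size_t i = 0;

  if (SKETCH_FAST_UNALIGNED)
    {
      /* Compare one 64-bit word at a time while a whole word remains.
         The memcpy calls make no alignment assumption.  */
      while (n - i >= sizeof (uint64_t))
        {
          uint64_t wa, wb;
          memcpy (&wa, a + i, sizeof wa);
          memcpy (&wb, b + i, sizeof wb);
          if (wa != wb)
            break;
          i += sizeof wa;
        }
    }

  /* Finish bytewise: this locates the first differing byte inside a
     mismatching word and handles the tail.  On platforms outside the
     whitelist it does all the work.  */
  while (i < n && a[i] == b[i])
    i++;
  return i;
}
```

The result is the same on every platform; only the speed differs. A caller 
would then decode and compare the characters that start at or just before 
the returned offset.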

It may still be a good idea to ensure aligned allocation since it allows for 
more vectorisation of string operations but then again, most commonly used 
architectures handle unaligned accesses well.

> why no tests for this?

`string-lessp` has much better test coverage than what is typical for Emacs 
primitives (it had basically none until I added some a while ago) but there are 
now a few supplementary cases.



