emacs-devel

Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings


From: Eli Zaretskii
Subject: Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
Date: Sat, 08 Oct 2022 21:25:29 +0300

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 8 Oct 2022 18:49:11 +0200
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> On 7 Oct 2022, at 21:25, Eli Zaretskii <eliz@gnu.org> wrote:
> > 
> >> +      /* Two arbitrary multibyte strings: we cannot use memcmp because
> >> +   the encoding for raw bytes would sort those between U+007F and U+0080
> >> +   which isn't where we want them.
> >> +   Instead, we skip the longest common prefix and look at
> >> +   what follows.  */
> > 
> > I don't think I understand this; please elaborate.  Didn't you say
> > that we never need to look beyond the first unequal byte?  Then why
> > does the order of raw bytes matter here?
> 
> The comment explains why memcmp cannot be used to compare arbitrary multibyte 
> strings and it's exactly as it says: a bytewise comparison would not produce 
> the same order as string-lessp has used in the past because of how we encode 
> raw bytes, that's all.

As long as memcmp reports equality, we don't care, and once it reports
inequality, you can examine the first unequal bytes "by hand".  Right?
So I still don't understand the comment and how it led you to the
conclusion.
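The disagreement can be made concrete with a sketch. It assumes a simplified, hypothetical model of the internal multibyte encoding: ordinary UTF-8 (up to 4 bytes), plus raw bytes 0x80..0xFF stored as the two-byte sequences C0 80 .. C1 BF, standing for character codes 0x3FFF80..0x3FFFFF. Emacs's 5-byte forms for non-Unicode characters are left out. Under that model, memcmp puts a raw byte between U+007F and U+0080, because its lead byte 0xC0/0xC1 is below 0xC2. By character code, though, a raw byte sorts after every Unicode character. The names `decode_char` and `compare_by_char` are hypothetical. The comparison does what is proposed above: skip the common byte prefix, back up to the start of the character holding the first unequal byte, and compare only that pair of characters.

```c
#include <stddef.h>

/* Decode one character from the simplified model described above
   (hypothetical; real Emacs has more forms).  Input is assumed valid.  */
static long decode_char (const unsigned char *p)
{
  unsigned char b0 = p[0];
  if (b0 < 0x80)
    return b0;                                   /* ASCII */
  if (b0 == 0xC0 || b0 == 0xC1)                  /* raw byte 0x80..0xFF */
    return 0x3FFF00 + (0x80 | ((b0 & 1) << 6) | (p[1] & 0x3F));
  if (b0 < 0xE0)                                 /* 2-byte UTF-8 */
    return ((long) (b0 & 0x1F) << 6) | (p[1] & 0x3F);
  if (b0 < 0xF0)                                 /* 3-byte UTF-8 */
    return ((long) (b0 & 0x0F) << 12) | ((long) (p[1] & 0x3F) << 6)
           | (p[2] & 0x3F);
  return ((long) (b0 & 0x07) << 18) | ((long) (p[1] & 0x3F) << 12)
         | ((long) (p[2] & 0x3F) << 6) | (p[3] & 0x3F);   /* 4-byte UTF-8 */
}

/* Compare A and B by character code, returning <0, 0 or >0 like memcmp.  */
static int compare_by_char (const unsigned char *a, size_t alen,
                            const unsigned char *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  size_t i = 0;

  /* Skip the longest common byte prefix.  */
  while (i < n && a[i] == b[i])
    i++;
  if (i == n)                     /* one string is a prefix of the other */
    return (alen > blen) - (alen < blen);

  /* Back up over continuation bytes (10xxxxxx) to the character start.
     All bytes before I are equal, so both strings agree on that start.  */
  while (i > 0 && (a[i] & 0xC0) == 0x80)
    i--;

  /* Only the first differing pair of characters decides the order.  */
  long ca = decode_char (a + i);
  long cb = decode_char (b + i);
  return (ca > cb) - (ca < cb);
}
```

For example, memcmp orders "\xC0\x80" (raw byte 0x80) before "\xC3\xA9" (é), since 0xC0 < 0xC3. `compare_by_char` orders it after, since 0x3FFF80 > 0xE9. Once the mismatch is found, memcmp's answer is simply discarded, so the raw-byte encoding matters only at that one character.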

I also asked about memmem -- did you consider using that?

> > Are you sure about the alignment?
> 
> Actually I had asked someone about that before and received the answer that 
> string data alignment was guaranteed, and a semi-thorough reading of the code 
> seemed to confirm this -- normal allocation ensures alignment via struct 
> sdata (q.v.) and while AUTO_STRING does not, it only makes unibyte strings 
> which do not concern us in the code path in question.

AFAIU, AUTO_STRING can also generate stack-allocated multibyte strings.
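The alignment question matters only if the new code loads string data a whole word at a time. That is an assumption here, since the patch body isn't quoted in this message. The following sketches one way such a loop can avoid depending on alignment at all. Each chunk is copied into a local word with memcpy, which is valid at any address, so stack-allocated AUTO_STRING data would be just as safe as data laid out by struct sdata. The helper name `common_prefix_length` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the longest common byte prefix of A and B, both of which
   have at least N bytes.  Compares a word at a time, then finishes
   (and pinpoints the mismatch) byte by byte.  */
static size_t common_prefix_length (const unsigned char *a,
                                    const unsigned char *b, size_t n)
{
  size_t i = 0;
  while (n - i >= sizeof (uintptr_t))
    {
      uintptr_t wa, wb;
      /* memcpy makes these loads alignment-independent.  */
      memcpy (&wa, a + i, sizeof wa);
      memcpy (&wb, b + i, sizeof wb);
      if (wa != wb)
        break;                  /* the first mismatch is in this word */
      i += sizeof wa;
    }
  while (i < n && a[i] == b[i])
    i++;
  return i;
}
```

The trade-off is that on strict-alignment targets the memcpy may not compile down to a single load. Relying on the alignment that struct sdata provides avoids that cost, but only for strings allocated that way.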

> > why no tests for this?
> 
> `string-lessp` has much better test coverage than what is typical for Emacs 
> primitives

For non-ASCII strings?
