emacs-devel


Re: Case mapping of sharp s


From: David Kastrup
Subject: Re: Case mapping of sharp s
Date: Thu, 19 Nov 2009 23:43:19 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)

Stefan Monnier <address@hidden> writes:

>> Actually I think there is something simply wrong with the simple
>> search, as it's much slower even for single chars (where bm doesn't
>> have any advantage) and additionally in some weird random fashion
>> it's again slower for backwards search, such as 14, 37, 66 ... 94
>> secs, where the bm takes 0.5 secs and simple forward constantly
>> ~3.7 secs, all for isearch'ing one character in a 100Mb file.
>
> I can guess why it's much slower going backward: the simple search
> operates on chars rather than bytes.  The internal encoding we use
> (currently based on utf-8) is designed to be easy to parse going forward
> but not so easy going backward (IIRC our encoding is actually even a bit
> more painful in this case than pure utf-8).

I don't think so.  The utf-8 _scheme_ can be used to encode 21 bits in 4
bytes.  We stay within that range, in the 4-byte utf-8 scheme, but
outside of the Unicode range of 2^20+2^16 code points.
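To illustrate the bit layout referred to above, here is a minimal Python sketch of the generic UTF-8 scheme in its 1-to-4-byte form. It is not Emacs's internal encoder, and the name `utf8_scheme_encode` is hypothetical. Unlike `str.encode("utf-8")`, it deliberately does not reject values above U+10FFFF, so it shows that the 4-byte form has room for a full 21 bits of payload (3 + 6 + 6 + 6).

```python
def utf8_scheme_encode(cp: int) -> bytes:
    """Encode cp with the UTF-8 bit layout (1 to 4 bytes, up to 21 bits).

    Hypothetical helper: it applies only the bit layout and, unlike
    str.encode("utf-8"), does not reject values above U+10FFFF.
    """
    if not 0 <= cp < (1 << 21):
        raise ValueError("value needs more than 21 bits")
    if cp < 0x80:
        # 7 payload bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 payload bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 payload bits: 1110xxxx followed by 2 continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 payload bits: 11110xxx followed by 3 continuation bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For example, `utf8_scheme_encode(0xDF)` (sharp s) yields `b"\xc3\x9f"`, and the largest 21-bit value, `0x1FFFFF`, still fits in 4 bytes as `b"\xf7\xbf\xbf\xbf"`, well beyond the Unicode maximum of `0x10FFFF`.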

> BM on the other hand works on bytes, so there's no such slowdown.

With utf-8, I think that apart from character ranges, searching forward
and backward should work exactly as it does on 8-bit characters.  The
exception is matches on incomplete characters, but since the utf-8
scheme can immediately tell "is a 7-bit character", "is the first byte
of a multibyte sequence of length n", and "is the last or an
intermediate byte of a multibyte sequence", this is not a serious
problem.
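The byte classification described above can be sketched in Python. This is a minimal illustration of the generic UTF-8 layout, not Emacs's actual search code; the names `byte_class`, `char_start` and `rfind_char_aligned` are hypothetical. It shows two things: stepping backward to the start of a character only requires skipping continuation bytes (at most three in the 4-byte scheme), and a plain bytewise backward search only needs to reject matches that begin on a continuation byte, which can happen only when the pattern itself starts with an incomplete character.

```python
def byte_class(b: int) -> str:
    """Classify one byte of a UTF-8-style sequence."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete 7-bit character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: intermediate or last byte
    return "lead"              # 11xxxxxx: first byte of a multibyte sequence


def char_start(text: bytes, i: int) -> int:
    """Back up from byte offset i to the first byte of its character."""
    while i > 0 and byte_class(text[i]) == "continuation":
        i -= 1
    return i


def rfind_char_aligned(text: bytes, pattern: bytes) -> int:
    """Search backward bytewise for pattern, returning the offset of the
    last match that starts on a character boundary, or -1 if none."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hi = len(text)
    while True:
        # bytes.rfind only reports matches lying entirely within text[0:hi].
        i = text.rfind(pattern, 0, hi)
        if i < 0:
            return -1
        if byte_class(text[i]) != "continuation":
            return i
        # Match starts mid-character: shrink the window so the next
        # candidate must start strictly before i.
        hi = i + len(pattern) - 1
```

With `text = "Straße".encode("utf-8")`, `char_start(text, 5)` backs up to 4 (the lead byte of "ß"), `rfind_char_aligned(text, "ß".encode("utf-8"))` returns 4, and searching for the lone continuation byte `b"\x9f"` returns -1 because its only bytewise match lies inside "ß".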

> But maybe we're doing something silly somewhere.

The Emacs 22 multibyte scheme likely had worse properties for reverse
searching.  So perhaps something could be simplified nowadays.

-- 
David Kastrup




