Re: Case mapping of sharp s

From: David Kastrup
Subject: Re: Case mapping of sharp s
Date: Thu, 19 Nov 2009 23:43:19 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)

Stefan Monnier <address@hidden> writes:

>> Actually I think there is something simply wrong with the simple
>> search, as it's much slower even for single chars (where bm doesn't
>> have any advantage) and additionally in some weird random fashion
>> it's again slower for backwards search, such as 14, 37, 66 ... 94
>> secs, where the bm takes 0.5 secs and simple forward constantly
>> ~3.7 secs, all for isearch'ing one character in a 100Mb file.
> I can guess why it's much slower going backward: the simple search
> operates on chars rather than bytes.  The internal encoding we use
> (currently based on utf-8) is designed to be easy to parse going forward
> but not so easy going backward (IIRC our encoding is actually even a bit
> more painful in this case than pure utf-8).

I don't think so.  The utf-8 _scheme_ can be used to encode 21 bits in 4
bytes.  We stay within that range, in the utf-8 4-byte scheme, but go
outside of the Unicode range of 2^20+2^16 code points.
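
To make the arithmetic concrete, here is a minimal sketch of the generic
utf-8 _scheme_ for up to 4 bytes.  It is not Emacs's actual encoder; the
names encode_scheme, UNICODE_LIMIT and SCHEME_LIMIT are made up for
illustration, and it deliberately does not reject values above the
Unicode range.

```python
# Hypothetical illustration, not code taken from Emacs.
UNICODE_LIMIT = 2**20 + 2**16   # 0x110000 code points in Unicode
SCHEME_LIMIT = 2**21            # 3 + 6 + 6 + 6 payload bits in 4 bytes


def encode_scheme(value: int) -> bytes:
    """Encode value with the utf-8 bit layout, allowing all 21 bits."""
    if not 0 <= value < SCHEME_LIMIT:
        raise ValueError("value does not fit the 4-byte scheme")
    if value < 0x80:                      # 0xxxxxxx
        return bytes([value])
    if value < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (value >> 6),
                      0x80 | (value & 0x3F)])
    if value < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (value >> 12),
                      0x80 | ((value >> 6) & 0x3F),
                      0x80 | (value & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (value >> 18),
                  0x80 | ((value >> 12) & 0x3F),
                  0x80 | ((value >> 6) & 0x3F),
                  0x80 | (value & 0x3F)])
```

For values inside the Unicode range this agrees with ordinary utf-8 (sharp
s, U+00DF, comes out as the two bytes C3 9F), while values from
UNICODE_LIMIT up to SCHEME_LIMIT - 1 still fit in 4 bytes.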

> BM on the other hand works on bytes, so there's no such slowdown.

With utf-8, I think that apart from character ranges, searching forward
and backward should work just like on 8-bit characters.  The exception is
incomplete character matches, but since the utf-8 scheme lets you tell
immediately from any byte whether it "is a 7-bit character", "is the first
byte of a multibyte sequence of length n", or "is a last or intermediate
byte of a multibyte sequence", this is not a serious problem.
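
A rough sketch of that byte classification, and of stepping and searching
backward with it, might look as follows.  This is an illustration on plain
utf-8 byte strings, not the Emacs search code; classify, sequence_length,
prev_char_start and search_backward are hypothetical names.

```python
# Hypothetical helpers operating on well-formed utf-8, not Emacs internals.

def classify(byte: int) -> str:
    """Tell from a single byte which role it plays in the scheme."""
    if byte < 0x80:
        return "single"          # 0xxxxxxx: 7-bit character
    if byte < 0xC0:
        return "continuation"    # 10xxxxxx: intermediate or last byte
    return "lead"                # 11xxxxxx: first byte of a sequence


def sequence_length(first: int) -> int:
    """Length n of the sequence announced by its first byte."""
    if first < 0x80:
        return 1
    if first >= 0xF0:
        return 4
    if first >= 0xE0:
        return 3
    if first >= 0xC0:
        return 2
    raise ValueError("continuation byte cannot start a sequence")


def prev_char_start(buf: bytes, pos: int) -> int:
    """Start index of the character that ends just before pos."""
    if pos <= 0:
        raise ValueError("no character before position 0")
    pos -= 1
    while pos > 0 and classify(buf[pos]) == "continuation":
        pos -= 1                 # skip at most three continuation bytes
    return pos


def search_backward(buf: bytes, needle: bytes, start: int) -> int:
    """Byte-wise backward search for a match lying entirely before start.

    A needle made of complete characters begins with a non-continuation
    byte, so any byte-level match already sits on a character boundary.
    """
    return buf.rfind(needle, 0, start)
```

For example, with buf = "Straße groß".encode("utf-8") and needle =
"ß".encode("utf-8"), search_backward(buf, needle, len(buf)) finds the
second sharp s at byte offset 11, and searching again from 11 finds the
first one at offset 4, without decoding any characters along the way.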

> But maybe we're doing something silly somewhere.

The Emacs 22 multibyte scheme likely had worse properties for reverse
searching.  So maybe something might be simplified nowadays.

David Kastrup
