[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Case mapping of sharp s

From: grischka
Subject: Re: Case mapping of sharp s
Date: Fri, 20 Nov 2009 00:25:34 +0100
User-agent: Thunderbird (Windows/20090812)

Stefan Monnier wrote:
Actually I think there is something simply wrong with the simple
search, as it's much slower even for single chars (where bm doesn't
have any advantage) and additionally in some weird random fashion
it's again slower for backwards search, such as 14, 37, 66 ... 94
secs, where the bm takes 0.5 secs and simple forward constantly
~3.7 secs, all for isearch'ing one character in a 100Mb file.

I can guess why it's much slower going backward: the simple search
operates on chars rather than bytes.  The internal encoding we use
(currently based on utf-8) is designed to be easy to parse going forward
but not so easy going backward (IIRC our encoding is actually even a bit
more painful in this case than pure utf-8).  BM on the other hand works
on bytes, so there's no such slowdown.

Okay, it's utf-8 (aka multibyte I suppose) however I was using
the same buffer in both cases (just a different search string).
So whatever makes it more difficult to scan backwards the same
situation exists for bm too.  Also how can it happen that a C
function varies between 4 and 90 seconds for the same action.

As for the general slowdown, it may also be due to having to parse the
char (encoded in utf-8) and then look it up in the corresponding char table
(a tree of arrays).

From what I saw the simple_search() calls out to lisp for every
single buffer-byte whereas boyer_moore() just translates it
with a prefabricated C table.

However if simple_search just would prepare the first char of
the pattern in both lower and uppercase version readily for
comparison it could be as fast as bm I guess (for a pattern
length of less than 4 that is)

--- grischka

But maybe we're doing something silly somewhere.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]