|
From: | grischka |
Subject: | Re: Case mapping of sharp s |
Date: | Fri, 20 Nov 2009 00:25:34 +0100 |
User-agent: | Thunderbird 2.0.0.23 (Windows/20090812) |
Stefan Monnier wrote:
Actually I think there is something simply wrong with the simple search, as it's much slower even for single chars (where bm doesn't have any advantage) and additionally in some weird random fashion it's again slower for backwards search, such as 14, 37, 66 ... 94 secs, where the bm takes 0.5 secs and simple forward constantly ~3.7 secs, all for isearch'ing one character in a 100Mb file.I can guess why it's much slower going backward: the simple search operates on chars rather than bytes. The internal encoding we use (currently based on utf-8) is designed to be easy to parse going forward but not so easy going backward (IIRC our encoding is actually even a bit more painful in this case than pure utf-8). BM on the other hand works on bytes, so there's no such slowdown.
Okay, it's utf-8 (aka multibyte I suppose) however I was using the same buffer in both cases (just a different search string). So whatever makes it more difficult to scan backwards the same situation exists for bm too. Also how can it happen that a C function varies between 4 and 90 seconds for the same action.
As for the general slowdown, it may also be due to having to parse the char (encoded in utf-8) and then look it up in the corresponding char table (a tree of arrays).
From what I saw the simple_search() calls out to lisp for every single buffer-byte whereas boyer_moore() just translates it with a prefabricated C table. However if simple_search just would prepare the first char of the pattern in both lower and uppercase version readily for comparison it could be as fast as bm I guess (for a pattern length of less than 4 that is) --- grischka
But maybe we're doing something silly somewhere. Stefan
[Prev in Thread] | Current Thread | [Next in Thread] |