

Re: Case mapping of sharp s

From: grischka
Subject: Re: Case mapping of sharp s
Date: Thu, 26 Nov 2009 14:07:51 +0100
User-agent: Thunderbird (Windows/20090812)

Kenichi Handa wrote:
In article <address@hidden>, grischka <address@hidden> writes:

DEC_BOTH is maybe not slower than INC_BOTH, but two DEC_BOTH calls
are (as with Andy's patch).  Only moderately slower, though ;)
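For readers unfamiliar with why stepping backward can cost more than stepping forward, here is a minimal sketch. It assumes a plain, well-formed UTF-8 buffer (Emacs's internal multibyte encoding is a superset of UTF-8), and the helpers `inc_pos` and `dec_pos` are hypothetical, not the real INC_BOTH/DEC_BOTH macros. Moving forward reads one lead byte to learn the character's length; moving backward has to walk back over continuation bytes one at a time.

```c
#include <stddef.h>

/* Hypothetical helpers, NOT Emacs's INC_BOTH/DEC_BOTH macros.
   Assumes buf holds well-formed UTF-8. */

/* Length of the UTF-8 sequence introduced by lead byte b. */
static size_t utf8_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Forward step: a single look at the lead byte gives the length. */
static size_t inc_pos(const unsigned char *buf, size_t bytepos)
{
    return bytepos + utf8_len(buf[bytepos]);
}

/* Backward step (requires bytepos > 0): walk back until we reach a
   byte that is not a continuation byte (10xxxxxx). */
static size_t dec_pos(const unsigned char *buf, size_t bytepos)
{
    do
        bytepos--;
    while (bytepos > 0 && (buf[bytepos] & 0xC0) == 0x80);
    return bytepos;
}
```

With the text "aßb" (sharp s is encoded as the two bytes C3 9F), `inc_pos` jumps from byte 1 straight to byte 3, while `dec_pos` from byte 3 has to inspect bytes 2 and 1 before it stops.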

So, changing the current backward matching to forward
matching should be effective.

No, there is no such condition.  There are several ways to avoid
the duplicate DEC_POS, one being to handle the "pattern_len == 0"
case right at the top of the function, for all its branches.
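The "handle pattern_len == 0 at the top" idea can be sketched on a simplified byte-level search. This is a hypothetical stand-in, not Emacs's search code: it ignores multibyte text and case folding, and `search_bytes` and its parameters are invented for illustration. The point is only that once the empty pattern is dealt with up front, neither the forward nor the backward branch needs its own special-case position adjustment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified search, NOT Emacs's simple_search.
   Returns the byte offset of the match start, or -1 if none.
   direction > 0: first match starting at or after pos.
   direction < 0: last match ending at or before pos. */
static long search_bytes(const char *buf, size_t buflen,
                         const char *pat, size_t pattern_len,
                         size_t pos, int direction)
{
    /* The empty pattern matches at pos in either direction; handling
       it once here keeps both branches below free of special cases. */
    if (pattern_len == 0)
        return (long) pos;

    if (direction > 0) {
        for (size_t i = pos; i + pattern_len <= buflen; i++)
            if (memcmp(buf + i, pat, pattern_len) == 0)
                return (long) i;
    } else {
        if (pos > buflen)
            pos = buflen;
        if (pos < pattern_len)
            return -1;
        /* Try start positions pos - pattern_len, ..., 1, 0. */
        for (size_t i = pos - pattern_len + 1; i-- > 0; )
            if (memcmp(buf + i, pat, pattern_len) == 0)
                return (long) i;
    }
    return -1;
}
```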

The originally observed slowness was not caused by the use of
CHAR_TO_BYTE as such, but by flaws in CHAR_TO_BYTE, such as
using unrelated "best_below" and "best_above" in the same expression.
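The fix itself is not shown in the thread, so the following is only a sketch of the general technique, not Emacs's CHAR_TO_BYTE. It assumes UTF-8 text and a hypothetical list of known (charpos, bytepos) pairs such as point, the gap, or markers. It picks the nearest known pair below and above the target and compares each pair only with itself: if the stretch between them has as many bytes as characters, it is all single-byte and the answer is computed directly; otherwise it scans from whichever pair is closer.

```c
#include <stddef.h>

/* Hypothetical known correspondence, e.g. point, the gap, a marker. */
struct pos_pair { size_t charpos, bytepos; };

static size_t utf8_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Map charpos (0 <= charpos <= nchars) to a byte offset in the
   well-formed UTF-8 text buf of nchars characters and nbytes bytes. */
static size_t char_to_byte(const unsigned char *buf,
                           size_t nchars, size_t nbytes,
                           const struct pos_pair *known, size_t nknown,
                           size_t charpos)
{
    struct pos_pair below = {0, 0}, above = {nchars, nbytes};

    for (size_t k = 0; k < nknown; k++) {
        if (known[k].charpos <= charpos && known[k].charpos > below.charpos)
            below = known[k];
        if (known[k].charpos >= charpos && known[k].charpos < above.charpos)
            above = known[k];
    }

    /* Equal char and byte distances mean every character in between
       is one byte long, so no scanning is needed. */
    if (above.charpos - below.charpos == above.bytepos - below.bytepos)
        return below.bytepos + (charpos - below.charpos);

    if (charpos - below.charpos <= above.charpos - charpos) {
        /* Scan forward from the nearer pair below. */
        size_t c = below.charpos, b = below.bytepos;
        while (c < charpos) {
            b += utf8_len(buf[b]);
            c++;
        }
        return b;
    } else {
        /* Scan backward from the nearer pair above. */
        size_t c = above.charpos, b = above.bytepos;
        while (c > charpos) {
            do
                b--;
            while ((buf[b] & 0xC0) == 0x80);
            c--;
        }
        return b;
    }
}
```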

As for the numbers, with my 100MB test file:

backward search previously:
        14 .. 90 s (random)
backward search with fixed CHAR_TO_BYTE:
        5.6 s

I don't see any fix of CHAR_TO_BYTE in the current CVS
code.  Where is it?

Those tests were made with ad hoc modifications as needed. There
was also some code to measure the times, of course.

In any case, with some tweaking it is possible to improve both
directions by ~70% (that is down to about 1 sec for the test
case).  I still don't know why boyer_moore with a one-char
pattern takes only 0.5 seconds though.  It's amazingly fast.

Are you comparing both methods with the same value of

Same value, but not same search patterns.  One with "sharp s",
one without.

Actually I just wanted to check the facts with the "sharp s" patch
originally proposed in this thread, because some people wrote that it
would be too slow.  FWIW I don't think it would be any problem.

Btw it seems that the long loading time for the big file has much to
do with inefficient counting of newlines.  Apparently it takes
~2 sec to load the file and then another ~6 sec to scan newlines.
It should be (far) under 0.5 sec.
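For a sense of what a fast newline scan can look like, here is a minimal sketch; it is not Emacs's scan_buffer, and `count_newlines` is a hypothetical name. It relies on the fact that '\n' is a single ASCII byte that never occurs inside a multibyte sequence in UTF-8 (or in Emacs's internal encoding), so the C library's memchr can jump from one newline to the next without decoding characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT Emacs's scan_buffer.
   Counts '\n' bytes in buf[0..len) by letting memchr skip ahead
   to each newline instead of examining characters one by one. */
static size_t count_newlines(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t) (end - p));
        if (nl == NULL)
            break;          /* no more newlines */
        count++;
        p = nl + 1;         /* continue just past this newline */
    }
    return count;
}
```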

Why is the code of counting newlines called when we just
visit a file?

I have no idea why.  Opening the 100MB file calls scan_buffer
(for \n) 67637 times.  The file has 3142771 lines, though, so I take
it back: it's probably not "counting newlines" in that sense.  Maybe
it comes from "Loading cc-langs ...", which happens after the first
2 seconds.

--- grischka

Kenichi Handa

