[bug-grep] [patch #3803] Red Hat's "egf-speedup" patch

From: Tim Waugh
Subject: [bug-grep] [patch #3803] Red Hat's "egf-speedup" patch
Date: Thu, 28 Apr 2005 12:16:28 +0000
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 Firefox/1.0.3

Follow-up Comment #2, patch #3803 (project grep):

The full story behind this patch is that grep-2.5.1a does not handle UTF-8
gracefully at all.  The basic approach to handling UTF-8 in 2.5.1a is:
 * whenever a buffer is parsed, walk through the entire buffer and work out
how many bytes make up each character
 * consult this information whenever it is needed
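A minimal C sketch of the eager scheme described above, assuming UTF-8 input. The names utf8_char_len and precompute_char_lengths are hypothetical, and the hand-written lead-byte decoder is a simplification: grep's own code relies on the C library's multibyte functions (mbrlen() and friends) rather than decoding UTF-8 itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: byte length of the UTF-8 character starting at s,
   judged from its lead byte alone.  Invalid lead bytes and truncated
   sequences count as 1 byte so that scanning always makes progress. */
static size_t utf8_char_len(const unsigned char *s, size_t avail)
{
    size_t len;
    if (s[0] < 0x80)
        len = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        len = 1;                 /* continuation or invalid lead byte */
    return len <= avail ? len : 1;
}

/* Eager scheme: walk the whole buffer once and record, for every byte
   offset that starts a character, how many bytes that character occupies
   (0 marks a continuation byte).  The caller frees the result. */
unsigned char *precompute_char_lengths(const unsigned char *buf, size_t size)
{
    unsigned char *lens = calloc(size ? size : 1, 1);
    if (!lens)
        return NULL;
    for (size_t i = 0; i < size; ) {
        size_t n = utf8_char_len(buf + i, size - i);
        lens[i] = (unsigned char) n;
        i += n;
    }
    return lens;
}
```

For the 4-byte buffer "a\xC3\xA9" "b" ("aéb"), the table is {1, 2, 0, 1}: every later question about character boundaries becomes a table lookup, but the whole buffer has to be decoded first.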

This patch changes that to:
 * when information about how many bytes make up a character is needed, work
it out on demand
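A minimal C sketch of the on-demand idea, again assuming UTF-8 input; starts_character and char_len_at are hypothetical names, not functions from the patch. It leans on UTF-8 being self-synchronizing (continuation bytes always look like 10xxxxxx), so a question about one position only needs the bytes at that position. A general multibyte locale would need mbrlen() instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* In UTF-8, bytes of the form 10xxxxxx continue a character. */
static bool is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* On-demand scheme (sketch): nothing is precomputed.  Whether a candidate
   match at offset pos starts on a character boundary is answered by
   inspecting only the byte at pos. */
bool starts_character(const unsigned char *buf, size_t size, size_t pos)
{
    return pos < size && !is_utf8_continuation(buf[pos]);
}

/* Byte length of the character that starts at pos, worked out only when
   asked for: count the continuation bytes that follow its lead byte. */
size_t char_len_at(const unsigned char *buf, size_t size, size_t pos)
{
    size_t n = 1;
    while (pos + n < size && is_utf8_continuation(buf[pos + n]))
        n++;
    return n;
}
```

With the buffer "a\xC3\xA9" "b", starts_character() is true at offsets 0, 1 and 3 and false at 2, and char_len_at(buf, 4, 1) is 2; no other byte of the buffer is touched.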

On the face of it, this is a small, obvious improvement.  In practice it is
much better than that, because the original scheme recalculated character
lengths several times for each buffer: one full pass for every single
potential match!
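A back-of-the-envelope cost model of the point above, written in C. It is an illustration under stated assumptions, not code from grep: the old scheme is charged one full pass over the buffer per potential match, and the on-demand scheme is assumed to decode only the character at each match, which holds for UTF-8 but not for every multibyte encoding. The function names are hypothetical.

```c
#include <stddef.h>

/* Old scheme as described: one full pass over a buffer of `size` bytes
   for every one of `candidates` potential matches. */
size_t decoded_bytes_per_pass_scheme(size_t size, size_t candidates)
{
    return size * candidates;
}

/* On-demand scheme (assumption): only the character at each potential
   match is decoded, costing at most `max_char_bytes` bytes (4 for UTF-8). */
size_t decoded_bytes_on_demand(size_t candidates, size_t max_char_bytes)
{
    return candidates * max_char_bytes;
}
```

For a hypothetical 32 KiB buffer (32768 bytes) with 1,000 potential matches, the model gives 32,768,000 decoded bytes for the old scheme against at most 4,000 on demand: the old cost grows with buffer size times match count, the new one only with match count.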

For a full discussion of this patch and of dfa-optional, including
benchmarking results, see the mailing list.


  Message sent via/by Savannah
