Re: Plan for grep [bug-grep]

From: Tim Waugh
Subject: Re: Plan for grep [bug-grep]
Date: Tue, 8 Mar 2005 13:13:07 +0000
User-agent: Mutt/

On Tue, Mar 08, 2005 at 05:38:36AM -0500, Charles Levert wrote:

> BTW, is the assumption (in the current code)
> that any two corresponding uppercase and
> lowercase Unicode code points have the same
> UTF-8 octet length (or 8-bit code unit length)
> always a safe (secure) one?

Where do you see that assumption?  Is that assumption also in the
Fedora Core patched grep?
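
For reference, the equal-length assumption can be checked empirically.
The following is a minimal sketch, not anything taken from grep itself:
it uses Python's built-in Unicode case mappings (which may differ from
what a C library's towupper()/towlower() return in a given locale), and
the helper names utf8_len and case_length_mismatches are made up for
illustration.

```python
def utf8_len(ch):
    """Number of octets needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def case_length_mismatches(limit=0x3000):
    """Return (char, case_partner) pairs whose UTF-8 lengths differ.

    Only single-character mappings are considered; the scan stops below
    the surrogate range so every code point can be encoded.
    """
    pairs = []
    for cp in range(limit):
        ch = chr(cp)
        for other in (ch.upper(), ch.lower()):
            if len(other) == 1 and other != ch and utf8_len(other) != utf8_len(ch):
                pairs.append((ch, other))
    return pairs


if __name__ == "__main__":
    for ch, other in case_length_mismatches():
        print("U+%04X (%d octets) <-> U+%04X (%d octets)"
              % (ord(ch), utf8_len(ch), ord(other), utf8_len(other)))
```

With these mappings, pairs such as U+0131 (dotless i, two octets)
uppercasing to "I" (one octet) and U+212A (Kelvin sign, three octets)
lowercasing to "k" (one octet) show up, so code that swaps case in
place within a UTF-8 buffer cannot rely on the lengths matching.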

> Since performance is an issue, measuring it could
> be included in testing, as well as reporting
> serious discrepancies between the results of
> identical tests being performed under various
> different locales.

As part of 'make check'?  If that's what you mean, better make sure
to measure against 'user' time as reported by time(1), not wall-clock
time!
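
A rough sketch of what such a measurement could look like, assuming a
POSIX system and a test driver written in Python rather than the shell:
the function name user_cpu_seconds, the grep invocation and the locale
names in the usage part are all hypothetical and only serve as an
illustration.

```python
import os
import subprocess


def user_cpu_seconds(cmd, locale=None):
    """Run cmd and return the user CPU time (seconds) spent by the child.

    This mirrors the 'user' figure of time(1) instead of wall-clock
    time, so it is less sensitive to load on the build machine.
    """
    env = dict(os.environ)
    if locale is not None:
        env["LC_ALL"] = locale  # override the locale for the child only
    before = os.times().children_user
    subprocess.run(cmd, env=env,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = os.times().children_user
    return after - before


if __name__ == "__main__":
    # Hypothetical comparison: the same search under two locales.
    cmd = ["grep", "-i", "pattern", "/etc/services"]
    for loc in ("C", "en_US.UTF-8"):
        print(loc, user_cpu_seconds(cmd, locale=loc))
```

The child accounting only updates once the child has been waited for,
which subprocess.run() does before returning.  Short runs will be
dominated by timer granularity, so a real test would need a large
enough input or repeated runs before comparing locales.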

> The only danger I see in waiting to do this is
> that there seem to have been improvements in
> UTF-8 handling by glibc's regex code.  Maybe all
> the -i kludges are not even needed anymore.
> Maybe there are also performance issues (either
> way) with this.
> That's why I previously stated that I saw doing
> this as a priority:  other items are affected.

The undeniable improvements in the glibc regex code are very useful --
however, the current (unpatched) grep multibyte handling is flawed in
many more ways than you might guess, and *that* is the thing to fix
first when doing performance testing.  See

