bug-grep

Re: grep 2.5.1: NUL byte doesn't match a complemented character class


From: Joe Wells
Subject: Re: grep 2.5.1: NUL byte doesn't match a complemented character class
Date: Thu, 23 Aug 2007 14:06:14 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Jim Meyering <address@hidden> writes:

> Joe Wells <address@hidden> wrote:
>> I now can see what the difference between your environment and mine
>> must be.  I'm also using this environment variable setting:
>>
>>   LC_CTYPE=en_US.UTF-8
>>
>> When I change this (just for the “grep” process) to
>
> On some systems, the locale name is spelled slightly differently:
> [get the proper spelling from the output of "locale -a"]

(By the way, this is irrelevant to the bug in grep, but I believe the
output of “locale -a” does not give the officially correct locale
names.  On my system, it says my locale name is “en_US.utf8”.  My
understanding from reading the standards documents is that the
officially correct name is “en_US.UTF-8”.  The use of “utf8” occurs
because glibc has an internal compatibility hack where it downcases
the charset name and removes hyphens from it before looking up the
locale on disk and in data structures.)
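
To illustrate the downcase-and-strip behaviour described above, here is a
minimal shell sketch.  The function names are made up for illustration and
are not part of glibc or any tool.  The sketch only approximates what glibc
does: it ignores locale modifiers such as "@euro" and any other special
cases glibc may apply.

```shell
# Hypothetical helper: approximate glibc's codeset normalization by
# keeping only ASCII letters and digits and lowercasing the letters.
normalize_codeset() {
    printf '%s\n' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9\n' | LC_ALL=C tr 'A-Z' 'a-z'
}

# Hypothetical helper: normalize only the part after the first dot,
# leaving the language/territory part (e.g. "en_US") untouched.
# Modifiers such as "@euro" are NOT handled in this sketch.
normalize_locale_name() {
    case $1 in
        *.*) printf '%s.%s\n' "${1%%.*}" "$(normalize_codeset "${1#*.}")" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

normalize_locale_name en_US.UTF-8
```

Under these assumptions both spellings, "en_US.UTF-8" and "en_US.utf8",
normalize to the same string, which would explain why either one selects
the same locale.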

> RHEL5 has the bug [rpm -q grep -> grep-2.5.1-52.2]:
>
>   $ printf '\0x' | LC_CTYPE=en_US.UTF-8 grep '[^x]x'
>   [Exit 1]
>
> Debian unstable seems not to have a problem:
>
>   $ printf '\0x' | LC_CTYPE=en_US.utf8 grep '[^x]x'
>   Binary file (standard input) matches

I'm glad you were able to reproduce the bug.  Can you tell if it is in
grep or the locales or glibc?

> I've Cc'd address@hidden, since that's the preferred bug-reporting
> address.

Then I have another bug to report.  The man page for “grep” on my
system (Ubuntu Dapper Drake) gives address@hidden as the only
bug reporting address.  (And there is no “grep.info” file installed.
Is there such a file?)  Is this a Debian/Ubuntu bug or a problem in
the original grep source?

>> I suppose the problem might be in glibc?  Or perhaps there is a bug in
>> the locale data files?
>>
>> (By the way, if any character in the input is being discarded for some
>> reason (e.g., invalid UTF-8 format), can I please ask that there
>> should be an error message generated by grep for this?  Otherwise
>> problems will be too difficult to track down.)
>>
>> By the way, I am using Ubuntu 6.06 LTS (“Dapper Drake”) with all
>
> I would consider upgrading.

Of course.  (But “LTS” is for “long term support”.  One of its main
advantages is not needing to upgrade.)

>>> [resending just to you, because your mail server blocked my first reply
>>>
>>>   <address@hidden>: host izanami.macs.hw.ac.uk[137.195.13.6] said: 550
>>>       82.230.74.64 is listed in rbl-plus.mail-abuse.ja.net (in reply to
>>>       RCPT TO command)
>>>       ]
>>
>> Sorry about that!  I don't know why that RBL lists that IP address
>> (mx.meyering.net).  I'm glad I got your second e-mail.
>
> I suggest you tell the folks who administer that mail server
> that they are blocking non-spam mail from a static IP address that
> has been completely spam-free (and not an open relay, etc.) for
> more than three years.

Is your point that you think the RBL in question is unreliable and
shouldn't be used?  (I know RBLs in general are often controversial,
for exactly the reason we are seeing here.)

> In this case, I took the trouble to route
> mail for your domain through a different outbound server, but most
> correspondents getting such a bounce would not do that.

Thanks indeed!

-- 
Joe



