bug-grep

bug#18266: handling bytes not part of the charset, and other garbage


From: Paul Eggert
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Thu, 11 Sep 2014 20:26:12 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

Vincent Lefevre wrote:

> ypig% LC_ALL=C locale charmap
> ANSI_X3.4-1968

That may be what the 'locale' command says, but bytes with the top bit set are considered to be valid single-byte characters, so there are no encoding errors. In that sense the C locale is not strict ASCII.

> the current behavior breaks the sometimes used "grep ." solution
> to match non-empty lines.

"grep ." matches lines containing one or more characters. Encoding errors are not characters, at least, not as far as plain grep is concerned.
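A minimal sketch of the C-locale case described above, assuming a POSIX shell and printf and a grep that behaves as this message describes (count_dot_lines is a hypothetical helper name, not part of grep). The input has three lines: "a", a lone byte 0x80, and an empty line. Since the C locale treats 0x80 as a valid single-byte character, "grep ." should count the first two lines and skip only the empty one.

```shell
# Count the lines that "grep ." matches when run in the C locale.
# count_dot_lines is a hypothetical helper used only for this sketch.
count_dot_lines() {
    LC_ALL=C grep -c .
}

# Three input lines: "a", a single byte 0x80 (octal 200), and an empty line.
# In the C locale this is expected to print 2.
printf 'a\n\200\n\n' | count_dot_lines
```

In a multibyte locale such as a UTF-8 one (if installed), the same lone byte 0x80 is an encoding error rather than a character, so by the reasoning above that line would not be expected to match; the exact result depends on the grep version and is not shown here.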

Perhaps PCRE is different, and if libpcre worked with encoding errors we could simply use its way of matching them. But there doesn't seem to be a safe way to do that.




