bug-grep

Re: say if grep can find non-ascii


From: Julian Foad
Subject: Re: say if grep can find non-ascii
Date: Wed, 08 Mar 2006 12:22:48 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511

Paul Eggert wrote:
> For this particular task "grep for non-ASCII characters", I had just
> two days before tried to solve the same problem, and discovered that
> 'grep', somewhat to my surprise, can't do it.  This is worth either
> mentioning or fixing, in my opinion.

The Open Group spec for Regular Expressions, which is a standard I assume we should generally be following, says:
<http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05>

The following character class expressions shall be supported in all locales:

    [:alnum:]   [:cntrl:]   [:lower:]   [:space:]
    [:alpha:]   [:digit:]   [:print:]   [:upper:]
    [:blank:]   [:graph:]   [:punct:]   [:xdigit:]

In addition, character class expressions of the form:

    [:name:]

are recognized in those locales where the name keyword has been given
a charclass definition in the LC_CTYPE category.

Therefore "grep '[^[:ascii:]]'" ought to work as expected iff the current locale defines that class.

Whether it DOES work is something I haven't tried to determine.
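
For reference, here is a minimal sketch of what a search for non-ASCII characters is meant to find, independent of whether any locale defines an "ascii" class. It is written in Python rather than as a grep invocation; Python's re module has no POSIX bracket classes, so the range is spelled out explicitly. The function name non_ascii_lines is hypothetical, and "non-ASCII" is assumed to mean any character above U+007F.

```python
import re

# Assumption: "non-ASCII" means any character with a code point above U+007F.
# The explicit range stands in for a negated "ascii" class.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def non_ascii_lines(lines):
    """Return (line_number, line) pairs for lines containing a non-ASCII character."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if NON_ASCII.search(line)
    ]


if __name__ == "__main__":
    sample = ["plain text", "caf\u00e9", "tab\there"]
    for number, line in non_ascii_lines(sample):
        print(f"{number}:{line}")
```

Control characters such as tab are still ASCII, so only lines with characters outside the 7-bit range are reported.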

Whether Grep should support that class unconditionally, as Perl does, is another matter. I'd say probably not; presumably there is a reason why it's not in the list of standard classes.

The Grep manual should be more explicit about the use of character classes other than those that it says are supported.

- Julian



