[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:], [:di
From: |
Jim Meyering |
Subject: |
[PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:], [:digit:], etc. |
Date: |
Wed, 01 Sep 2010 19:09:30 +0200 |
I've just tweaked the documentation and pushed this:
>From adf0bf94a10d40f39d72663878a245163232aae6 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 31 Aug 2010 21:14:41 +0200
Subject: [PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:],
[:digit:], etc.
When I make a mistake like this:
grep '[:lower:]' ...
be it in a script or on the command line, I want to know about
it as soon as possible. I don't want grep to print a mere warning
that it is interpreting this suspicious and almost guaranteed-wrong
regular expression as a set of just 6 bytes. And I certainly don't
want grep to silently do the wrong thing, even if that would be
officially standards-conforming. It's obvious that I intended
[[:lower:]], and I want my error to be diagnosed in a way that is
most likely to get my attention. Thus, with this change, grep now
prints a diagnostic and exits with status 2 the moment it
encounters an offending [:char_class:] construct.
This changes the way grep works by default, rather than
putting this new behavior on an option. A new option
would seldom be used in scripts (not portable), and would
probably be used only rarely by those who need it the most.
This new functionality provides a valuable safety measure
and incurs truly negligible risk.
For strict POSIX compliance, set POSIXLY_CORRECT in
your environment. That disables this new feature.
Revert the changes from commit 2cd3bcea, "grep: add
--warnings={always,never,auto}.", and then do the following:
* src/dfasearch.c (dfawarn): Call getenv("POSIXLY_CORRECT") here;
Remove "warning: " from the diagnostic, now that it's more than
a warning, and exit with status 2.
* NEWS (New features): Describe the new semantics.
* tests/warn-char-classes: Adjust one test to accommodate this change.
* doc/grep.texi (Character Classes and Bracket Expressions): Document.
(Environment Variables): Cross-reference it.
Remove reference to obsolete getopt illegal vs. invalid difference.
Thanks to Paul Eggert for suggestions and an initial prod.
---
NEWS | 13 +++++----
doc/grep.texi | 15 ++++++----
src/dfasearch.c | 20 +++++++++-----
src/grep.h | 2 -
src/main.c | 67 +++++++----------------------------------------
tests/warn-char-classes | 6 +++-
6 files changed, 43 insertions(+), 80 deletions(-)
diff --git a/NEWS b/NEWS
index 553ec52..18289ba 100644
--- a/NEWS
+++ b/NEWS
@@ -29,12 +29,13 @@ GNU grep NEWS -*-
outline -*-
** New features
- grep now will warn for very common regular expression mistakes,
- such as using [:space:] instead of [[:space:]]. Warnings are
- disabled by POSIXLY_CORRECT. They are also disabled when stdout
- is not a TTY, thus minimizing the chance of extraneous output
- in a script. However, the rules for enabling/disabling warnings
- are experimental and subject to change in future releases.
+ grep now diagnoses (and fails with exit status 2) commonly mistyped
+ regular expression like [:space:], [:digit:], etc. Before, those were
+ silently interpreted as [ac:eps] and [dgit:] respectively. Virtually
+ all who make that class of mistake should have used [[:space:]] or
+ [[:digit:]]. This new behavior is disabled when the POSIXLY_CORRECT
+ environment variable is set.
+
* Noteworthy changes in release 2.6.3 (2010-04-02) [stable]
diff --git a/doc/grep.texi b/doc/grep.texi
index 890ba1b..da103c4 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -935,12 +935,8 @@ follow file names must be treated as file names;
by default,
such options are permuted to the front of the operand list
and are treated as options.
-Also,
address@hidden requires that unrecognized options be diagnosed as ``illegal'',
-but since they are not really against the law the default
-is to diagnose them as ``invalid''.
address@hidden also disables @address@hidden,
-described below.
+Also, @code{POSIXLY_CORRECT} disables special handling of an
+invalid bracket expression. @xref{invalid-bracket-expr}.
@item address@hidden
@vindex address@hidden @r{environment variable}
@@ -1269,6 +1265,13 @@ encoding, whereas the former is independent of locale
and character set.
part of the symbolic names, and must be included in addition to
the brackets delimiting the bracket expression.)
address@hidden
+If you mistakenly omit the outer brackets, and search for say,
@samp{[:upper:]},
+GNU @command{grep} prints a diagnostic and exits with status 2, on
+the assumption that you did not intend to search for the nominally
+equivalent regular expression: @samp{[:epru]}.
+Set the @code{POSIXLY_CORRECT} environment variable to disable this feature.
+
Most meta-characters lose their special meaning inside bracket expressions.
@table @samp
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 29a1f7e..44eb08a 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -46,13 +46,6 @@ static struct patterns *patterns;
static size_t pcount;
void
-dfawarn (char const *mesg)
-{
- if (warnings)
- error (0, 0, _("warning: %s"), mesg);
-}
-
-void
dfaerror (char const *mesg)
{
error (EXIT_TROUBLE, 0, "%s", mesg);
@@ -62,6 +55,19 @@ dfaerror (char const *mesg)
abort ();
}
+/* For now, the sole dfawarn-eliciting condition (use of a regexp
+ like '[:lower:]') is unequivocally an error, so treat it as such,
+ when possible. */
+void
+dfawarn (char const *mesg)
+{
+ static enum { NONE = 0, POSIX, GNU } mode;
+ if (mode == NONE)
+ mode = (getenv ("POSIXLY_CORRECT") ? POSIX : GNU);
+ if (mode == GNU)
+ dfaerror (mesg);
+}
+
/* Number of compiled fixed strings known to exactly match the regexp.
If kwsexec returns < kwset_exact_matches, then we don't need to
call the regexp matcher at all. */
diff --git a/src/grep.h b/src/grep.h
index 64c1865..67ea793 100644
--- a/src/grep.h
+++ b/src/grep.h
@@ -43,5 +43,3 @@ extern int match_icase; /* -i */
extern int match_words; /* -w */
extern int match_lines; /* -x */
extern unsigned char eolbyte; /* -z */
-
-extern int warnings;
diff --git a/src/main.c b/src/main.c
index 6f5c4ae..0a91b7f 100644
--- a/src/main.c
+++ b/src/main.c
@@ -286,8 +286,7 @@ enum
LABEL_OPTION,
EXCLUDE_DIRECTORY_OPTION,
GROUP_SEPARATOR_OPTION,
- MMAP_OPTION,
- WARN_OPTION,
+ MMAP_OPTION
};
/* Long options equivalences. */
@@ -343,7 +342,6 @@ static struct option const long_options[] =
{"binary", no_argument, NULL, 'U'},
{"unix-byte-offsets", no_argument, NULL, 'u'},
{"version", no_argument, NULL, 'V'},
- {"warn", optional_argument, NULL, WARN_OPTION},
{"with-filename", no_argument, NULL, 'H'},
{"word-regexp", no_argument, NULL, 'w'},
{0, 0, 0, 0}
@@ -354,7 +352,6 @@ int match_icase;
int match_words;
int match_lines;
unsigned char eolbyte;
-int warnings;
/* For error messages. */
/* The name the program was run with, stripped of any leading path. */
@@ -388,27 +385,6 @@ static enum
SKIP_DEVICES
} devices = READ_DEVICES;
-enum warn_type
- {
- WARN_ALWAYS = 0,
- WARN_NEVER,
- WARN_AUTO,
- };
-
-static char const *const warn_args[] =
-{
- "always", "yes", "force",
- "never", "no", "none",
- "auto", "tty", "if-tty", NULL
-};
-static enum warn_type const warn_types[] =
-{
- WARN_ALWAYS, WARN_ALWAYS, WARN_ALWAYS,
- WARN_NEVER, WARN_NEVER, WARN_NEVER,
- WARN_AUTO, WARN_AUTO, WARN_AUTO
-};
-ARGMATCH_VERIFY (warn_args, warn_types);
-
static int grepdir (char const *, struct stats const *);
#if defined HAVE_DOS_FILE_CONTENTS
static inline int undossify_input (char *, size_t);
@@ -1783,7 +1759,6 @@ main (int argc, char **argv)
int opt, cc, status;
int default_context;
FILE *fp;
- enum warn_type w;
initialize_main (&argc, &argv);
set_program_name (argv[0]);
@@ -1816,8 +1791,6 @@ main (int argc, char **argv)
exit_failure = EXIT_TROUBLE;
atexit (close_stdout);
- w = getenv ("POSIXLY_CORRECT") ? WARN_NEVER : WARN_ALWAYS;
-
prepend_default_options (getenv ("GREP_OPTIONS"), &argc, &argv);
setmatcher (NULL);
@@ -2051,6 +2024,15 @@ main (int argc, char **argv)
show_help = 1;
} else
color_option = 2;
+ if (color_option == 2)
+ {
+ char const *t;
+ if (isatty (STDOUT_FILENO) && (t = getenv ("TERM"))
+ && !STREQ (t, "dumb"))
+ color_option = 1;
+ else
+ color_option = 0;
+ }
break;
case EXCLUDE_OPTION:
@@ -2098,13 +2080,6 @@ main (int argc, char **argv)
/* long options */
break;
- case WARN_OPTION:
- if (optarg)
- w = XARGMATCH ("--warn", optarg, warn_args, warn_types);
- else
- w = WARN_ALWAYS;
- break;
-
default:
usage (EXIT_TROUBLE);
break;
@@ -2127,28 +2102,6 @@ main (int argc, char **argv)
if (out_before < 0)
out_before = default_context;
- char const *t;
- if (isatty (STDOUT_FILENO)
- && (t = getenv ("TERM")) && !STREQ (t, "dumb"))
- {
- if (color_option == 2)
- color_option = 1;
- if (w == WARN_AUTO)
- w = WARN_ALWAYS;
- }
- else
- {
- if (color_option == 2)
- color_option = 0;
-
- /* Redirected stdout: we're likely in a script, disable warnings. */
- if (w == WARN_AUTO)
- w = WARN_NEVER;
- }
-
- /* assert (w == WARN_ALWAYS || w == WARN_NEVER); */
- warnings = w == WARN_ALWAYS;
-
if (color_option)
{
/* Legacy. */
diff --git a/tests/warn-char-classes b/tests/warn-char-classes
index 069489f..25bf640 100644
--- a/tests/warn-char-classes
+++ b/tests/warn-char-classes
@@ -7,12 +7,14 @@ set -x
echo f > x || framework_failure_
echo b >> x || framework_failure_
echo h >> x || framework_failure_
+printf 'grep: character class syntax is [[:space:]], not [:space:]\n' \
+ > exp-err || framework_failure_
# basic cases
grep '[:space:]' x 2> err
-test $? = 1 || fail=1
-test -s err || fail=1
+test $? = 2 || fail=1
+compare err exp-err || fail=1
grep '[[:space:]]' x 2> err
test $? = 1 || fail=1
--
1.7.2.2.510.g7180a
>From 85e0be54275366ade35333a69d5ba23064e5078f Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 1 Sep 2010 19:02:38 +0200
Subject: [PATCH 2/2] maint: add lib/version-etc.c to the list in POTFILES.in
* po/POTFILES.in: Add lib/version-etc.c.
---
po/POTFILES.in | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/po/POTFILES.in b/po/POTFILES.in
index ff8b148..15ded0c 100644
--- a/po/POTFILES.in
+++ b/po/POTFILES.in
@@ -22,6 +22,7 @@ lib/getopt.c
lib/obstack.c
lib/quotearg.c
lib/regcomp.c
+lib/version-etc.c
lib/xalloc-die.c
lib/xstrtol-error.c
src/dfa.c
--
1.7.2.2.510.g7180a
- [PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:], [:digit:], etc.,
Jim Meyering <=