bug-grep

Re: fix for dfa when mixed with libsigsegv on Windows


From: Jim Meyering
Subject: Re: fix for dfa when mixed with libsigsegv on Windows
Date: Tue, 06 Apr 2010 20:50:38 +0200

Aharon Robbins wrote:
> Hi. The Cygwin maintainer tells me that libsigsegv on Windows pulls in the
> dreaded <windows.h> header file which defines WCHAR as wchar_t, causing a
> conflict with the WCHAR in the enum in dfa.h.  I propose the following diff
> which compiles OK under Linux and moves the details into dfa.c.

Thanks for the patch.

Moving any not-necessarily-public symbol definitions out
of the public API is an improvement.

However, I had to make some changes:
  - also move the comment describing the enum
  - adjust, now that every "#ifdef MBS_SUPPORT" has become "#if MBS_SUPPORT"

and I want to fill in the blank after "Reported by" in the commit log below.
Currently you're listed as the author.  Let me know if someone
else should be listed instead.

Arnold, if you're going to be contributing patches
to grep's dfa.[ch], please start using git to do so.
That means you'd follow the guidelines in the brand
spanking new HACKING file and send in "git format-patch" output.
Thus, I won't have to ask about authorship, and you'll presumably
take care to attribute the reporter:

  http://git.sv.gnu.org/cgit/grep.git/tree/HACKING


From aa86f99aa99f01fd23911dd4290d4904a019dfca Mon Sep 17 00:00:00 2001
From: Aharon Robbins <address@hidden>
Date: Tue, 6 Apr 2010 20:20:53 +0200
Subject: [PATCH] build: avoid conflict with WCHAR definition from Cygwin's <windows.h>

* src/dfa.h (enum token): Remove the definition from this file.
Replace with a declaration and typedef.  Moved to ...
* src/dfa.c (enum token): ... here.
Reported by __________________.
---
 src/dfa.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/dfa.h |   98 +-----------------------------------------------------------
 2 files changed, 98 insertions(+), 96 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index ca32b66..5a4b2e3 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -68,6 +68,102 @@
 # undef clrbit
 #endif

+/* The regexp is parsed into an array of tokens in postfix form.  Some tokens
+   are operators and others are terminal symbols.  Most (but not all) of these
+   codes are returned by the lexical analyzer. */
+enum token_enum
+{
+  END = -1,                    /* END is a terminal symbol that matches the
+                                  end of input; any value of END or less in
+                                  the parse tree is such a symbol.  Accepting
+                                  states of the DFA are those that would have
+                                  a transition on END. */
+
+  /* Ordinary character values are terminal symbols that match themselves. */
+
+  EMPTY = NOTCHAR,             /* EMPTY is a terminal symbol that matches
+                                  the empty string. */
+
+  BACKREF,                     /* BACKREF is generated by \<digit>; it
+                                  it not completely handled.  If the scanner
+                                  detects a transition on backref, it returns
+                                  a kind of "semi-success" indicating that
+                                  the match will have to be verified with
+                                  a backtracking matcher. */
+
+  BEGLINE,                     /* BEGLINE is a terminal symbol that matches
+                                  the empty string if it is at the beginning
+                                  of a line. */
+
+  ENDLINE,                     /* ENDLINE is a terminal symbol that matches
+                                  the empty string if it is at the end of
+                                  a line. */
+
+  BEGWORD,                     /* BEGWORD is a terminal symbol that matches
+                                  the empty string if it is at the beginning
+                                  of a word. */
+
+  ENDWORD,                     /* ENDWORD is a terminal symbol that matches
+                                  the empty string if it is at the end of
+                                  a word. */
+
+  LIMWORD,                     /* LIMWORD is a terminal symbol that matches
+                                  the empty string if it is at the beginning
+                                  or the end of a word. */
+
+  NOTLIMWORD,                  /* NOTLIMWORD is a terminal symbol that
+                                  matches the empty string if it is not at
+                                  the beginning or end of a word. */
+
+  QMARK,                       /* QMARK is an operator of one argument that
+                                  matches zero or one occurences of its
+                                  argument. */
+
+  STAR,                        /* STAR is an operator of one argument that
+                                  matches the Kleene closure (zero or more
+                                  occurrences) of its argument. */
+
+  PLUS,                        /* PLUS is an operator of one argument that
+                                  matches the positive closure (one or more
+                                  occurrences) of its argument. */
+
+  REPMN,                       /* REPMN is a lexical token corresponding
+                                  to the {m,n} construct.  REPMN never
+                                  appears in the compiled token vector. */
+
+  CAT,                         /* CAT is an operator of two arguments that
+                                  matches the concatenation of its
+                                  arguments.  CAT is never returned by the
+                                  lexical analyzer. */
+
+  OR,                          /* OR is an operator of two arguments that
+                                  matches either of its arguments. */
+
+  ORTOP,                       /* OR at the toplevel in the parse tree.
+                                  This is used for a boyer-moore heuristic. */
+
+  LPAREN,                      /* LPAREN never appears in the parse tree,
+                                  it is only a lexeme. */
+
+  RPAREN,                      /* RPAREN never appears in the parse tree. */
+
+#if MBS_SUPPORT
+  ANYCHAR,                     /* ANYCHAR is a terminal symbol that matches
+                                  any multibyte (or single byte) characters.
+                                 It is used only if MB_CUR_MAX > 1.  */
+
+  MBCSET,                      /* MBCSET is similar to CSET, but for
+                                  multibyte characters.  */
+
+  WCHAR,                       /* Only returned by lex.  wctok contains
+                                  the wide character representation.  */
+#endif /* MBS_SUPPORT */
+
+  CSET                         /* CSET and (and any value greater) is a
+                                  terminal symbol that matches any of a
+                                  class of characters. */
+};
+
 static void dfamust (struct dfa *dfa);
 static void regexp (int toplevel);

diff --git a/src/dfa.h b/src/dfa.h
index e0a575f..afa258b 100644
--- a/src/dfa.h
+++ b/src/dfa.h
@@ -46,102 +46,8 @@
 /* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
 typedef int charclass[CHARCLASS_INTS];

-/* The regexp is parsed into an array of tokens in postfix form.  Some tokens
-   are operators and others are terminal symbols.  Most (but not all) of these
-   codes are returned by the lexical analyzer. */
-
-typedef enum
-{
-  END = -1,                    /* END is a terminal symbol that matches the
-                                  end of input; any value of END or less in
-                                  the parse tree is such a symbol.  Accepting
-                                  states of the DFA are those that would have
-                                  a transition on END. */
-
-  /* Ordinary character values are terminal symbols that match themselves. */
-
-  EMPTY = NOTCHAR,             /* EMPTY is a terminal symbol that matches
-                                  the empty string. */
-
-  BACKREF,                     /* BACKREF is generated by \<digit>; it
-                                  it not completely handled.  If the scanner
-                                  detects a transition on backref, it returns
-                                  a kind of "semi-success" indicating that
-                                  the match will have to be verified with
-                                  a backtracking matcher. */
-
-  BEGLINE,                     /* BEGLINE is a terminal symbol that matches
-                                  the empty string if it is at the beginning
-                                  of a line. */
-
-  ENDLINE,                     /* ENDLINE is a terminal symbol that matches
-                                  the empty string if it is at the end of
-                                  a line. */
-
-  BEGWORD,                     /* BEGWORD is a terminal symbol that matches
-                                  the empty string if it is at the beginning
-                                  of a word. */
-
-  ENDWORD,                     /* ENDWORD is a terminal symbol that matches
-                                  the empty string if it is at the end of
-                                  a word. */
-
-  LIMWORD,                     /* LIMWORD is a terminal symbol that matches
-                                  the empty string if it is at the beginning
-                                  or the end of a word. */
-
-  NOTLIMWORD,                  /* NOTLIMWORD is a terminal symbol that
-                                  matches the empty string if it is not at
-                                  the beginning or end of a word. */
-
-  QMARK,                       /* QMARK is an operator of one argument that
-                                  matches zero or one occurences of its
-                                  argument. */
-
-  STAR,                        /* STAR is an operator of one argument that
-                                  matches the Kleene closure (zero or more
-                                  occurrences) of its argument. */
-
-  PLUS,                        /* PLUS is an operator of one argument that
-                                  matches the positive closure (one or more
-                                  occurrences) of its argument. */
-
-  REPMN,                       /* REPMN is a lexical token corresponding
-                                  to the {m,n} construct.  REPMN never
-                                  appears in the compiled token vector. */
-
-  CAT,                         /* CAT is an operator of two arguments that
-                                  matches the concatenation of its
-                                  arguments.  CAT is never returned by the
-                                  lexical analyzer. */
-
-  OR,                          /* OR is an operator of two arguments that
-                                  matches either of its arguments. */
-
-  ORTOP,                       /* OR at the toplevel in the parse tree.
-                                  This is used for a boyer-moore heuristic. */
-
-  LPAREN,                      /* LPAREN never appears in the parse tree,
-                                  it is only a lexeme. */
-
-  RPAREN,                      /* RPAREN never appears in the parse tree. */
-
-#if MBS_SUPPORT
-  ANYCHAR,                     /* ANYCHAR is a terminal symbol that matches
-                                  any multibyte (or single byte) characters.
-                                 It is used only if MB_CUR_MAX > 1.  */
-
-  MBCSET,                      /* MBCSET is similar to CSET, but for
-                                  multibyte characters.  */
-
-  WCHAR,                       /* Only returned by lex.  wctok contains
-                                  the wide character representation.  */
-#endif /* MBS_SUPPORT */
-
-  CSET                         /* CSET and (and any value greater) is a
-                                  terminal symbol that matches any of a
-                                  class of characters. */
-} token;
+enum token_enum;
+typedef enum token_enum token;

 /* Sets are stored in an array in the compiled dfa; the index of the
    array corresponding to a given set token is given by SET_INDEX(t). */
--
1.7.0.4.552.gc303
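
The header/source split made by the patch above can be sketched in a
single file as follows.  This is only an illustration, not grep's code:
the lexer.h/lexer.c file names, lex_char, token_is_operator, and the
value chosen for EMPTY are all assumptions.  It also differs from the
patch in one respect: the patch forward-declares "enum token_enum" in
the header, which relies on a compiler extension (ISO C does not allow
an incomplete enum type), so this sketch exposes a plain int typedef
instead.

```c
/* ---- what a public header might contain (hypothetical lexer.h) ---- */

/* Callers see only an integer token type.  No enumerator names leak
   out, so a foreign header that also defines WCHAR cannot clash with
   anything declared here. */
typedef int token;

/* Map one input character to a token (illustrative helper). */
token lex_char (int c);

/* Return nonzero if T is one of the one-argument operators. */
int token_is_operator (token t);

/* ---- what the implementation might contain (hypothetical lexer.c) ---- */

/* The enumerators are private to this translation unit.  The names
   mirror a few of those in the patch; the values are illustrative. */
enum token_enum
{
  END = -1,
  EMPTY = 256,          /* assumed stand-in for NOTCHAR */
  QMARK,
  STAR,
  PLUS,
  WCHAR                 /* harmless here: <windows.h> is never included */
};

token
lex_char (int c)
{
  switch (c)
    {
    case '?': return QMARK;
    case '*': return STAR;
    case '+': return PLUS;
    default:  return c;   /* ordinary characters match themselves */
    }
}

int
token_is_operator (token t)
{
  return t == QMARK || t == STAR || t == PLUS;
}
```

A caller that includes only the header part can pass token values
around and query them through the functions, without ever naming an
enumerator such as WCHAR itself.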



