dfa.h / dfa.c diff versus gawk attached
From: Aharon Robbins
Subject: dfa.h / dfa.c diff versus gawk attached
Date: Thu, 06 Sep 2007 17:54:38 +0300
User-agent: Mutt/1.5.14 (2007-02-12)
Greetings.
Attached is a diff of the grep 2.5.3 dfa.h and dfa.c against the current
version of same in the gawk CVS. (Or, it'll be in CVS within an hour or
so. :-)
The changes fall into two categories: bug fixes, mostly having to do
with multibyte character sets, and reviving the DFA matcher's ability
to match across newlines, which grep doesn't need but which gawk does.
The latter changes the interface to dfaexec.
I believe that the grep developers have had most of these changes in the
pipeline for a while, but I thought it wouldn't hurt to submit a fresh
set of diffs.
One new thing is that I have added the ability to let the caller of
the dfa routines know that the matcher is broken in certain cases. The
only cases I know of at the moment are

    (foo){0}
    (foo){0,0}

which the DFA matcher treats as (foo){1} whereas regex correctly does
not match "foo". This is a problem in the DFA parsing as it builds the
parse tree that represents the DFA ... I could not see how to work
around it there, or anywhere else in the code. (Fixes welcome!)
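
The expected behavior can be checked against a POSIX regex implementation directly. What follows is only a minimal sketch, assuming a regex library such as the one in GLIBC; ere_matches is a hypothetical helper written for this illustration and is not part of dfa.c, grep, or gawk. The patterns are anchored with ^ and $ so that the question is whether the whole string "foo" matches, since an unanchored (foo){0} matches the empty string anywhere.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if the POSIX extended regex PATTERN matches TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.
   The caller anchors PATTERN when a whole-string match is wanted.  */
int
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);

  /* REG_NOSUB: only a yes/no answer is needed, not match offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Under these assumptions, ere_matches ("^(foo){0}$", "foo") and ere_matches ("^(foo){0,0}$", "foo") should both return 0, while ere_matches ("^(foo){1}$", "foo") should return 1. A DFA matcher with the problem described above would treat the first two patterns like the third.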
It remains my hope that "one day" the grep distribution will return
to being the canonical source for dfa.h and dfa.c, and that I can
synchronize from it (as I do with GLIBC, for example) rather than the
other way around.
Thanks,
Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
Attachment: grep-diff (text document)