Re: Bug in regular expression \B using DFA
From: Aharon Robbins
Subject: Re: Bug in regular expression \B using DFA
Date: Wed, 30 Jul 2008 23:15:01 +0300
Greetings. Re this:
> From: "T. X. G." <address@hidden>
> Subject: Bug in regular expression \B using DFA
> Date: Wed, 16 Jul 2008 05:23:09 -0700 (PDT)
> To: address@hidden
>
> ~ gawk --version
> GNU Awk 3.1.6
> Copyright (C) 1989, 1991-2007 Free Software Foundation.
>
> ......
>
> You should have received a copy of the GNU General Public License
> along with this program. If not, see http://www.gnu.org/licenses/.
>
> ~ LC_ALL=C gawk 'BEGIN{x="abcd";gsub(/\B/,":",x);print x}'
> a:b:cd
>
> ~ LC_ALL=en_US.UTF-8 gawk 'BEGIN{x="abcd";gsub(/\B/,":",x);print x}'
> a:b:c:d
>
> ~ GAWK_NO_DFA=1 gawk 'BEGIN{x="abcd";gsub(/\B/,":",x);print x}'
> a:b:c:d
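
For reference, the expected output can be checked by hand. The sketch below assumes the usual reading of \B as "any position that is not a word boundary", with letters, digits, and underscore as word characters in the C locale. The helper names (is_word_char, not_word_boundary, mark_non_boundaries) are hypothetical and are not taken from gawk or from the dfa/regex code; this only illustrates the rule, not how either matcher implements it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word constituent in the C locale: letter, digit, or underscore. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/*
 * Return nonzero if position pos (0..len) in s is NOT a word boundary:
 * the characters on both sides are both word characters or both
 * non-word characters. Start and end of string count as non-word.
 */
static int not_word_boundary(const char *s, size_t len, size_t pos)
{
    int before = pos > 0 && is_word_char((unsigned char) s[pos - 1]);
    int after = pos < len && is_word_char((unsigned char) s[pos]);

    return before == after;
}

/*
 * Copy src to dst, inserting sep at every non-boundary position,
 * roughly what gsub(/\B/, ":", x) is expected to do.
 * dst must have room for 2 * strlen(src) + 2 bytes.
 */
static void mark_non_boundaries(const char *src, char sep, char *dst)
{
    size_t len = strlen(src);
    size_t i;

    for (i = 0; i <= len; i++) {
        if (not_word_boundary(src, len, i))
            *dst++ = sep;
        if (i < len)
            *dst++ = src[i];
    }
    *dst = '\0';
}
```

Calling mark_non_boundaries("abcd", ':', buf) with a buffer of at least 10 bytes leaves "a:b:c:d" in buf, which agrees with the UTF-8 and GAWK_NO_DFA=1 runs above and not with the LC_ALL=C run.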
This is indeed a bug. Please apply the following patch. It will
make its way to CVS shortly.
Thanks,

Arnold
------------------------------------
Wed Jul 30 23:10:51 2008 Arnold D. Robbins <address@hidden>
* re.c (research): Don't ever use DFA if need_start. It can
break on some weird cases. Reported by
"T. X. G." <address@hidden>.
--- re.c 11 Aug 2007 19:49:23 -0000 1.6
+++ re.c 30 Jul 2008 20:12:10 -0000
@@ -232,8 +232,11 @@
* focused, perhaps we should relegate the DFA matcher to the
* single byte case all the time. OTOH, the speed difference
* between the matchers in non-trivial... Sigh.)
+ *
+ * 7/2008: Simplify: skip dfa matcher if need_start. The above
+ * problems are too much to deal with.
*/
- if (rp->dfa && ! no_bol && (gawk_mb_cur_max == 1 || ! need_start)) {
+ if (rp->dfa && ! no_bol && ! need_start) {
char save;
int count = 0;
/*