bug-gnu-utils

Re: bug in gawk 3.1.0 regex code


From: Aharon Robbins
Subject: Re: bug in gawk 3.1.0 regex code
Date: Sun, 4 Aug 2002 10:16:09 +0300

Greetings.  Re this, posted in May:

> From: address@hidden
> To: address@hidden
> Date: Fri, 10 May 2002 03:38:42 GMT+01:00
> Subject: bug in gawk 3.1.0 regex code
>
> I believe I've just found a bug in gawk3.1.0's implementation of
> extended regular expressions. It seems to be down to the alternation
> operator: when an end anchor '$' is used as a subexpression in an
> alternation and the entire matched RE is a null string, it fails
> to match the end of string. For example:
>
> gsub(/$|2/,"x")
> print
>
> input           = 12345
> expected output = 1x345x
> actual output   = 1x345
>
> The start anchor '^' always works as expected;
>
> gsub(/^|2/,"x")
> print
>
> input           = 12345
> expected output = x1x345
> actual output   = x1x345
>
> This was with POSIX compliance enabled, although that doesn't
> affect the result.
>
> I checked on gawk3.0.6 and got exactly the same results; however,
> gawk2.15.6 gives the expected results.
> [....]

I'm sorry it's taken so long to post an official reply. I
wanted to test all the various things that had been posted.

This is a bug in the implementation of gsub, and not in the 
actual regex routines.  The patch below fixes the problem.

By the way, re this:

> From: address@hidden (laura fairhead)
> Newsgroups: comp.lang.awk
> Subject: Re: bug in gawk3.1.0 regex code
> Date: Fri, 10 May 2002 02:09:44 GMT
> 
> I'm also investigating another possible problem with matching nul strings;
> 
> input    = 12345
> gsub(/2|/,"x")
> output   = x1x3x4x5x
> expected = x1xx3x4x5x
> [....]

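Your `expected' is incorrect.  At any given position, matched text is
always as long as possible.  Thus, given a choice between the empty
string and the non-empty "2", it chooses the "2".  The empty string
right after that replacement is then skipped, so only one "x" appears
there.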
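
To make the "as long as possible" rule concrete, here is a small sketch in
C.  It is not gawk code: longest_alt_at() is a hypothetical helper that
treats each alternative as a literal string and picks the longest one that
matches at a given offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of gawk): among literal alternatives,
 * return the length of the longest one that matches s at offset pos,
 * or -1 if none does.  An empty alternative ("") matches everywhere
 * with length 0.
 */
long longest_alt_at(const char *s, size_t pos,
                    const char *const alts[], size_t nalts)
{
    long best = -1;
    size_t i;

    assert(pos <= strlen(s));   /* pos may point at the terminating NUL */
    for (i = 0; i < nalts; i++) {
        size_t n = strlen(alts[i]);

        /* keep this alternative only if it matches and is longer */
        if (strncmp(s + pos, alts[i], n) == 0 && (long) n > best)
            best = (long) n;
    }
    return best;
}
```

With the alternatives "2" and "" on the input 12345, this returns 1 at
offset 1 (the "2" beats the empty string) and 0 at offset 0 (only the
empty string matches there).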

Thanks for finding this bug, and here's the patch.

Arnold
------------------- cut here ------------------------
*** ../gawk-3.1.1/builtin.c     Tue Apr 16 04:40:31 2002
--- builtin.c   Wed May 15 06:04:58 2002
***************
*** 1969,1977 ****
                        /*
                         * If the current match matched the null string,
                         * and the last match didn't and did a replacement,
!                        * then skip this one.
                         */
!                       if (lastmatchnonzero && matchstart == matchend) {
                                lastmatchnonzero = FALSE;
                                matches--;
                                goto empty;
--- 1968,1980 ----
                        /*
                         * If the current match matched the null string,
                         * and the last match didn't and did a replacement,
!                        * and the match of the null string is at the front of
!                        * the text (meaning right after end of the previous
!                        * replacement), then skip this one.
                         */
!                       if (matchstart == matchend
!                           && lastmatchnonzero
!                           && matchstart == text) {
                                lastmatchnonzero = FALSE;
                                matches--;
                                goto empty;
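
For readers who want to see the effect of the extra test without building
gawk, the following is a simplified, stand-alone model of the loop.  It is
only a sketch: model_gsub(), find_match() and the pat enum are hypothetical
names, the replacement is fixed to "x", and the two regexps from this thread
are hard-coded instead of going through the real regex routines.  Only the
skip test mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for real regex matching: /$|2/ and /2|/. */
enum pat { PAT_END_OR_2, PAT_2_OR_EMPTY };

/*
 * Leftmost match at or after `from`, longest at that position.
 * Sets *ms and *me (start and end offsets); returns 0 if no match.
 */
static int find_match(enum pat p, const char *s, size_t len, size_t from,
                      size_t *ms, size_t *me)
{
    size_t i;

    for (i = from; i <= len; i++) {
        if (i < len && s[i] == '2') {           /* non-empty "2" wins */
            *ms = i;
            *me = i + 1;
            return 1;
        }
        if (p == PAT_2_OR_EMPTY || i == len) {  /* empty match */
            *ms = *me = i;
            return 1;
        }
    }
    return 0;
}

/*
 * Simplified model of the gsub loop with replacement "x".
 * patched == 0 uses the old skip test, nonzero the new one.
 * out must have room for at least 2 * strlen(s) + 2 bytes.
 */
void model_gsub(enum pat p, const char *s, int patched,
                char *out, size_t outsz)
{
    size_t len = strlen(s), text = 0, ms, me, i, o = 0;
    int lastmatchnonzero = 0;

    assert(outsz >= 2 * len + 2);
    while (find_match(p, s, len, text, &ms, &me)) {
        int empty = (ms == me);
        /* old test: empty && lastmatchnonzero; new test adds ms == text */
        int skip = empty && lastmatchnonzero && (!patched || ms == text);

        for (i = text; i < ms; i++)     /* copy text before the match */
            out[o++] = s[i];
        if (skip) {
            lastmatchnonzero = 0;
        } else {
            out[o++] = 'x';             /* do the replacement */
            if (!empty)
                lastmatchnonzero = 1;
        }
        if (empty && me < len)          /* step past one char after an empty match */
            out[o++] = s[me++];
        text = me;
        if (text >= len && ms == me)    /* empty match at the end: done */
            break;
    }
    out[o] = '\0';
}
```

Called with PAT_END_OR_2 on 12345, the model yields 1x345 when patched is 0
and 1x345x when it is nonzero.  With PAT_2_OR_EMPTY it yields x1x3x4x5x
either way, since there the empty match sits directly after the replaced
"2" and is skipped by both tests.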


