bug-gnu-utils
bug in gawk 3.1.0 regex code


From: laura_fairhead
Subject: bug in gawk 3.1.0 regex code
Date: Fri, 10 May 2002 03:38:42 GMT+01:00

I believe I've just found a bug in the gawk 3.1.0 implementation of
extended regular expressions. It seems to be down to the alternation
operator: when an end anchor '$' is used as a subexpression in an
alternation and the entire matched RE is the null string, it fails
to match at the end of the string. For example:

gsub(/$|2/,"x")
print

input           = 12345
expected output = 1x345x
actual output   = 1x345

The start anchor '^' always works as expected:

gsub(/^|2/,"x")
print

input           = 12345
expected output = x1x345
actual output   = x1x345
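
For anyone who wants to try the two cases above from a shell prompt, here
is a rough sketch. It assumes that the 'awk' on your PATH is the interpreter
under test (substitute 'gawk' as needed). 'run_case' is only a name made up
for this example, and the regex is passed in as a string rather than as a
/.../ literal.

```sh
# run_case REGEX INPUT
# Replace every match of REGEX in INPUT with "x" and print the result.
# (run_case is a hypothetical helper, not part of gawk.)
run_case() {
    printf '%s\n' "$2" | awk -v re="$1" '{ gsub(re, "x"); print }'
}

run_case '^|2' 12345    # start anchor: the report gives x1x345
run_case '$|2' 12345    # end anchor: expected 1x345x, reported as 1x345 on gawk 3.1.0
```

The regex arguments are single-quoted so that the shell leaves '$' and '|'
alone.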

This was with POSIX compliance enabled, although that doesn't
affect the result.

I checked on gawk 3.0.6 and got exactly the same results; however,
gawk 2.15.6 gives the expected results.

All of the following platforms produced the same results:

gawk3.0.6 / Win98 / i386
gawk3.1.0 / Win98 / i386
gawk3.0.5 / Linux2.2.16 / i386

Complete test results were as follows;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex               input     expected  actual    bug?      
-------------------------------------------------------------
(^)                 12345     x12345    x12345              
($)                 12345     12345x    12345x              
(^)|($)             12345     x12345x   x12345x             
($)|(^)             12345     x12345x   x12345x             
2                   12345     1x345     1x345               
(^)|2               12345     x1x345    x1x345              
2|(^)               12345     x1x345    x1x345              
($)|2               12345     1x345x    1x345     **BUG**   
2|($)               12345     1x345x    1x345     **BUG**   
(2)|(^)             12345     x1x345    x1x345              
(^)|(2)             12345     x1x345    x1x345              
(2)|($)             12345     1x345x    1x345     **BUG**   
($)|(2)             12345     1x345x    1x345     **BUG**   
((2)|(^)).          12345     xx45      xx45                
((^)|(2)).          12345     xx45      xx45                
.((2)|($))          12345     x34x      x34x                
.(($)|(2))          12345     x34x      x34x                
(^)|6               12345     x12345    x12345              
6|(^)               12345     x12345    x12345              
($)|6               12345     12345x    12345x              
6|($)               12345     12345x    12345x              
2|6|(^)             12345     x1x345    x1x345              
2|(^)|6             12345     x1x345    x1x345              
6|2|(^)             12345     x1x345    x1x345              
6|(^)|2             12345     x1x345    x1x345              
(^)|6|2             12345     x1x345    x1x345              
(^)|2|6             12345     x1x345    x1x345              
2|6|($)             12345     1x345x    1x345     **BUG**   
2|($)|6             12345     1x345x    1x345     **BUG**   
6|2|($)             12345     1x345x    1x345     **BUG**   
6|($)|2             12345     1x345x    1x345     **BUG**   
($)|6|2             12345     1x345x    1x345     **BUG**   
($)|2|6             12345     1x345x    1x345     **BUG**   
2|4|(^)             12345     x1x3x5    x1x3x5              
2|(^)|4             12345     x1x3x5    x1x3x5              
4|2|(^)             12345     x1x3x5    x1x3x5              
4|(^)|2             12345     x1x3x5    x1x3x5              
(^)|4|2             12345     x1x3x5    x1x3x5              
(^)|2|4             12345     x1x3x5    x1x3x5              
2|4|($)             12345     1x3x5x    1x3x5     **BUG**   
2|($)|4             12345     1x3x5x    1x3x5     **BUG**   
4|2|($)             12345     1x3x5x    1x3x5     **BUG**   
4|($)|2             12345     1x3x5x    1x3x5     **BUG**   
($)|4|2             12345     1x3x5x    1x3x5     **BUG**   
($)|2|4             12345     1x3x5x    1x3x5     **BUG**   
x{0}((2)|(^))       12345     x1x345    x1x345              
x{0}((^)|(2))       12345     x1x345    x1x345              
x{0}((2)|($))       12345     1x345x    1x345     **BUG**   
x{0}(($)|(2))       12345     1x345x    1x345     **BUG**   
x*((2)|(^))         12345     x1x345    x1x345              
x*((^)|(2))         12345     x1x345    x1x345              
x*((2)|($))         12345     1x345x    1x345     **BUG**   
x*(($)|(2))         12345     1x345x    1x345     **BUG**   
x{0}^               12345     x12345    x12345              
x{0}$               12345     12345x    12345x              
(x{0}^)|2           12345     x1x345    x1x345              
(x{0}$)|2           12345     1x345x    1x345     **BUG**   
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Here's the test program I used. A few of the cases use the ERE interval
operator {n[,[m]]} and so need '-W posix' (although the same results, minus
those tests, came out without POSIX compliance enabled).

[ Invocation was 'gawk -W posix -f tregex.awk' ]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tregex.awk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
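BEGIN {
    # Column layout shared by the header and every result row.
    fmt = "%-20s%-10s%-10s%-10s%-10s\n"
    header = sprintf(fmt, "regex", "input", "expected", "actual", "bug?")
    printf "%s", header

    # Build a rule of dashes: assigning to field length(header)+1 creates
    # that many empty fields, which are then joined with OFS ("-").
    OFS = "-"
    $(length(header) + 1) = ""
    print $0

    # Each line of testre.dat holds: regex, input, expected output.
    # Testing for "> 0" avoids looping forever if the file cannot be read.
    while ((getline < "testre.dat") > 0) {
        RE = $1; IN = $2; OUT = $3
        $0 = IN
        gsub(RE, "x")
        printf fmt, RE, IN, OUT, $0, ($0 == OUT ? "" : "**BUG**")
    }
    close("testre.dat")
}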
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the test data file used;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
testre.dat
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(^)             12345           x12345
($)             12345           12345x
(^)|($)         12345           x12345x
($)|(^)         12345           x12345x
2               12345           1x345
(^)|2           12345           x1x345
2|(^)           12345           x1x345
($)|2           12345           1x345x
2|($)           12345           1x345x
(2)|(^)         12345           x1x345
(^)|(2)         12345           x1x345
(2)|($)         12345           1x345x
($)|(2)         12345           1x345x
((2)|(^)).      12345           xx45
((^)|(2)).      12345           xx45
.((2)|($))      12345           x34x
.(($)|(2))      12345           x34x
(^)|6           12345           x12345
6|(^)           12345           x12345
($)|6           12345           12345x
6|($)           12345           12345x
2|6|(^)         12345           x1x345
2|(^)|6         12345           x1x345
6|2|(^)         12345           x1x345
6|(^)|2         12345           x1x345
(^)|6|2         12345           x1x345
(^)|2|6         12345           x1x345
2|6|($)         12345           1x345x
2|($)|6         12345           1x345x
6|2|($)         12345           1x345x
6|($)|2         12345           1x345x
($)|6|2         12345           1x345x
($)|2|6         12345           1x345x
2|4|(^)         12345           x1x3x5
2|(^)|4         12345           x1x3x5
4|2|(^)         12345           x1x3x5
4|(^)|2         12345           x1x3x5
(^)|4|2         12345           x1x3x5
(^)|2|4         12345           x1x3x5
2|4|($)         12345           1x3x5x
2|($)|4         12345           1x3x5x
4|2|($)         12345           1x3x5x
4|($)|2         12345           1x3x5x
($)|4|2         12345           1x3x5x
($)|2|4         12345           1x3x5x
x{0}((2)|(^))   12345           x1x345
x{0}((^)|(2))   12345           x1x345
x{0}((2)|($))   12345           1x345x
x{0}(($)|(2))   12345           1x345x
x*((2)|(^))     12345           x1x345
x*((^)|(2))     12345           x1x345
x*((2)|($))     12345           1x345x
x*(($)|(2))     12345           1x345x
x{0}^           12345           x12345
x{0}$           12345           12345x
(x{0}^)|2       12345           x1x345
(x{0}$)|2       12345           1x345x
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
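
When comparing several gawk builds it can be handy to reduce the table to a
single number. What follows is only a sketch under a few assumptions:
'count_failures' is a made-up helper name, the data file is assumed to have
the three whitespace-separated columns shown above, and 'awk' is assumed to
be the interpreter under test.

```sh
# count_failures FILE
# Print how many lines of FILE (regex, input, expected output) produce a
# result different from the expected one. Hypothetical helper for illustration.
count_failures() {
    awk '
    {
        re = $1; input = $2; expected = $3
        actual = input
        gsub(re, "x", actual)      # same substitution as the test program
        # Concatenating "" forces a string comparison.
        if ((actual "") != (expected "")) {
            failures++
        }
    }
    END { print failures + 0 }
    ' "$1"
}
```

It would be called as 'count_failures testre.dat'. On an implementation
without the problem it should print 0, provided interval expressions are
enabled for the x{0} cases (with gawk 3.x that means changing the 'awk'
call to 'gawk -W posix').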

I've attached a full copy of this e-mail in ZIP format
in case of e-mail transport errors corrupting the data.

I've posted the same bug report to gnu.utils.bug and
it's being discussed in this thread on comp.lang.awk;

From: address@hidden (laura fairhead)
Newsgroups: comp.lang.awk
Subject: bug in gawk3.1.0 regex code
Date: Wed, 08 May 2002 23:31:40 GMT
Message-ID: <address@hidden>


byefrom

Laura Fairhead





Attachment: COPY.ZIP