On Fri, Mar 31, 2017 at 11:52:29PM -0500, Ed Morton wrote:
> Is this a bug?
Yes. It is a regression in 4.1.4.
> $ cat tst.awk
> BEGIN { FPAT="[^,]*" }
> {
>     print NF, $0
>     for (i=1;i<=NF;i++)
>         print "\t" i, "[" $i "]"
>     print ""
> }
> $ cat -v file.csv
> ,,3
> ,,3
> $ awk -f tst.awk file.csv
> 3 ,,3
>         1 []
>         2 []
>         3 [3]
>
> 2 ,,3
>         1 []
>         2 [3]
>
> Note that awk recognizes 3 fields in the first line but only 2 in
> the second. If it's not a bug - what's causing that behavior?
This worked OK in 4.1.3, but is broken in 4.1.4. It is related to this
ChangeLog entry:
2015-09-18         Arnold D. Robbins     <address@hidden>

        * field.c (fpat_parse_field): Always use rp->non_empty instead
        of only if in_middle. The latter can be true even if we've
        already parsed part of the record. Thanks to Ed Morton
        for the bug report.
diff --git a/field.c b/field.c
index 6a7c6b1..ed31098 100644
--- a/field.c
+++ b/field.c
@@ -1598,9 +1598,8 @@ fpat_parse_field(long up_to, /* parse only up to this field number */
 
        if (in_middle) {
                regex_flags |= RE_NO_BOL;
-               non_empty = rp->non_empty;
-       } else
-               non_empty = false;
+       }
+       non_empty = rp->non_empty;
 
        eosflag = false;
        need_to_set_sep = true;
Reversing this patch fixes the bug, but reintroduces the bug that
was fixed by this patch. :-) Here's the test case for that bug:
==> test/fpat5.awk <==
BEGIN {
FPAT = "([^,]*)|(\"[^\"]+\")"
OFS = ";"
}
p != 0 { print NF }
{ $1 = $1 ; print }
==> test/fpat5.in <==
"A","B","C"
==> test/fpat5.ok <==
"A";"B";"C"
With the patch reversed, that test fails:
*** fpat5.ok 2017-01-26 13:52:53.285369000 -0500
--- _fpat5 2017-04-01 09:55:20.122459000 -0400
***************
*** 1 ****
! "A";"B";"C"
--- 1 ----
! "A";;"B";"C"
Arnold?
Regards,
Andy