Hi Ed.
Thanks for the report. You have indeed found a buglet. The
fix is below. I will add this as a test case in the test suite.
Thanks,
Arnold
Ed Morton <address@hidden> wrote:
Setting RS to null in gawk (tested with version 4.1.4 on Mac and 5.0.1
on cygwin) seems to change how field splitting works with the default FS.
I understand this:
$ echo ' a b c ' | awk '{print NF, "<" $0 ":" RT ">"; for (i=1; i<=NF; i++) print i, "[" $i "]"}'
3 < a b c :
>
1 [a]
2 [b]
3 [c]
because the default FS setting causes leading and trailing white space
to be ignored when the record is split into fields. But now look at this:
$ echo ' a b c ' | awk -v RS='' '{print NF, "<" $0 ":" RT ">"; for (i=1; i<=NF; i++) print i, "[" $i "]"}'
4 < a b c :
>
1 [a]
2 [b]
3 [c]
4 []
Why is there a 4th field? I think it's a bug that, in the 2nd script, the
trailing white space is not ignored when the record is split into
fields. FWIW, I tested that last script with OSX/BSD awk too, and it did
strip off the trailing blank, leaving 3 fields as I expected.
Ed.
------------------------------------
diff --git a/field.c b/field.c
index efbc7092..bae16e9c 100644
--- a/field.c
+++ b/field.c
@@ -463,7 +463,10 @@ re_parse_field(long up_to, /* parse only up to this field number */
if (len == 0)
return nf;
+ bool default_field_splitting = false;
if (RS_is_null && default_FS) {
+ default_field_splitting = true;
+
sep = scan;
while (scan < end && (*scan == ' ' || *scan == '\t' || *scan == '\n'))
scan++;
@@ -504,7 +507,7 @@ re_parse_field(long up_to, /* parse only up to this field number */
(long) (REEND(rp, scan) - RESTART(rp, scan)),
sep_arr);
scan += REEND(rp, scan);
field = scan;
- if (scan == end) /* FS at end of record */
+ if (scan == end && ! default_field_splitting) /* FS at end of record */
(*set)(++nf, field, 0L, n);
}
if (nf != up_to && scan < end) {