help-gawk

Re: Insertion of extra OFS character into output string


From: david kerns
Subject: Re: Insertion of extra OFS character into output string
Date: Mon, 13 Mar 2023 18:41:50 -0700

On Mon, Mar 13, 2023 at 5:59 PM H <agents@meddatainc.com> wrote:

> On March 14, 2023 12:41:16 AM GMT+01:00, "Neil R. Ormos" <
> ormos-gnulists17@ormos.org> wrote:
> >H wrote:
> >
> >> I am a newcomer to awk and have run into an
> >> issue I have not figured out yet... My platform
> >> is CentOS 7 running awk 4.0.2, the default
> >> version.
> >
> >> The following awk statement generates an extra
> >> tab character between fields 1 and 2, regardless
> >> of the data in the file:
> >
> >> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1;
> >gsub(/"/, ""); print}' somefile.csv
> >
> >> If I change the statement to:
> >
> >> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2;
> >gsub(/"/, ""); print}' somefile.csv
> >
> >> an extra OFS character is inserted between
> >> fields two and three. I can add that removing
> >> the gsub() in either of the two examples does
> >> not affect the results.
> >
> >> Might this be a bug in 4.0.2 or a feature I have
> >> not yet understood?
> >
> >I don't have 4.0.2 available to test, but I tested with older and newer
> >versions.
> >
> >When I test, I get the result I think I expect from the code you
> >posted.
> >
> >Also, setting FPAT overrides the effect of having earlier set FS.  (I
> >believe that the most-recently set one among FS, FPAT, and FIELDWIDTHS
> >controls the field splitting operation.)
> >
> >echo "1,2" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"}
> >{$1=$1; print}' | hexdump -c
> >0000000   1  \t   2  \n
> >0000004
> >
> >It would be easier to help if you would please provide:
> >
> >  the simplest input line that reproduces the problem;
> >
> >  the output you expect; and
> >
> >  the output you are getting.
>
> I am not on my computer but typing this on my phone. With that caveat, a
> /minimal/ example would be:
> echo "Alpha,Beta,Charlie,Delta" | awk 'BEGIN{FS=",";
> FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}'
>
> I would expect to see:
> Alpha<TAB>Beta<TAB>Charlie<TAB>Delta
> but instead see
> Alpha<TAB><TAB>Beta<TAB>Charlie<TAB>Delta
>
> If you change $1=$1 to $2=$2 you will find that the extra tab character
> then moves to the next field.
>
> I believe I had also tried without the definition of FS with the same
> result.
>
> Finally, note that the FPAT expression comes from the awk documentation
> and is thus expected to work.
>
> Can anyone try this with the most recent version of awk?
>

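For anyone who wants to check their own gawk, the minimal example quoted above could be turned into a quick tab count. This is only a sketch: tab_count is a made-up helper name, and the FPAT is written without the grouping parentheses, which should not change what it matches. Four fields rebuilt with OFS="\t" should contain exactly three tabs, so an unaffected gawk should print 3; going by the report above, an affected version would print 4.

```shell
# tab_count is a hypothetical helper, not something from the thread.
# It rebuilds each record with $1 = $1 and counts the tabs in the output.
tab_count() {
    gawk 'BEGIN { FPAT = "[^,]*|\"[^\"]+\""; OFS = "\t" }
          { $1 = $1; print }' |
        tr -cd '\t' |      # keep only the tab characters
        wc -c |            # count them
        tr -d ' '          # strip padding some wc implementations add
}

echo "Alpha,Beta,Charlie,Delta" | tab_count
```

Counting outside awk is deliberate: referencing NF or the other fields inside the program may itself hide the problem.
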
I think there is a bug here. (I fixed your FPAT, but that issue is
unrelated to what you're reporting.)
$ cat somefile.csv
1,"this field, has a comma",3,4
$ cat p11
gawk 'BEGIN {
        FPAT = "[^,]*|[\"][^\"]+[\"]"
        OFS = "\t"
}
{
        # Workaround: reference every field before assigning to $1.
        # With this loop commented out, the extra tab appears in the output.
        for (i = 1; i <= NF; i++) x = $i
        $1 = $1
        gsub(/"/, "")
        print
}' somefile.csv
$ bash p11 | xxd
0000000: 3109 7468 6973 2066 6965 6c64 2c20 6861  1.this field, ha
0000010: 7320 6120 636f 6d6d 6109 3309 340a       s a comma.3.4.

However, it does seem to be fixed in 5.2.1.
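
If upgrading is not an option, another way to get tab-separated output is to skip the $1=$1 rebuild and join the fields by hand. The following is a sketch under that assumption; csv_to_tsv is a made-up name, and I have not run it on 4.0.2. Like the loop in the script above, it reads every field and never assigns to one, so the record is never rebuilt.

```shell
# csv_to_tsv is a hypothetical helper, not something from the thread.
# It joins the FPAT-split fields with tabs itself instead of relying on
# the record being rebuilt by a field assignment.
csv_to_tsv() {
    gawk 'BEGIN { FPAT = "[^,]*|\"[^\"]+\"" }
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            gsub(/"/, "", field)                  # drop the CSV quotes
            out = out (i > 1 ? "\t" : "") field   # tab between fields only
        }
        print out
    }' "$@"
}

printf '%s\n' '1,"this field, has a comma",3,4' | csv_to_tsv | od -c
```

The od -c at the end is only there to make the tabs visible, as hexdump -c and xxd do earlier in the thread.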

