help-gawk

Re: Insertion of extra OFS character into output string


From: H
Subject: Re: Insertion of extra OFS character into output string
Date: Tue, 14 Mar 2023 15:09:56 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 03/14/2023 02:41 AM, david kerns wrote:
>
>
> On Mon, Mar 13, 2023 at 5:59 PM H <agents@meddatainc.com> wrote:
>
>     On March 14, 2023 12:41:16 AM GMT+01:00, "Neil R. Ormos" <ormos-gnulists17@ormos.org> wrote:
>     >H wrote:
>     >
>     >> I am a newcomer to awk and have run into an
>     >> issue I have not figured out yet... My platform
>     >> is CentOS 7 running awk 4.0.2, the default
>     >> version.
>     >
>     >> The following awk statement generates an extra
>     >> tab character between fields 1 and 2, regardless
>     >> of the data in the file:
>     >
>     >> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' somefile.csv
>     >
>     >> If I change the statement to:
>     >
>     >> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/, ""); print}' somefile.csv
>     >
>     >> an extra OFS character is inserted between
>     >> fields two and three. I can add that removing
>     >> the gsub() in either of the two examples does
>     >> not affect the results.
>     >
>     >> Might this be a bug in 4.0.2 or a feature I have
>     >> not yet understood?
>     >
>     >I don't have 4.0.2 available to test, but I tested with older and newer
>     >versions.
>     >
>     >When I test, I get the result I think I expect from the code you
>     >posted.
>     >
>     >Also, setting FPAT overrides the effect of having earlier set FS.  (I
>     >believe that the most-recently set one among FS, FPAT, and FIELDWIDTHS
>     >controls the field splitting operation.)
>     >
>     >echo "1,2" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"}
>     >{$1=$1; print}' | hexdump -c
>     >0000000   1  \t   2  \n
>     >0000004
>     >
>     >It would be easier to help if you would please provide:
>     >
>     >  the simplest input line that reproduces the problem;
>     >
>     >  the output you expect; and
>     >
>     >  the output you are getting.
>
>     I am not on my computer but typing this on my phone. With that caveat, a /minimal/ example would be:
>     echo "Alpha,Beta,Charlie,Delta" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}'
>
>     I would expect to see:
>     Alpha<TAB>Beta<TAB>Charlie<TAB>Delta
>     but instead see
>     Alpha<TAB><TAB>Beta<TAB>Charlie<TAB>Delta
>
>     If you change $1=$1 to $2=$2 you will find that the extra tab character then moves to the next field.
>
>     I believe I had also tried without the definition of FS with the same result.
>
>     Finally, note that the FPAT expression comes from the awk documentation and is thus expected to work.
>
>     Can anyone try this with the most recent version of awk?
>
>
> I think there is a bug here: (I fixed your FPAT, but that issue is unrelated to what you're reporting)
> $ cat somefile.csv
> 1,"this field, has a comma",3,4
> $ cat p11
>  gawk 'BEGIN {
>         FPAT="[^,]*|[\"][^\"]+[\"]"
>         OFS="\t"
> }
>         {
>         for (i = 1; i <= NF; i++) x=$i # if you comment this line out, you'll get the extra tab on output
>         $1=$1;
>         gsub(/"/, "");
>         print
> }' somefile.csv
> $ ./bash pp11 | xxd
> 0000000: 3109 7468 6973 2066 6965 6c64 2c20 6861  1.this field, ha
> 0000010: 7320 6120 636f 6d6d 6109 3309 340a       s a comma.3.4.
>
> however, it does seem to be fixed in 5.2.1
>
>
Why the need to "fix" my FPAT? As I stated earlier, the FPAT I used is from the 
awk documentation.
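
For anyone who wants to retest on another version, here is a minimal sketch of the reproduction with the documentation's FPAT left unchanged. It assumes gawk is installed under the name `gawk`; the variable names `line` and `rebuilt` and the use of `|` as a visible stand-in for the tab are only illustrative. FS is left out because, as noted earlier in the thread, setting FPAT overrides it. NF is read only after `$1=$1` so that the check does not touch the fields before the record is rebuilt.

```shell
#!/bin/sh
# Reproduction sketch: rebuild the record and show the field count.
# Assumption: gawk is on PATH (FPAT is a gawk extension).
line='Alpha,Beta,Charlie,Delta'

rebuilt=$(printf '%s\n' "$line" | gawk '
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"   # FPAT exactly as in the original post
    OFS  = "|"                      # visible stand-in for the tab character
}
{
    $1 = $1                         # force the record to be rebuilt with OFS
    print NF ":" $0                 # field count, then the rebuilt record
}')

printf '%s\n' "$rebuilt"
```

On a version where the problem is fixed I would expect this to print 4:Alpha|Beta|Charlie|Delta; a doubled separator after the first field would match what I am seeing on 4.0.2.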

Also, it is better to keep this discussion on the mailing list where it 
belongs; there is no need to pollute my personal email.


