Re: Insertion of extra OFS character into output string


From: Andrew J. Schorr
Subject: Re: Insertion of extra OFS character into output string
Date: Mon, 13 Mar 2023 21:37:46 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Mar 13, 2023 at 09:10:28PM +0100, H wrote:
> I am a newcomer to awk and have run into an issue I have not figured out 
> yet... My platform is CentOS 7 running awk 4.0.2, the default version.
> 
> The following awk statement generates an extra tab character between fields 1 
> and 2, regardless of the data in the file:
> 
> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' somefile.csv
> 
> If I change the statement to:
> 
> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/, ""); print}' somefile.csv
> 
> an extra OFS character is inserted between fields two and three. I can add 
> that removing the gsub() in either of the two examples does not affect the 
> results.
> 
> Might this be a bug in 4.0.2 or a feature I have not yet understood?

I think it is in fact a bug in 4.0.2:

bash-5.1$ ./gawk --version | head -1
GNU Awk 4.0.2

bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=","; 
FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
0000000   A   l   p   h   a  \t  \t   B   e   t   a  \t   C   h   a   r
0000020   l   i   e  \t   D   e   l   t   a  \n
0000032

I confirmed that the CentOS 7 gawk has this bug.

Compare to the current master branch:

bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=","; 
FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
0000000   A   l   p   h   a  \t   B   e   t   a  \t   C   h   a   r   l
0000020   i   e  \t   D   e   l   t   a  \n
0000031
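
To check whether a particular awk binary shows the doubled separator, the comparison above can be wrapped in a small shell function. This is only a sketch: the name check_fpat_ofs is made up for illustration, and the binary defaults to whatever "awk" resolves to on your PATH. The FS="," assignment is kept from the original command; in gawk the later FPAT assignment is what controls field splitting.

```shell
# Hypothetical helper (name is illustrative, not part of gawk).
# Prints "ok" if the record is rebuilt with single tabs, "affected" otherwise.
check_fpat_ofs() {
    awk_bin=${1:-awk}   # pass a path such as ./gawk to test a specific build
    out=$(printf 'Alpha,Beta,Charlie,Delta\n' |
        "$awk_bin" 'BEGIN { FS = ","; FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "\t" }
                    { $1 = $1; print }')
    expected=$(printf 'Alpha\tBeta\tCharlie\tDelta')
    if [ "$out" = "$expected" ]; then
        echo "ok"
    else
        echo "affected"
    fi
}
```

Calling it as "check_fpat_ofs /usr/bin/gawk" tests the distribution binary; with no argument it tests the default awk.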

I think you have 3 options:

1. Install a newer version of gawk on your system.
2. Open a bug on Red Hat bugzilla and wait for them to patch it.
3. Upgrade to Rocky 8 or Rocky 9. :-)

I checked, and it's fixed in gawk 4.2.1 in Rocky 8:

bash-4.4$ ./usr/bin/gawk --version | head -1
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)

bash-4.4$ echo "Alpha,Beta,Charlie,Delta" | ./usr/bin/gawk 'BEGIN{FS=","; 
FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
0000000   A   l   p   h   a  \t   B   e   t   a  \t   C   h   a   r   l
0000020   i   e  \t   D   e   l   t   a  \n
0000031
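
If none of those options is practical right away, a possible stopgap is to avoid FPAT altogether and split on FS alone. This is a sketch under a strong assumption: no quoted field in the file contains a comma, since plain FS splitting cannot handle that case. The function name csv_to_tsv is made up for illustration, and this has not been verified here against 4.0.2.

```shell
# Hypothetical stopgap (name is illustrative): convert simple CSV to TSV
# without FPAT. Assumes no quoted field contains an embedded comma.
csv_to_tsv() {
    awk 'BEGIN { FS = ","; OFS = "\t" }
         {
             gsub(/"/, "")   # strip the quotes from the whole record
             $1 = $1         # force the record to be rebuilt with OFS
             print
         }' "$@"
}
```

For example, "csv_to_tsv somefile.csv" reads the file, and "printf 'Alpha,\"Beta\",Charlie\n' | csv_to_tsv" reads standard input.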

Regards,
Andy


