help-gawk

Re: Insertion of extra OFS character into output string


From: H
Subject: Re: Insertion of extra OFS character into output string
Date: Tue, 14 Mar 2023 15:12:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 03/14/2023 02:37 AM, Andrew J. Schorr wrote:
> On Mon, Mar 13, 2023 at 09:10:28PM +0100, H wrote:
>> I am a newcomer to awk and have run into an issue I have not figured out 
>> yet... My platform is CentOS 7 running awk 4.0.2, the default version.
>>
>> The following awk statement generates an extra tab character between fields 
>> 1 and 2, regardless of the data in the file:
>>
>> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' somefile.csv
>>
>> If I change the statement to:
>>
>> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/, ""); print}' somefile.csv
>>
>> an extra OFS character is inserted between fields two and three. I can add 
>> that removing the gsub() in either of the two examples does not affect the 
>> results.
>>
>> Might this be a bug in 4.0.2 or a feature I have not yet understood?
> I think it is in fact a bug in 4.0.2:
>
> bash-5.1$ ./gawk --version | head -1
> GNU Awk 4.0.2
>
> bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
> 0000000   A   l   p   h   a  \t  \t   B   e   t   a  \t   C   h   a   r
> 0000020   l   i   e  \t   D   e   l   t   a  \n
> 0000032
>
> I confirmed that the CentOS 7 gawk has this bug.
>
> Compare to the current master branch:
>
> bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
> 0000000   A   l   p   h   a  \t   B   e   t   a  \t   C   h   a   r   l
> 0000020   i   e  \t   D   e   l   t   a  \n
> 0000031
>
> I think you have 3 options:
> 1. Install a newer version of gawk on your system.
> 2. Open a bug on Red Hat bugzilla and wait for them to patch it.
> 3. Upgrade to Rocky 8 or Rocky 9. :-)
> I checked, and it's fixed in gawk 4.2.1 in Rocky 8:
>
> bash-4.4$ ./usr/bin/gawk --version | head -1
> GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
>
> bash-4.4$ echo "Alpha,Beta,Charlie,Delta" | ./usr/bin/gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
> 0000000   A   l   p   h   a  \t   B   e   t   a  \t   C   h   a   r   l
> 0000020   i   e  \t   D   e   l   t   a  \n
> 0000031
>
> Regards,
> Andy

Thank you for researching this. This machine is not slated to be upgraded at 
this time. Is there a newer version of awk for CentOS 7 available somewhere 
else?
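
For checking whether any particular gawk binary (a self-built one, for 
example) still shows the problem, the test from the quoted reply can be 
wrapped in a small shell function. This is only a sketch: the function name 
check_fpat_ofs is made up for illustration, and it assumes a POSIX shell with 
printf. It feeds the same input line to the same awk program and compares the 
result with the four tab-separated fields that the fixed versions print.

```shell
#!/bin/sh
# Sketch: report whether an awk binary shows the extra-OFS symptom
# described in this thread. The function name is hypothetical.
# Usage: check_fpat_ofs [path-to-awk]   (defaults to "gawk")
check_fpat_ofs() {
    awk_bin="${1:-gawk}"

    # Same program and input as the test in the quoted reply.
    actual=$(printf 'Alpha,Beta,Charlie,Delta\n' |
        "$awk_bin" 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}')

    # Output of a fixed gawk: four fields, one tab between each pair.
    expected=$(printf 'Alpha\tBeta\tCharlie\tDelta')

    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "extra OFS"
        return 1
    fi
}
```

Calling it with the path of the binary to test, e.g. check_fpat_ofs ./gawk, 
should print "ok" for a fixed build and "extra OFS" for one that behaves like 
4.0.2 above. Note that FPAT is a gawk extension, so a non-gawk awk will pass 
this check simply because it splits on FS.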



