Re: Insertion of extra OFS character into output string
From: Andrew J. Schorr
Subject: Re: Insertion of extra OFS character into output string
Date: Mon, 13 Mar 2023 21:37:46 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Mar 13, 2023 at 09:10:28PM +0100, H wrote:
> I am a newcomer to awk and have run into an issue I have not figured out
> yet... My platform is CentOS 7 running awk 4.0.2, the default version.
>
> The following awk statement generates an extra tab character between fields 1
> and 2, regardless of the data in the file:
>
> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/,
> ""); print}' somefile.csv
>
> If I change the statement to:
>
> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/,
> ""); print}' somefile.csv
>
> an extra OFS character is inserted between fields two and three. I can add
> that removing the gsub() in either of the two examples does not affect the
> results.
>
> Might this be a bug in 4.0.2 or a feature I have not yet understood?
I think it is in fact a bug in gawk 4.0.2. Note the two consecutive \t
characters after "Alpha" in the output:
bash-5.1$ ./gawk --version | head -1
GNU Awk 4.0.2
bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=",";
FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
0000000 A l p h a \t \t B e t a \t C h a r
0000020 l i e \t D e l t a \n
0000032
I confirmed that the CentOS 7 gawk has this bug.
Compare to the current master branch, where a single \t separates each pair
of fields:
bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=",";
FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
0000000 A l p h a \t B e t a \t C h a r l
0000020 i e \t D e l t a \n
0000031
I think you have 3 options:
1. Install a newer version of gawk on your system.
2. Open a bug on Red Hat bugzilla and wait for them to patch it.
3. Upgrade to Rocky 8 or Rocky 9. :-)
I checked, and it's fixed in gawk 4.2.1 in Rocky 8:
bash-4.4$ ./usr/bin/gawk --version | head -1
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
bash-4.4$ echo "Alpha,Beta,Charlie,Delta" | ./usr/bin/gawk 'BEGIN{FS=",";
FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
0000000 A l p h a \t B e t a \t C h a r l
0000020 i e \t D e l t a \n
0000031
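If you want to test a given gawk binary without reading od output by eye, the
check above can be scripted. This is only a rough sketch: the function names
count_tabs and check_gawk_ofs are made up for illustration, and it assumes the
same four-field sample line used above, so a correct gawk should emit exactly
3 tabs and a buggy one 4. I dropped FS="," because FPAT is assigned after it
and therefore controls the field splitting anyway.

```shell
#!/bin/sh
# Count the tab characters arriving on stdin.
count_tabs() {
    # Delete every byte that is not a tab, count what is left,
    # and strip the padding some wc implementations add.
    tr -cd '\t' | wc -c | tr -d ' '
}

# Hypothetical helper: succeeds (exit status 0) if the awk binary given
# as $1 (default: gawk) emits exactly 3 tabs for the 4-field sample line.
# Fails for the extra-OFS bug (4 tabs) and for awks without FPAT support.
check_gawk_ofs() {
    gawk_bin=${1:-gawk}
    n=$(printf 'Alpha,Beta,Charlie,Delta\n' |
        "$gawk_bin" 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "\t" }
                     { $1 = $1; print }' |
        count_tabs)
    [ "$n" -eq 3 ]
}
```

Usage would be something like:

    check_gawk_ofs /usr/bin/gawk && echo "OK" || echo "extra OFS bug present"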
Regards,
Andy