Re: [bug-gawk] A CSV Standard
From: Andrew J. Schorr
Subject: Re: [bug-gawk] A CSV Standard
Date: Tue, 18 Nov 2014 14:48:58 -0500
User-agent: Mutt/1.5.23 (2014-03-12)
On Tue, Nov 18, 2014 at 09:28:49PM +0200, Aharon Robbins wrote:
> Thanks for the note. I wasn't aware of this RFC. I'll update the
> manual in the next day or two.
I never noticed this section of the manual before. Doesn't
this FPAT solution break for fields that contain a mix of
embedded quotes and commas? For example:
bash-4.2$ cat /tmp/bad.csv
f1,f2,f3,f4,f5
"a","b","c","this one has a quote "" inside, and also a comma","d"
bash-4.2$ cat /tmp/simple-csv.awk
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
    print "NF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
bash-4.2$ gawk -f /tmp/simple-csv.awk /tmp/bad.csv
NF = 5
$1 = <f1>
$2 = <f2>
$3 = <f3>
$4 = <f4>
$5 = <f5>
NF = 6
$1 = <"a">
$2 = <"b">
$3 = <"c">
$4 = <"this one has a quote "" inside>
$5 = < and also a comma">
$6 = <"d">
I wonder if we might need an extension to provide a CSV input parser to handle
this properly.
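To make the requirement concrete, here is a rough sketch of what such a parser
would have to do for a single record. It is written in plain awk rather than as
a gawk extension, and the names csv_fields and csv_split are made up for
illustration. It assumes one record per line, so quoted fields that contain
newlines are not handled, and it strips the enclosing quotes, unlike FPAT.

```shell
# Hypothetical helper (not part of gawk): read CSV records on stdin,
# one record per line, and print each field in the same style as above.
csv_fields() {
    awk '
    # Split one CSV record into fields[1..n] and return n.
    # Extra parameters after "fields" are local variables.
    function csv_split(line, fields,    n, i, c, len, inq, cur) {
        n = 0; cur = ""; inq = 0; len = length(line)
        for (i = 1; i <= len; i++) {
            c = substr(line, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr(line, i + 1, 1) == "\"") {
                        cur = cur "\""; i++   # doubled quote: keep one literal quote
                    } else {
                        inq = 0               # closing quote of the field
                    }
                } else {
                    cur = cur c               # commas are ordinary data inside quotes
                }
            } else if (c == "\"") {
                inq = 1                       # opening quote of a quoted field
            } else if (c == ",") {
                fields[++n] = cur; cur = ""   # unquoted comma ends the field
            } else {
                cur = cur c
            }
        }
        fields[++n] = cur                     # last field has no trailing comma
        return n
    }
    {
        n = csv_split($0, f)
        print "NF = " n
        for (i = 1; i <= n; i++)
            printf("$%d = <%s>\n", i, f[i])
    }'
}
```

Run as csv_fields < /tmp/bad.csv, this should report NF = 5 for the second
record, with both the quote and the comma kept inside $4.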
Regards,
Andy