Re: [bug-gawk] A CSV Standard
From: David Jordan
Subject: Re: [bug-gawk] A CSV Standard
Date: Tue, 18 Nov 2014 22:38:10 -0000
I would be happy to volunteer to write it, as I have been wanting to
contribute to a free software project for a while and it seems a simple
enough task (always dangerous to say). Do you think it would be better off
standalone or as part of gawkextlib?
-----Original Message-----
From: Andrew J. Schorr [mailto:address@hidden]
Sent: 18 November 2014 19:49
To: Aharon Robbins
Cc: address@hidden; address@hidden
Subject: Re: [bug-gawk] A CSV Standard
On Tue, Nov 18, 2014 at 09:28:49PM +0200, Aharon Robbins wrote:
> Thanks for the note. I wasn't aware of this RFC. I'll update the
> manual in the next day or two.
I never noticed this section of the manual before. Doesn't this FPAT
solution break for fields that contain a mix of embedded quotes and commas?
For example:
bash-4.2$ cat /tmp/bad.csv
f1,f2,f3,f4,f5
"a","b","c","this one has a quote "" inside, and also a comma","d"
bash-4.2$ cat /tmp/simple-csv.awk
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
    print "NF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
bash-4.2$ gawk -f /tmp/simple-csv.awk /tmp/bad.csv
NF = 5
$1 = <f1>
$2 = <f2>
$3 = <f3>
$4 = <f4>
$5 = <f5>
NF = 6
$1 = <"a">
$2 = <"b">
$3 = <"c">
$4 = <"this one has a quote "" inside>
$5 = < and also a comma">
$6 = <"d">
I wonder if we might need an extension to provide a CSV input parser to
handle this properly.
Regards,
Andy
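To illustrate what a quote-aware parser would have to do for the record above, here is a minimal sketch in plain awk, wrapped in a shell function. It is not the proposed extension and not part of gawk or gawkextlib; the names csv_fields and csv_split are hypothetical, chosen only for this example. It assumes one record per line, so quoted fields containing embedded newlines are not handled.

```shell
# Hypothetical helper (not a gawk or gawkextlib API): print the fields of
# each CSV line, treating "" inside a quoted field as one literal quote.
# Assumption: one record per line, no embedded newlines in quoted fields.
csv_fields() {
    awk '
    # Split rec into out[1..n] and return n. Surrounding quotes are removed.
    function csv_split(rec, out,    n, i, c, field, inq, len) {
        split("", out)                  # clear any previous contents
        n = 0; field = ""; inq = 0; len = length(rec)
        for (i = 1; i <= len; i++) {
            c = substr(rec, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr(rec, i + 1, 1) == "\"") {
                        field = field "\""   # doubled quote: keep one
                        i++
                    } else {
                        inq = 0              # closing quote
                    }
                } else {
                    field = field c          # commas are literal here
                }
            } else if (c == "\"") {
                inq = 1                      # opening quote
            } else if (c == ",") {
                out[++n] = field             # unquoted comma ends the field
                field = ""
            } else {
                field = field c
            }
        }
        out[++n] = field
        return n
    }
    {
        n = csv_split($0, f)
        print "NF = " n
        for (i = 1; i <= n; i++)
            printf("$%d = <%s>\n", i, f[i])
    }' "$@"
}
```

Run as csv_fields /tmp/bad.csv, the second record comes out as five fields, with $4 = <this one has a quote " inside, and also a comma>. Unlike the FPAT approach, the enclosing quotes are stripped from each field. Scanning character by character is slow for large files, which is one argument for doing this in a C extension instead.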