Re: [bug-gawk] example tweak in documentations
From: Aharon Robbins
Subject: Re: [bug-gawk] example tweak in documentations
Date: Tue, 07 Apr 2015 10:48:38 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi Ed.
The doc discusses replacing + with *; the main thing you seem to be
pointing out is the use of * to allow empty quoted fields.
I don't know if this is worth the trouble, but I've made a note
in the doc to revisit this at some point.
Thanks,
Arnold
> Date: Fri, 20 Mar 2015 18:49:07 +0000 (UTC)
> From: Ed Morton <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] example tweak in documentations
>
> The FPAT example used in:
>
> http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content
>
> is, I'm sure, used as the starting point for many people working on CSV
> files. It doesn't support empty fields, however, and with a small tweak
> it could. For example:
>
> $ cat file
> Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
> Smith,John,"314 Pi Ave, IL",HisTown,HisState,,USA
>
> Notice that in the 2nd line the ZIP code (6th field) is not populated
> and here's what the FPAT value from the documentation does with that:
>
> $ cat tst1.awk
> BEGIN {
>     FPAT = "([^,]+)|(\"[^\"]+\")"
> }
>
> {
>     print "\nNF = ", NF
>     for (i = 1; i <= NF; i++) {
>         printf("$%d = <%s>\n", i, $i)
>     }
> }
> $ awk -f tst1.awk file
>
> NF = 7
> $1 = <Robbins>
> $2 = <Arnold>
> $3 = <"1234 A Pretty Street, NE">
> $4 = <MyTown>
> $5 = <MyState>
> $6 = <12345-6789>
> $7 = <USA>
>
> NF = 6
> $1 = <Smith>
> $2 = <John>
> $3 = <"314 Pi Ave, IL">
> $4 = <HisTown>
> $5 = <HisState>
> $6 = <USA>
>
> i.e. it discards it completely. Now if we tweak the FPAT to just use
> `*` instead of `+` as the repetition metacharacter:
>
> $ cat tst2.awk
> BEGIN {
>     FPAT = "([^,]*)|(\"[^\"]*\")"
> }
>
> {
>     print "\nNF = ", NF
>     for (i = 1; i <= NF; i++) {
>         printf("$%d = <%s>\n", i, $i)
>     }
> }
> $
> $ awk -f tst2.awk file
>
> NF = 7
> $1 = <Robbins>
> $2 = <Arnold>
> $3 = <"1234 A Pretty Street, NE">
> $4 = <MyTown>
> $5 = <MyState>
> $6 = <12345-6789>
> $7 = <USA>
>
> NF = 7
> $1 = <Smith>
> $2 = <John>
> $3 = <"314 Pi Ave, IL">
> $4 = <HisTown>
> $5 = <HisState>
> $6 = <>
> $7 = <USA>
>
> it handles it correctly. I know this is just an FPAT example and as
> such doesn't need to be perfect or handle all cases, but I think, given
> that this is probably being copy/pasted into a lot of scripts and that
> it's a trivial tweak to fix it, it might be worth doing.
>
> Ed.