bug-gawk

Re: Parse CVS in awk


From: Peter Brooks
Subject: Re: Parse CVS in awk
Date: Fri, 10 Apr 2020 05:52:47 +0100

You might find this a useful tool:

https://colin.maudry.com/csvtool-manual-page/

> On 9 Apr 2020, at 18:53, Manuel Collado <address@hidden> wrote:
> 
> On 09/04/2020 at 17:00, Manuel Collado wrote:
>>> On 09/04/2020 at 4:51, Peng Yu wrote:
>>> I'm wondering if the solution mentioned here is robust against all CSV
>>> format variations.
>>> 
>>> https://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content
> 
> This manual says:
> 
> <quote>
> NOTE: Some programs export CSV data that contains embedded newlines between 
> the double quotes. gawk provides no way to deal with this. Even though a 
> formal specification for CSV data exists, there isn’t much more to be done; 
> the FPAT mechanism provides an elegant solution for the majority of cases, 
> and the gawk developers are satisfied with that.
> <endquote>
> 
> Well, there is a trick that can handle fields with embedded newlines. The 
> idea is to join lines until the number of quotes is even, and to adjust 
> NR and FNR accordingly:
> 
> # Process CSV input records with embedded newlines
> {
>    # Collect multi-line data, if any
>    CSVRECORD = $0
>    while (gsub("\"", "\"", CSVRECORD) % 2 == 1 && (_csv_multi = getline _csv_) > 0) {
>        CSVRECORD = CSVRECORD "\n" _csv_
>        NR--    # getline var increments NR and FNR; undo that so
>        FNR--   # both keep counting logical records
>    }
>    if (_csv_multi) {
>        $0 = CSVRECORD
>    }
> }
> 
> HTH.
> -- 
> Manuel Collado - http://mcollado.z15.es
> 
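To see the quote-counting trick from the quoted message in action, here is a minimal sketch, assuming a POSIX shell and any POSIX awk. The function name join_csv_records, the variable names, the <NL> marker and the sample data are all made up for illustration and are not from the thread. It only joins physical lines into logical records; it does not split the fields.

```shell
#!/bin/sh
# join_csv_records: hypothetical helper name, not from the thread.
# Reads CSV on stdin and prints one logical record per output line,
# prefixed with its record number.
join_csv_records() {
    awk '
    {
        record = $0
        # gsub returns the number of replacements, here the number of
        # double quotes; an odd count means a quoted field is still open.
        while (gsub(/"/, "\"", record) % 2 == 1 && (getline more) > 0) {
            record = record "\n" more
            NR--; FNR--          # count logical records, not physical lines
        }
        gsub(/\n/, "<NL>", record)   # make embedded newlines visible
        print NR ": " record
    }'
}

# Made-up sample: the second record has a quoted field spanning two lines.
printf '%s\n' 'id,comment' '1,"first line' 'second line"' '2,plain' |
    join_csv_records
```

With this sample the two physical lines of the second record should come out as one record, numbered 2, with <NL> marking the embedded newline. Doubled quotes ("") inside a quoted field add two to the count, so they do not disturb the odd/even test.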

