Re: Parse CVS in awk
From: Manuel Collado
Subject: Re: Parse CVS in awk
Date: Thu, 9 Apr 2020 19:53:36 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
On 09/04/2020 at 17:00, Manuel Collado wrote:
On 09/04/2020 at 4:51, Peng Yu wrote:
I'm wondering if the solution mentioned here is robust against all CSV
format variations.
https://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content
This manual says:
<quote>
NOTE: Some programs export CSV data that contains embedded newlines
between the double quotes. gawk provides no way to deal with this. Even
though a formal specification for CSV data exists, there isn’t much more
to be done; the FPAT mechanism provides an elegant solution for the
majority of cases, and the gawk developers are satisfied with that.
<endquote>
Well, there is a trick that can handle fields with embedded newlines.
The idea is to keep joining lines while the record contains an odd number
of double quotes, and to amend NR and FNR so that they count records
rather than physical lines:
# Process CSV input records with embedded newlines
{
    # Collect multi-line data, if any
    CSVRECORD = $0
    while (gsub("\"", "\"", CSVRECORD) % 2 == 1 && (_csv_multi = getline _csv_) > 0) {
        CSVRECORD = CSVRECORD "\n" _csv_
        NR--
        FNR--
    }
    if (_csv_multi) {
        $0 = CSVRECORD
    }
}
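
A minimal sketch of how the rule above might be tried out on sample data, assuming an awk that allows assigning to NR and FNR (gawk does). The function name join_csv_records and the sample rows are hypothetical, and the <NL> marker is only there to make the embedded newline visible:

```shell
# Sketch: wrap the line-joining rule in a shell function and run it on sample data.
# join_csv_records is a hypothetical helper name; the sample rows are made up.
join_csv_records() {
    awk '
    {
        CSVRECORD = $0
        # An odd number of double quotes means a quoted field is still open.
        while (gsub("\"", "\"", CSVRECORD) % 2 == 1 && (getline _csv_) > 0) {
            CSVRECORD = CSVRECORD "\n" _csv_
            NR--    # keep NR counting logical records, not physical lines
            FNR--
        }
        # Show the embedded newline as a visible marker for the demo.
        gsub("\n", "<NL>", CSVRECORD)
        print NR ": " CSVRECORD
    }'
}

printf '%s\n' 'id,comment' '1,"first line' 'second line"' '2,plain' | join_csv_records
```

The four physical input lines should come out as three numbered records, with the second and third lines joined into record 2. Field splitting (for example with FPAT, as in the manual section cited above) would then operate on the joined record.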
HTH.
--
Manuel Collado - http://mcollado.z15.es