From: Manuel Collado
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
Date: Mon, 21 Mar 2016 10:47:37 +0100
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0
On 21/03/2016 5:25, Aharon Robbins wrote:
The cited RFC allows embedded newlines in fields; I think they have to be inside quotes but am not sure.
Yes. They have to.
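To make the point concrete, here is a small sketch in Python (not gawk), using the standard csv module, whose default dialect accepts line breaks inside quoted fields as RFC 4180 describes. The sample data is made up for illustration.

```python
import csv
import io

# Made-up sample: the second field of the first data record contains a
# line break, which is legal only because the field is double-quoted.
data = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# newline='' stops StringIO from translating line endings, so the
# embedded '\n' reaches the csv reader unchanged.
rows = list(csv.reader(io.StringIO(data, newline='')))

# Splitting on line breaks first, as a line-oriented tool does, cuts the
# quoted field in two and yields one physical line too many.
naive_lines = data.splitlines()

print(len(rows), len(naive_lines))
```

A quote-aware reader sees three records here, while a plain line splitter sees four lines.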
Date: Tue, 15 Mar 2016 08:09:54 +1000
From: Miriam English <address@hidden>
To: address@hidden
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content

Is it "normal" for csv files to have embedded linefeeds? All the csv files I've seen with special characters inside their fields have them written as escaped codes (such as \t, \n, \f, and so on) which are replaced with the actual characters on use.
Hello, Miriam. I've never seen csv files with that kind of escape. Can you provide a practical sample?
If raw control characters do exist inside fields of csv files, then wouldn't a pass through to convert them to escaped codes solve that problem?
I've seen real-life csv data files with unescaped tab characters.
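For what it's worth, a conversion pass of the kind suggested above might look like the following Python sketch. Note that the pass itself already needs a quote-aware parser to find the embedded characters, which is the hard part. The function names and the escape table are assumptions made for this illustration, not an established convention; the backslash is escaped too so that the conversion can be reversed.

```python
import csv
import io

# Hypothetical escape table; backslash comes along so the mapping is reversible.
ESCAPES = {'\\': '\\\\', '\t': '\\t', '\n': '\\n', '\r': '\\r', '\f': '\\f'}

def escape_field(field):
    """Replace raw control characters in one field with escaped codes."""
    return ''.join(ESCAPES.get(ch, ch) for ch in field)

def escape_csv(text):
    """Re-emit CSV text with one record per physical line."""
    out = io.StringIO(newline='')
    writer = csv.writer(out, lineterminator='\n')
    # The csv reader does the quote-aware parsing of the input.
    for row in csv.reader(io.StringIO(text, newline='')):
        writer.writerow([escape_field(f) for f in row])
    return out.getvalue()

# Made-up sample with a raw tab and a raw newline inside a quoted field.
sample = 'a,"x\ty\nz"\nb,plain\n'
escaped = escape_csv(sample)
print(escaped, end='')
```

After the pass, every record sits on a single line and can be handed to a line-oriented tool, which then has to undo the escapes on use.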
Andrew J. Schorr wrote:

On Mon, Mar 14, 2016 at 09:40:14AM +0100, Marco Coletti wrote:

This is just short of what is needed to correctly parse RFC 4180 formatted data, in that it does not account for double quotes appearing as part of a field.

But even with the enhanced FPAT you propose, unless I'm confused, it still won't work with records containing embedded linefeed characters. We have discussed in the past developing a CSV input parser extension, but nobody has implemented it yet. If you'd like to develop it, we would welcome the contribution of such an extension, possibly for the gawkextlib project if not appropriate for inclusion in mainline gawk.
I've just started working on such a CSV extension. A generic CSV parser is now operational (this is the easiest part!). But don't expect anything to be available for a month or so (lack of spare time, complexity of the extension API, autotools, gawkextlib organization, etc.).
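As an illustration of why the parser proper is the easy part, here is a minimal character-by-character sketch in Python. It is not the code of the extension mentioned above; the function name and its parameters are assumptions for this example. It handles the two cases raised in this thread: doubled quotes inside a quoted field, and line breaks inside a quoted field.

```python
def parse_csv(text, delimiter=',', quote='"'):
    """Split CSV text into records (lists of fields), RFC 4180 style."""
    records, record, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # doubled quote: one literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                field.append(ch)         # includes embedded line breaks
        elif ch == quote:
            in_quotes = True
        elif ch == delimiter:
            record.append(''.join(field))
            field = []
        elif ch in '\r\n':
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1                   # treat CRLF as one terminator
            record.append(''.join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or record:                  # last record without a terminator
        record.append(''.join(field))
        records.append(record)
    return records

# Made-up sample: a comma, doubled quotes and a line break inside quotes.
text = 'name,note\r\n"Smith, J.","said ""hi""\nand left"\r\nlast,row'
records = parse_csv(text)
print(records)
```

The sketch does no error reporting (for example, for an unterminated quoted field) and reads the whole input at once.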
Regards. -- Manuel Collado - http://lml.ls.fi.upm.es/~mcollado