bug-gawk

Re: [bug-gawk] 4.7 Defining Fields by Content


From: Manuel Collado
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
Date: Mon, 21 Mar 2016 10:47:37 +0100
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0

On 21/03/2016 5:25, Aharon Robbins wrote:
The cited RFC allows embedded newlines in fields; I think they have to be
inside quotes but am not sure.

Yes. They have to.
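
For illustration only, here is a small Python sketch of that rule using the standard csv module. It is not gawk code, and the sample data is invented: a line break inside a double-quoted field stays part of the field, while the unquoted line endings terminate records.

```python
import csv
import io

# Invented sample: the second field of the first data record spans two lines.
data = 'id,note\r\n1,"line one\nline two"\r\n2,plain\r\n'

# newline="" hands the raw line endings to the csv module untranslated.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)
```

The reader returns three records here, although the input has four physical lines.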


Date: Tue, 15 Mar 2016 08:09:54 +1000
From: Miriam English <address@hidden>
To: address@hidden
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content

Is it "normal" for csv files to have embedded linefeeds? All the csv
files I've seen with special characters inside their fields have them
written as escaped codes (such as \t, \n, \f, and so on) which are
replaced with the actual characters on use.

Hello, Miriam.

I've never seen csv files with that kind of escape. Can you provide a practical sample?

If raw control characters do
exist inside fields of csv files then wouldn't a pass through to convert
them to escaped codes solve that problem?

I've seen real-life csv data files with unescaped tab characters.
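
A conversion pass of the kind suggested above might look like the following Python sketch. The escape table and the function names escape_field and escape_csv are assumptions made for this example, not an existing tool. Note that the pass itself already needs a real CSV reader in order to tell an embedded linefeed from a record terminator.

```python
import csv
import io

# Hypothetical escape table; backslash is escaped too so the pass is reversible.
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r", "\f": "\\f"}

def escape_field(field):
    """Replace raw control characters (and backslash) with escape codes."""
    return "".join(ESCAPES.get(ch, ch) for ch in field)

def escape_csv(text):
    """Parse CSV text, escape every field, and write it back out as CSV."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([escape_field(f) for f in row])
    return out.getvalue()

# Invented sample with a raw tab and a raw linefeed inside a quoted field.
raw = 'id,note\n1,"tab\there\nsecond line"\n'
converted = escape_csv(raw)
print(converted)
```

After the pass every record sits on one physical line, so line-oriented tools can read the result, provided they undo the escapes afterwards.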


Andrew J. Schorr wrote:
On Mon, Mar 14, 2016 at 09:40:14AM +0100, Marco Coletti wrote:
This is just short of what is needed to correctly parse RFC 4180
formatted data, in that it does not account for double quotes
appearing as part of a field.

But even with the enhanced FPAT you propose, unless I'm confused,
it still won't work with records containing embedded linefeed
characters. We have discussed in the past developing a CSV
input parser extension, but nobody has implemented it yet.
If you'd like to develop it, we would welcome the contribution
of such an extension, possibly for the gawkextlib project if not
appropriate for inclusion in mainline gawk.
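
The limitation can be sketched in Python (not gawk). The pattern below is a hypothetical analogue of a content-based field definition and is not the FPAT proposed in this thread. It copes with doubled quotes on a single line, but a quoted field containing a linefeed is already split before the pattern is applied.

```python
import re

# Hypothetical field pattern: a quoted string (with "" for a literal quote)
# or a run of non-commas. The quoted alternative comes first because Python's
# re takes the first alternative that matches, not the longest one.
FIELD = re.compile(r'(?:^|,)("(?:[^"]|"")*"|[^,]*)')

def split_line(line):
    """Split one physical line into raw fields (quotes are left in place)."""
    return FIELD.findall(line)

# Doubled quotes inside a quoted field are handled on a single line.
one_line = split_line('a,"say ""hi"", ok",c')

# An embedded linefeed breaks the record, because the input is divided
# into lines before the pattern ever sees it.
text = '1,"line one\nline two",x\n'
broken = [split_line(line) for line in text.splitlines()]

print(one_line)
print(broken)
```

The single logical record in text comes back as two records with a dangling quote in each, which is why a record-level parser is needed rather than a better field pattern.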

I've just started working on such a CSV extension. A generic CSV parser is now operational (this is the easiest part!). But don't expect anything to be available for a month or so (lack of spare time, complexity of the extension API, autotools, gawkextlib organization, etc.).

Regards.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado



