From: Wolfgang Laun
Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
Date: Sun, 21 May 2017 21:46:26 +0200
Hi.
> Date: Sun, 21 May 2017 14:58:55 -0400
> From: "Andrew J. Schorr" <address@hiddeninvestments.com >
> To: Arnold Robbins <address@hidden>
> Cc: address@hidden
> Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
>
> Hi,
>
> On Sun, May 21, 2017 at 09:52:49PM +0300, Arnold Robbins wrote:
> > You ask tough questions.
>
> :-) Sorry to be a pain.
No, it's good.
> > To me it seems obvious that if all the
> > requested data isn't there then the number of fields should be smaller,
> > allowing a check against what's expected to make it possible to weed
> > out bad data.
>
> I agree with you, but it is a change in behavior. I think it's probably
> safe to do this, but we should document how this stuff works.
Right.
> > It opens up the question of what if there is short data for an
> > individual field - FIELDWIDTHS says field 2 is 5 characters but only
> > three are there.
>
> Yes. Wolfgang's example is on point.
>
> > None of this is well defined in the documentation, nor, obviously
> > was it well thought out to start with. :-(
> >
> > I have written the code to handle the suggested '*' at the end to
> > mean "the rest of the record", which in and of itself is probably
> > a good idea.
>
> Agreed.
>
> > I guess I need to define what happens in these corner cases and
> > put it up for discussion here and then go with whatever seems to
> > make the most sense after the discussion is done. Sigh.
>
> I don't expect much disagreement over this, but one never knows. I think
> we should state clearly what we are doing, and then we should be OK.
> It seems clear that nobody has yet written code with FIELDWIDTHS that
> depends on the subtle NF behavior that you are discussing, so I doubt
> we will break anything.
>
> Regards,
> Andy
So here are my thoughts.
Q2/A2 is the biggest real open question.
Arnold
-----------------------------------------------------------------
Sun May 21 21:54:06 IDT 2017
============================
Some thoughts on better definitions of the behavior for FIELDWIDTHS.
Q1. Given FIELDWIDTHS = "2 3 4" and input data "aabb". How many fields
should there be?
A. Two, since that's all the data that's there
B. Three, with $3 == "", since it's supposed to be all fixed width data
A1. Gawk currently says three. Arnold leans towards two, since it reflects
the actual data and allows code expecting three fields to weed out
bad records.
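
The two readings of Q1 can be compared with a small model. This is a Python sketch of the semantics under discussion, not gawk's implementation; the function name split_fixed and the pad_missing flag are made up for illustration.

```python
def split_fixed(record, widths, pad_missing):
    """Split a record into fixed-width fields.

    pad_missing=False models answer A: stop once the data runs out.
    pad_missing=True models answer B: always produce len(widths)
    fields, with empty strings for fields that have no data.
    """
    fields = []
    pos = 0
    for w in widths:
        if pos >= len(record) and not pad_missing:
            break  # answer A: no data left, so no more fields
        fields.append(record[pos:pos + w])
        pos += w
    return fields


widths = [2, 3, 4]  # FIELDWIDTHS = "2 3 4"
print(split_fixed("aabb", widths, pad_missing=False))  # ['aa', 'bb']
print(split_fixed("aabb", widths, pad_missing=True))   # ['aa', 'bb', '']
```

Under answer A the field count is 2, so a check for three fields rejects the record. Under answer B the count is always 3 and the short record passes unnoticed.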
Q2. Given FIELDWIDTHS = "2 3 4" and input data "aab", should $2 have a
value?
A. No - we're expecting three characters and they weren't all there
B. Yes - something was there, make it available
A2. Gawk currently says "yes". Arnold isn't sure what's right here.
Input is welcome.
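
The Q2 choice can be modeled the same way. This is a Python sketch, not gawk code; split_strict and keep_partial are hypothetical names, and the sketch assumes answer A from Q1 (fields with no data at all are not counted).

```python
def split_strict(record, widths, keep_partial):
    """Split a record into fixed-width fields.

    keep_partial=True models answer B: a short field keeps whatever
    characters were present.
    keep_partial=False models answer A: a field only exists if all
    of its characters are present.
    """
    fields = []
    pos = 0
    for w in widths:
        chunk = record[pos:pos + w]
        if not chunk:
            break  # no data at all for this field
        if len(chunk) < w and not keep_partial:
            break  # answer A: drop the incomplete field
        fields.append(chunk)
        pos += w
    return fields


widths = [2, 3, 4]  # FIELDWIDTHS = "2 3 4"
print(split_strict("aab", widths, keep_partial=True))   # ['aa', 'b']
print(split_strict("aab", widths, keep_partial=False))  # ['aa']
```

With answer B, code that cares has to compare the length of $2 against the expected width itself. With answer A, the field count alone shows that the record was short.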
Q3. Given FIELDWIDTHS = "2 3 4" and input data "aabbbccccdddd" what should
be done with the dddd?
A. Nothing - it's extra, ignore it. NF should be set to 3. Code that
wants to know if there's something extra can use length() and
substr() to get it out of the record.
B. Stick it into $4 anyway.
A3. Arnold and gawk agree on (A).
Q4. Given the idea of using "*" at the end of FIELDWIDTHS to mean
"anything else": with FIELDWIDTHS = "2 3 4 *" and input
data "aabbbccccdddd", the dddd would go into $4. The final data
would be optional. Is there any reason not to add this to gawk?
It seems to be actually useful and not just theoretically useful.
A4. Arnold thinks it's right to add it.
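
The proposed trailing "*" can be sketched as follows. This is a Python model of the idea in Q4 together with answer A of Q3, not the code written for gawk; split_with_rest is a hypothetical name, and the sketch assumes that missing fields are not counted.

```python
def split_with_rest(record, fieldwidths):
    """Split a record using a FIELDWIDTHS-style spec string.

    A trailing "*" collects whatever is left after the fixed-width
    fields. Without it, extra data is ignored.
    """
    specs = fieldwidths.split()
    take_rest = bool(specs) and specs[-1] == "*"
    if take_rest:
        specs = specs[:-1]
    widths = [int(s) for s in specs]

    fields = []
    pos = 0
    for w in widths:
        if pos >= len(record):
            break  # data ran out before this field
        fields.append(record[pos:pos + w])
        pos += w
    if take_rest and pos < len(record):
        fields.append(record[pos:])  # the optional final field
    return fields


print(split_with_rest("aabbbccccdddd", "2 3 4 *"))  # ['aa', 'bbb', 'cccc', 'dddd']
print(split_with_rest("aabbbccccdddd", "2 3 4"))    # ['aa', 'bbb', 'cccc']
print(split_with_rest("aabbbcccc", "2 3 4 *"))      # ['aa', 'bbb', 'cccc']
```

The last call shows the final field being optional: when nothing follows the fixed-width part, the field count stays at 3.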