Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields


From: arnold
Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
Date: Sun, 21 May 2017 21:02:33 -0600
User-agent: Heirloom mailx 12.4 7/29/08

The motivation was something like this, but I don't remember 100%.
I don't think padding with spaces makes sense in awk, but I
appreciate the input.

Arnold

Wolfgang Laun <address@hidden> wrote:

> I haven't browsed through all the details in this discussion, but perhaps
> you should consider the *origin* of fixed-width fields, which, before Unix
> was created, did not have to worry about text file lines being truncated by
> the omission of trailing spaces. Those were the days when certain text files
> still typically had 80, perhaps 96 or even 132 characters per line, and your
> (COBOL) structure would be defined by column or character counts. (This
> fixed-field structure survived in many applications, and perhaps that was
> the rationale, when awk was invented, for including this feature.) The
> traditional approach was, of course, to have trailing empty fields set to
> all spaces. Partially non-blank fields would have full length, padded with
> trailing spaces.
>
> If someone wants that old behaviour, omitting trailing fields altogether
> and truncating an incomplete field would be a nuisance. Declaring a rigid
> field structure with FIELDWIDTHS is (IMHO) a sign that exactly this
> traditional behaviour is desired.
>
> Cheers
> Wolfgang
>
> On 21 May 2017 at 21:12, Arnold Robbins <address@hidden> wrote:
>
> > Hi.
> >
> > > Date: Sun, 21 May 2017 14:58:55 -0400
> > > From: "Andrew J. Schorr" <address@hidden>
> > > To: Arnold Robbins <address@hidden>
> > > Cc: address@hidden
> > > Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
> > >
> > > Hi,
> > >
> > > On Sun, May 21, 2017 at 09:52:49PM +0300, Arnold Robbins wrote:
> > > > You ask tough questions.
> > >
> > > :-) Sorry to be a pain.
> >
> > No, it's good.
> >
> > > > To me it seems obvious that if all the
> > > > requested data isn't there then the number of fields should be smaller,
> > > > allowing a check against what's expected to make it possible to weed
> > > > out bad data.
> > >
> > > I agree with you, but it is a change in behavior. I think it's probably
> > > safe to do this, but we should document how this stuff works.
> >
> > Right.
> >
> > > > It opens up the question of what if there is short data for an
> > > > individual field - FIELDWIDTHS says field 2 is 5 characters but only
> > > > three are there.
> > >
> > > Yes. Wolfgang's example is on point.
> > >
> > > > None of this is well defined in the documentation, nor, obviously
> > > > was it well thought out to start with. :-(
> > > >
> > > > I have written the code to handle the suggested '*' at the end to
> > > > mean "the rest of the record", which in and of itself is probably
> > > > a good idea.
> > >
> > > Agreed.
> > >
> > > > I guess I need to define what happens in these corner cases and
> > > > put it up for discussion here and then go with whatever seems to
> > > > make the most sense after the discussion is done. Sigh.
> > >
> > > I don't expect much disagreement over this, but one never knows. I think
> > > we should state clearly what we are doing, and then we should be OK.
> > > It seems clear that nobody has yet written code with FIELDWIDTHS that
> > > depends on the subtle NF behavior that you are discussing, so I doubt
> > > we will break anything.
> > >
> > > Regards,
> > > Andy
> >
> > So here are my thoughts.
> >
> > Q2/A2 is the biggest real open question.
> >
> > Arnold
> > -----------------------------------------------------------------
> > Sun May 21 21:54:06 IDT 2017
> > ============================
> >
> > Some thoughts on better definitions of the behavior for FIELDWIDTHS.
> >
> > Q1. Given FIELDWIDTHS = "2 3 4" and input data "aabb". How many fields
> >     should there be?
> >     A. Two, since that's all the data that's there
> >     B. Three, with $3 == "", since it's supposed to be all fixed width data
> >
> > A1. Gawk currently says three. Arnold leans towards two, since it reflects
> >     the actual data and allows code expecting three fields to weed out
> >     bad records.
> >
> > Q2. Given FIELDWIDTHS = "2 3 4" and input data "aab", should $2 have a
> >     value?
> >     A. No - we're expecting three characters and they weren't all there
> >     B. Yes - something was there, make it available
> >
> > A2. Gawk currently says "yes".  Arnold isn't sure what's right here.
> >     Input is welcome.
> >
> > Q3. Given FIELDWIDTHS = "2 3 4" and input data "aabbbccccdddd", what
> >     should be done with the dddd?
> >     A. Nothing - it's extra, ignore it. NF should be set to 3. Code that
> >        wants to know if there's something extra can use length() and
> >        substr() to get it out of the record.
> >     B. Stick it into $4 anyway.
> >
> > A3. Arnold and gawk agree on (A).
> >
> > Q4. Suppose a "*" at the end of FIELDWIDTHS means "the rest of the
> >     record". Then with FIELDWIDTHS = "2 3 4 *" and input data
> >     "aabbbccccdddd", the dddd would go into $4. The final data would
> >     be optional. Is there any reason not to add this to gawk? It seems
> >     to be actually useful and not just theoretically useful.
> >
> > A4. Arnold thinks it's right to add it.
> >
> >
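
For reference, the rules proposed in Q1 to Q4 can be modelled in a few lines. The following is a minimal Python sketch of those proposals, not gawk's C implementation, and gawk's actual behaviour may differ between versions. The function name split_fixed and its drop_short and pad parameters are hypothetical: drop_short chooses between the two answers to Q2, and pad models the space-padded records Wolfgang describes.

```python
from typing import List


def split_fixed(record: str, fieldwidths: str,
                drop_short: bool = False, pad: bool = False) -> List[str]:
    """Hypothetical model of the FIELDWIDTHS rules proposed in Q1-Q4.

    drop_short=False keeps a partially present field (Q2, answer B);
    drop_short=True ends the record at the first short field (Q2, answer A).
    pad=True space-pads the record to the full declared width first,
    modelling the traditional fixed-width records (Q1, answer B).
    """
    tokens = fieldwidths.split()
    if "*" in tokens[:-1]:
        raise ValueError('"*" may only appear as the last width')
    widths = [int(t) for t in tokens if t != "*"]
    rest = bool(tokens) and tokens[-1] == "*"   # Q4: "*" = rest of record

    if pad:
        record = record.ljust(sum(widths))

    fields: List[str] = []
    pos = 0
    for w in widths:
        if pos >= len(record):      # A1: no data left, so no more fields
            break
        chunk = record[pos:pos + w]
        if len(chunk) < w and drop_short:
            break                   # Q2, answer A: a short field is dropped
        fields.append(chunk)
        pos += w

    # A4: with a trailing "*", leftover data (if any) becomes the last field.
    # Without "*", anything past the last width is ignored (A3).
    if rest and len(fields) == len(widths) and pos < len(record):
        fields.append(record[pos:])
    return fields                   # NF would be len(fields)
```

With the defaults, split_fixed("aabb", "2 3 4") yields two fields, as in A1, and split_fixed("aabbbccccdddd", "2 3 4 *") puts the dddd into a fourth field, as in A4. Passing pad=True instead gives three fields for "aabb", the last one all spaces, which is the behaviour Wolfgang argues for.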


