Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields


From: Arnold Robbins
Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
Date: Sun, 21 May 2017 21:52:49 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi.

> Date: Sun, 21 May 2017 12:29:34 -0400
> From: "Andrew J. Schorr" <address@hidden>
> To: Arnold Robbins <address@hidden>
> Cc: address@hidden
> Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
>
> ......
>
> The patch looks fine to me, although I wonder whether this is really a bug.
> The user specified that this is a field of fixed-width records, and we properly
> give empty string values for the missing fields. Why was it specified as a
> fixed-width record using the FIELDWIDTHS mechanism if that's not actually the
> case? I don't really know what NF is supposed to be in such cases. Is that
> defined in the docs? Will it break any existing scripts to change this
> behavior? Do we need to update the docs to define clearly what happens when
> the input record is shorter than expected from the value implied by FIELDWIDTHS?
> And then there's the related question of what should happen when the record
> is longer than the value implied by FIELDWIDTHS? This also relates to the
> suggestion of adding a "*" special character for parsing extra data.
> In other words, this issue seems like a can of worms to me.

You ask tough questions.  To me it seems obvious that if all the
requested data isn't there then the number of fields should be smaller,
allowing a check against what's expected to make it possible to weed
out bad data.
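
To make that concrete, here is a minimal Python sketch of the behavior
I have in mind. It is a model of the semantics only, not gawk's code;
`split_fixed` and the sample widths are made up for illustration.

```python
def split_fixed(record, widths):
    """Split record into fixed-width fields, stopping when the data runs out."""
    fields = []
    pos = 0
    for width in widths:
        if pos >= len(record):
            break  # nothing left: later fields are missing, not empty
        fields.append(record[pos:pos + width])
        pos += width
    return fields


widths = [3, 5, 2]                        # hypothetical layout
good = split_fixed("abc12345xy", widths)  # all three fields present
short = split_fixed("abc12345", widths)   # third field is missing

# The field count plays the role of NF: a mismatch flags bad data.
for fields in (good, short):
    status = "ok" if len(fields) == len(widths) else "short record"
    print(len(fields), status)
```

With a count that shrinks like this, a script can compare the number of
fields against the number of widths and reject short records.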

It also raises the question of what to do when an individual field
is short: FIELDWIDTHS says field 2 is 5 characters but only
three are there.
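
The two obvious choices for a truncated field can be sketched in Python
as follows. Again this is only a model, not gawk's code, and the
`keep_partial` switch is a hypothetical knob for comparing the two
policies, not a proposed feature.

```python
def split_fixed(record, widths, keep_partial=True):
    """Fixed-width split with a selectable policy for a truncated last field."""
    fields = []
    pos = 0
    for width in widths:
        chunk = record[pos:pos + width]
        if not chunk:
            break  # no data at all for this field
        if len(chunk) < width and not keep_partial:
            break  # policy B: a truncated field does not count
        fields.append(chunk)  # policy A: keep whatever characters exist
        pos += width
    return fields


widths = [3, 5, 2]   # hypothetical layout
record = "abc123"    # field 2 should be 5 characters, only 3 are there

print(split_fixed(record, widths, keep_partial=True))   # ['abc', '123']
print(split_fixed(record, widths, keep_partial=False))  # ['abc']
```

Either way the field count comes out smaller than the number of widths;
the policies differ only in whether the partial data is visible.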

None of this is well defined in the documentation, nor, obviously,
was it well thought out to start with. :-(

I have written the code to handle the suggested '*' at the end to
mean "the rest of the record", which in and of itself is probably
a good idea.
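
As a rough illustration of the '*' idea, here is a Python sketch. It
models the intended semantics only and is not the code I wrote for gawk;
`split_fixed` is a made-up helper, and representing the widths as a list
whose last element may be the string "*" is an assumption of the sketch.

```python
def split_fixed(record, widths):
    """Fixed-width split where a trailing "*" takes the rest of the record."""
    fields = []
    pos = 0
    for width in widths:
        if pos >= len(record):
            break  # record exhausted: remaining fields are missing
        if width == "*":
            fields.append(record[pos:])  # everything that is left
            break
        fields.append(record[pos:pos + width])
        pos += width
    return fields


record = "abc12345xyEXTRA"

print(split_fixed(record, [3, 5, 2]))    # ['abc', '12345', 'xy']
print(split_fixed(record, [3, 5, "*"]))  # ['abc', '12345', 'xyEXTRA']
```

Without the '*', data beyond the listed widths is silently dropped; with
it, the final field absorbs whatever remains.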

I guess I need to define what happens in these corner cases and
put it up for discussion here and then go with whatever seems to
make the most sense after the discussion is done. Sigh.

Thanks,

Arnold


