bug-gawk

Re: minor documentation suggestion for FS values and "whitespace" in general


From: arnold
Subject: Re: minor documentation suggestion for FS values and "whitespace" in general
Date: Tue, 31 Mar 2020 04:19:54 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi Ed.

I finally took a look at this.  I don't see a need for major changes in the
doc. If you look at node "Fields" it says pretty clearly:

        When @command{awk} reads an input record, the record is
        automatically @dfn{parsed} or separated by the @command{awk}
        utility into chunks called @dfn{fields}.  By default, fields
        are separated by @dfn{whitespace}, like words in a line.
        Whitespace in @command{awk} means any string of one or more
        spaces, TABs, or newlines; other characters that are considered
        whitespace by other languages (such as formfeed, vertical tab,
        etc.) are @emph{not} considered whitespace by @command{awk}.

Nowhere does the doc claim that this whitespace is related to the regex
character class [:space:] (which, in fact, it is not), so I think this
was just your confusion.
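
To make the distinction concrete, here is a small Python sketch that models
the three character sets under discussion. It is only an illustration of the
definitions quoted in this thread, not gawk's actual implementation; the
function names split_default_fs and split_posix_space are made up for the
example, and the POSIX locale is assumed.

```python
import re

# Character sets as described in this thread (POSIX locale assumed).
BLANK = " \t"            # [:blank:]: space and TAB
SPACE = " \t\n\r\f\v"    # [:space:]: adds newline, CR, formfeed, vertical tab
AWK_WS = " \t\n"         # "whitespace" per the Fields node: space, TAB, newline


def split_default_fs(record):
    """Model FS == " ": split on runs of AWK_WS, ignoring leading/trailing runs."""
    return [f for f in re.split("[" + AWK_WS + "]+", record) if f]


def split_posix_space(record):
    """Hypothetical alternative: split on runs of [:space:] characters instead."""
    return [f for f in re.split("[" + SPACE + "]+", record) if f]


# A newline can only appear inside a record when RS is not the default "\n".
record = "  a\rb\tc\vd\ne  "

print(split_default_fs(record))   # CR and VT stay inside the fields
print(split_posix_space(record))  # every [:space:] character separates fields
```

In this model, carriage return and vertical tab remain part of the field
text under the default FS, while a [:space:]-based split would break on them.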

Thanks,

Arnold

Ed Morton <address@hidden> wrote:

> I was just looking up which exact characters get included in the set of 
> field separators when FS is " " (the default value) and got confused by 
> this in the gawk documentation:
>
>     Class        Meaning
>     [:blank:]    Space and TAB characters
>     [:space:]    Space characters (these are: space, TAB, newline,
>                  carriage return, formfeed and vertical tab)
>
>     FS == " "
>          Fields are separated by runs of *whitespace*. Leading and
>          trailing whitespace are ignored. This is the default.
>
>     (bold added by me)
>
> I took the last statement above to mean that FS would be the set of
> characters defined by the [:space:] character class, but it's not, since
> FS doesn't include carriage return (\r) or vertical tab (\v) when FS is
> " " (I didn't bother checking others). Neither is it the [:blank:]
> character class, since it includes newlines (\n). Instead it seems to be
> [:blank:] plus newline, and that's supported by the POSIX spec if we
> assume by <blank> they mean [:blank:]:
>
>     ...by default, a field is a string of non-<blank> non-<newline>
>     characters.
>
> But what does newline mean in all of the above? Is it always linefeed 
> (\n) on all platforms or is it LF (\n) on UNIX and CRLF (\r\n) on 
> Windows or something else? I really don't know.
>
> So - maybe you could update the documentation to say "Fields are
> separated by runs of whitespace (i.e. [:blank:] plus linefeed
> characters)" or similar? I couldn't find anywhere in the documentation
> that states exactly which characters FS includes when assigned " ", nor
> what exactly is meant by "whitespace" throughout the documentation, and I
> think that one tweak to provide a clear definition of the term
> "whitespace" would clarify all of it.
>
>      Ed.