Re: doc tweak re backslashes in bracket expressions


From: arnold
Subject: Re: doc tweak re backslashes in bracket expressions
Date: Sun, 03 Nov 2024 22:34:22 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hi Ed.

I am adding something to the manual. IMHO it's a bug in the standard
that \] isn't mentioned; I suggest opening a ticket on it.

Thanks,

Arnold

Ed Morton via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:

> I finally came across the POSIX reference that allows awk to interpret
> `\` in a bracket expression as an escape character - it's in
> https://pubs.opengroup.org/onlinepubs/9799919799/utilities/awk.html#tag_20_06_13_04:
>
> > these escape sequences shall be recognized both inside and outside 
> > bracket expressions.
> >
> > *Escape Sequence*: \\
> > *Description*: Two <backslash> characters.
> > *Meaning*: In the lexical token *ERE*, the sequence shall represent
> > itself. In the lexical token *STRING*, it shall represent a single
> > <backslash>.
> >
> > *Escape Sequence*: \c
> > *Description*: A <backslash> character followed by any character not
> > described in this table or in the table in XBD /5. File Format Notation/
> > <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap05.html#tag_05>
> > ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v').
> > *Meaning*: Undefined
> >
>
> So, inside or outside of a bracket expression, `\\` has to mean `\`. The
> meaning of `\c`, where `c` is any "ordinary character", is undefined by
> POSIX, so gawk can treat it however it likes, which is what allows
> `[a\]]` to mean "a or ]", for example. I'd therefore consider that second
> part "allowed by" rather than "mandated by" POSIX (POSIX doesn't mandate
> what `[a\]]` means to gawk), but maybe that's just nitpicking. I'd still
> like to see the doc add that reference, though, as it took me hours of
> wading through the POSIX awk and regexp specs to find it.
>
>      Ed.
>
> On 11/3/2024 7:50 AM, Ed Morton via Bug reports only for gawk. wrote:
> > Just a small tweak suggestion for the gawk documentation regarding 
> > backslashes inside bracket expressions.
> >
> > https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html 
> > currently says (**emphasis mine**):
> >
> >> The treatment of ‘\’ in bracket expressions is compatible with other 
> >> awk implementations **and is also mandated by POSIX**. 
> >
> > but POSIX, at least this 2024 incarnation of the spec, seems pretty 
> > clear (see references below*) that a backslash inside a bracket 
> > expression is not an escape character so per POSIX these would be 
> > compliant behavior:
> >
> >> $ printf 'a\\d\n' | grep -E '[\]'
> >> a\d
> >
> >> $ printf 'a\\d\n' | sed -En '/[\]/p'
> >> a\d
> >
> > while these would not:
> >
> >> $ printf 'a\\d\n' | awk '/[\]/'
> >> awk: cmd. line:1: /[\]/
> >> awk: cmd. line:1:  ^ unterminated regexp
> >
> >> $ printf 'a\\d\n' | awk --posix '/[\]/'
> >> awk: cmd. line:1: /[\]/
> >> awk: cmd. line:1:  ^ unterminated regexp
> >
> > so maybe either remove that "and is also mandated by POSIX" statement 
> > or provide a reference to where that behavior IS mandated by POSIX to 
> > clear up any confusion.
> >
> >     Ed.
> >
> > *From the current, 2024, POSIX regexp spec, 
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html 
> > (**emphasis mine**):
> >
> >> > [9.1 Regular Expression Definitions](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_01)
> >> >
> >>
> >> > ...
> >> > escape sequence
> >> >
> >> > The escape character followed by any single character, which is
> >> > thereby "escaped". The escape character is a \<backslash\> that is
> >> > **neither in a bracket expression** nor itself escaped.
> >
> > which tells us that a backslash within a bracket expression is not an 
> > escape character, and this:
> >
> >> > [9.3.5 RE Bracket Expression](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05)
> >> >
> >> > ... When the bracket
> >> > expression appears within an ERE, the special characters ... and
> >> > '`\`' (... and \<backslash\>, respectively) shall **lose their
> >> > special meaning within the bracket expression**
> >
> > which reiterates that a backslash within a bracket expression has no 
> > special meaning, and there's nothing I can see in [the POSIX awk 
> > spec](https://pubs.opengroup.org/onlinepubs/9799919799/utilities/awk.html) 
> > to override the above definitions.
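
For anyone who wants to check the behavior discussed in this thread, here is a minimal shell sketch. It assumes a POSIX shell with grep and awk on the PATH. The variable names are only illustrative, and the `[a\]]` result is implementation behavior (gawk documents it), not something POSIX mandates.

```shell
# Sketch: how a backslash inside a bracket expression is treated.
# Assumption: grep and awk are on PATH; names below are illustrative only.

line='a\d'   # sample input containing one literal backslash

# grep -E follows the POSIX ERE rules: inside [...] the backslash is an
# ordinary character, so [\] is a complete bracket expression.
grep_hit=$(printf '%s\n' "$line" | grep -E '[\]')

# awk treats the backslash as an escape even inside [...], so a literal
# backslash has to be written as [\\].
awk_hit=$(printf '%s\n' "$line" | awk '/[\\]/')

# The same escape handling lets \] stand for a literal "]" in the set.
bracket_hit=$(printf '%s\n' 'x]y' | awk '/[a\]]/')

printf 'grep: %s\nawk:  %s\nawk:  %s\n' "$grep_hit" "$awk_hit" "$bracket_hit"
```

With GNU grep and gawk, each command should print its input line. The last case is the interesting one: if `[a\]]` were read under the plain POSIX ERE rules, it would be the set `[a\]` followed by a literal `]`, and `x]y` would not match.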
