bug-gawk

Re: doc tweak re backslashes in bracket expressions


From: Ed Morton
Subject: Re: doc tweak re backslashes in bracket expressions
Date: Sun, 3 Nov 2024 08:11:28 -0600
User-agent: Mozilla Thunderbird

I finally came across the POSIX reference that allows awk to interpret ```\``` in a bracket expression as an escape character - it's in https://pubs.opengroup.org/onlinepubs/9799919799/utilities/awk.html#tag_20_06_13_04:

these escape sequences shall be recognized both inside and outside bracket expressions.

| Escape Sequence | Description | Meaning |
|---|---|---|
| `\\` | Two <backslash> characters. | In the lexical token *ERE*, the sequence shall represent itself. In the lexical token *STRING*, it shall represent a single <backslash>. |
| `\c` | A <backslash> character followed by any character not described in this table or in the table in XBD [5. File Format Notation](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap05.html#tag_05) (`'\\'`, `'\a'`, `'\b'`, `'\f'`, `'\n'`, `'\r'`, `'\t'`, `'\v'`). | Undefined |


So, inside or outside of a bracket expression, `\\` has to mean `\`, while the meaning of `\c`, where `c` is any "ordinary character", is undefined by POSIX, so gawk can treat it however it likes. That is what allows `[a\]]` to mean "a or ]", for example. I'd therefore consider that second part "allowed by" rather than "mandated by" POSIX (POSIX doesn't mandate what `[a\]]` means to gawk), but maybe that's just nitpicking. I'd still like to see the doc add that reference, though, as it took me hours of wading through the POSIX awk and regexp specs to find it.
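To make the two cases concrete, here is a minimal sketch. The helper names `match_backslash` and `match_a_or_bracket` are made up for illustration, and the second helper assumes an awk that treats `\]` inside a bracket expression the way this thread describes for gawk; since POSIX leaves that case undefined, other awks may behave differently.

```shell
# Hypothetical helpers for illustration only; they are not part of gawk.

# Defined case: `\\` inside a bracket expression matches one literal backslash.
match_backslash() { awk '/[\\]/'; }

# Undefined-by-POSIX case: backslash followed by an ordinary character.
# Per the thread, gawk reads `[a\]]` as "a or ]"; other awks may differ.
match_a_or_bracket() { awk '/[a\]]/'; }

# Only the line containing a backslash is printed.
printf 'a\\d\nabc\n' | match_backslash
# prints: a\d

# Result depends on the awk implementation (see the assumption above).
printf 'a\n]\nb\n' | match_a_or_bracket
```

The first helper relies only on behavior the POSIX awk table pins down, so it should be safe to use in portable scripts. The second relies on the implementation's choice for an undefined sequence.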

    Ed.

On 11/3/2024 7:50 AM, Ed Morton via Bug reports only for gawk. wrote:
Just a small tweak suggestion for the gawk documentation regarding backslashes inside bracket expressions.

https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html currently says (**emphasis mine**):

The treatment of ‘\’ in bracket expressions is compatible with other awk implementations **and is also mandated by POSIX**.

but POSIX, at least this 2024 incarnation of the spec, seems pretty clear (see references below*) that a backslash inside a bracket expression is not an escape character. So, per POSIX, these would be compliant behavior:

$ printf 'a\\d\n' | grep -E '[\]'
a\d

$ printf 'a\\d\n' | sed -En '/[\]/p'
a\d

while these would not:

$ printf 'a\\d\n' | awk '/[\]/'
awk: cmd. line:1: /[\]/
awk: cmd. line:1:  ^ unterminated regexp

$ printf 'a\\d\n' | awk --posix '/[\]/'
awk: cmd. line:1: /[\]/
awk: cmd. line:1:  ^ unterminated regexp

so maybe either remove that "and is also mandated by POSIX" statement or provide a reference to where that behavior IS mandated by POSIX to clear up any confusion.

    Ed.

*From the current, 2024, POSIX regexp spec, https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html (**emphasis mine**):

> [9.1 Regular Expression Definitions](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_01)
> ...
> escape sequence
>
> The escape character followed by any single character, which is
> thereby "escaped". The escape character is a \<backslash\> that is
> **neither in a bracket expression** nor itself escaped.

which tells us that a backslash within a bracket expression is not an escape character, and this:

> [9.3.5 RE Bracket Expression](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05)
>
> ... When the bracket
> expression appears within an ERE, the special characters ... and '```\```'
> (... and \<backslash\>, respectively) shall **lose their special meaning within
> the bracket expression**

which reiterates that a backslash within a bracket expression has no special meaning, and there's nothing I can see in [the POSIX awk spec](https://pubs.opengroup.org/onlinepubs/9799919799/utilities/awk.html) to override the above definitions.

