bug-gawk

Re: doc tweak re backslashes in bracket expressions


From: Ed Morton
Subject: Re: doc tweak re backslashes in bracket expressions
Date: Mon, 4 Nov 2024 06:16:32 -0600
User-agent: Mozilla Thunderbird

Thanks Arnold. Regarding opening a ticket against POSIX: I wouldn't mind doing that (I have a few other related tickets currently open), but unfortunately I don't have any other version of awk to test with, and I don't know how other awks behave regarding \], \- or \^ inside a bracket expression. Do you know if all modern awks (e.g. not old awk and not mawk1) treat those as literal when escaped anywhere inside a bracket expression? Are there any other escape sequences you're aware of, inside or outside of bracket expressions, that I should also cover in the ticket?
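
One way to answer that question for whichever awks happen to be installed is a small probe script. The following is only a sketch: the `probe` helper and the list of command names are assumptions made for illustration, not something taken from this thread, and it prints raw results without claiming what any particular awk will report. On each line, "match" would suggest that the escaped character is being taken literally.

```shell
# probe AWK_CMD REGEX INPUT
# Runs INPUT through AWK_CMD with REGEX as a regex literal and reports
# "match" or "no match". (Hypothetical helper, for illustration only.)
probe() {
    printf '%s\n' "$3" |
        "$1" "/$2/ { hit = 1 } END { print (hit ? \"match\" : \"no match\") }"
}

# The command names below are guesses; any that are not on PATH are skipped.
for awk_cmd in awk gawk mawk nawk original-awk; do
    command -v "$awk_cmd" >/dev/null 2>&1 || continue
    printf '%s:\n' "$awk_cmd"
    # "]" alone can only match if \] is read as a literal "]".
    printf '  [a\\]] vs "]":  %s\n' "$(probe "$awk_cmd" '[a\]]' ']')"
    # "-" can only match if \- is read as a literal "-" rather than a range start.
    printf '  [a\\-z] vs "-": %s\n' "$(probe "$awk_cmd" '[a\-z]' '-')"
    # "\" can only match the negated set if the backslash is not itself a member.
    printf '  [^\\^] vs "\\":  %s\n' "$(probe "$awk_cmd" '[^\^]' '\')"
done
```

The regex is spliced into the program text rather than passed with `-v`, so that it is parsed as an ERE token and not as a string.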

    Ed.

On 11/3/2024 11:34 PM, arnold@skeeve.com wrote:
Hi Ed.

I am adding something to the manual. IMHO it's a bug in the standard
that \] isn't mentioned; I suggest opening a ticket on it.

Thanks,

Arnold

Ed Morton via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:

I finally came across the POSIX reference that allows awk to interpret
`\` in a bracket expression as an escape character - it's in
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/awk.html#tag_20_06_13_04:

these escape sequences shall be recognized both inside and outside
bracket expressions.

| *Escape Sequence* | *Description* | *Meaning* |
|---|---|---|
| `\\` | Two <backslash> characters. | In the lexical token *ERE*, the sequence shall represent itself. In the lexical token *STRING*, it shall represent a single <backslash>. |
| `\c` | A <backslash> character followed by any character not described in this table or in the table in XBD [5. File Format Notation](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap05.html#tag_05) (`'\\'`, `'\a'`, `'\b'`, `'\f'`, `'\n'`, `'\r'`, `'\t'`, `'\v'`). | Undefined |

so inside or outside of a bracket expression `\\` has to mean `\` and
the meaning of `\c` where `c` is any "ordinary character" is undefined
by POSIX and so gawk can treat it however it likes, hence allowing
`[a\]]` to mean "a or ]", for example. So, I'd consider that second part
more "allowed by" rather than "mandated by" POSIX (POSIX doesn't mandate
what `[a\]]` means to gawk) but maybe that's just nitpicking. I'd like
to see the doc add that reference though as it took me hours wading
through POSIX awk and regexp specs to find it.
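
Both halves of that reading can be tried at the command line. This is a minimal sketch, assuming whatever `awk` is first on PATH; only the first result follows from the table above, while the second is exactly the case POSIX leaves undefined, so no output is claimed for it.

```shell
# `\\` inside a bracket expression: the set contains a backslash,
# so a line containing a backslash is printed.
printf 'a\\d\n' | awk '/[\\]/'
# prints: a\d

# `\]` inside a bracket expression: undefined by POSIX. An awk that reads
# it as "a or ]" prints the "]" line; one that does not prints nothing.
printf '%s\n' ']' 'x' | awk '/[a\]]/'
```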

      Ed.

On 11/3/2024 7:50 AM, Ed Morton via Bug reports only for gawk. wrote:
Just a small tweak suggestion for the gawk documentation regarding
backslashes inside bracket expressions.

https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html currently says (**emphasis mine**):

The treatment of ‘\’ in bracket expressions is compatible with other
awk implementations **and is also mandated by POSIX**.

but POSIX, at least this 2024 incarnation of the spec, seems pretty
clear (see references below*) that a backslash inside a bracket
expression is not an escape character so per POSIX these would be
compliant behavior:

$ printf 'a\\d\n' | grep -E '[\]'
a\d
$ printf 'a\\d\n' | sed -En '/[\]/p'
a\d

while these would not:

$ printf 'a\\d\n' | awk '/[\]/'
awk: cmd. line:1: /[\]/
awk: cmd. line:1:  ^ unterminated regexp
$ printf 'a\\d\n' | awk --posix '/[\]/'
awk: cmd. line:1: /[\]/
awk: cmd. line:1:  ^ unterminated regexp

so maybe either remove that "and is also mandated by POSIX" statement
or provide a reference to where that behavior IS mandated by POSIX to
clear up any confusion.

     Ed.

*From the current, 2024, POSIX regexp spec,
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html (**emphasis mine**):

[9.1 Regular Expression
Definitions](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_01)

...
escape sequence

The escape character followed by any single character, which is
thereby "escaped". The escape character is a \<backslash\> that is
**neither in a bracket expression** nor itself escaped.

which tells us that a backslash within a bracket expression is not an
escape character, and this:

[9.3.5 RE Bracket
Expression](https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05)
... When the bracket expression appears within an ERE, the special
characters ... and '`\`' (... and \<backslash\>, respectively) shall
**lose their special meaning within the bracket expression**

which reiterates that a backslash within a bracket expression has no
special meaning, and there's nothing I can see in [the POSIX awk
spec](https://pubs.opengroup.org/onlinepubs/9799919799/utilities/awk.html)
to override the above definitions.
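
The effect of that rule can be seen with a tool that applies plain POSIX EREs. As a small sketch (the input lines are made up for illustration), `grep -E` reads `[a\]` as the complete bracket expression, matching "a" or a backslash, and the final `]` as an ordinary character that must follow it:

```shell
# Three candidate lines: "]" alone, "a]" and "\]".
# Only the lines where "a" or "\" is followed by "]" are selected.
printf '%s\n' ']' 'a]' '\]' | grep -E '[a\]]'
# prints:
# a]
# \]
```

An implementation that treats `\]` as an escaped `]` would select the `]` line as well, which is the difference under discussion.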

