bug-sed

bug#23635: possible bug in \c escape handling


From: Jim Meyering
Subject: bug#23635: possible bug in \c escape handling
Date: Sat, 28 May 2016 15:06:25 -0700

On Fri, May 27, 2016 at 6:08 PM, Assaf Gordon <address@hidden> wrote:
> Hello,
>
> There might be a small bug in the processing of the GNU extension escape
> sequence "\c".
>
> When the character following "\c" is a backslash, the code consumes only one
> character, leading to inconsistent and incorrect output.
> Example:
>
>   $ echo a | sed 's/./\c\\/' | od -c
>   0000000 034 \ \n
>   0000003
>   $ echo a | sed 's/./\c\d/' | od -c
>   0000000 034 d \n
>   0000003
>
> but:
>
>   $ echo a | sed 's/./\c\/' | od -c
>   sed: -e expression #1, char 8: unterminated `s' command
>   0000000
>
> Meaning there is no way to generate the character '\034' (0x1C) alone with "\c".
>
> This is also somewhat inconsistent because it consumes a single backslash
> character (whereas everywhere else a single backslash is the escape
> character itself).
>
> For comparison, other characters behave as expected:
>
>   $ sed 's/./\cA/' in | od -c
>   0000000 001 \n
>   0000002
>   $ sed 's/./\c[/' in | od -c
>   0000000 033 \n
>   0000002
>   $ sed 's/./\c]/' in | od -c
>   0000000 035 \n
>   0000002
>
> As a side effect, it could also be confusing if the syntax allows
> 'recursive' escapes, such as "\c\x41", which might be argued to be '\c'
> of the following character, which should first be evaluated as \x41,
> resulting in "\cA".
>
> The attached patch fixes the problem with the following rules:
> 1. '\c\\' = Control-Backslash = ASCII 0x1C (octal 034).
> 2. Any other backslash combinations after "\c" are rejected, and sed aborts.
>
> Tests included. Comments are welcome.
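
The octal values in the od output quoted above (001, 033, 034, 035) are
consistent with the documented effect of "\cX": a lower-case letter is
converted to upper case, then bit 6 (hex 40) of the character is inverted.
A minimal shell sketch of that arithmetic, assuming ASCII and a POSIX
printf; ctrl_code is a hypothetical helper written for illustration and is
not part of sed:

```shell
# ctrl_code CHAR: print, in 3-digit octal, the byte that "\cCHAR" stands for.
# Hypothetical helper for illustration only; assumes a single ASCII character.
ctrl_code() {
  # Convert a lower-case letter to upper case, as "\cX" does.
  c=$(printf '%s' "$1" | LC_ALL=C tr 'a-z' 'A-Z')
  # A leading quote makes printf yield the numeric code of the next character.
  code=$(printf '%d' "'$c")
  # Invert bit 6 (0x40 = 64) and print in octal, the notation od -c uses.
  printf '%03o\n' "$((code ^ 64))"
}

ctrl_code A
ctrl_code '['
ctrl_code '\'
ctrl_code ']'
```

Under these assumptions, Control-Backslash comes out as octal 034, i.e.
0x1C, which is the value rule 1 of the patch assigns to '\c\\'.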

Nice catch. I like the patch.
So far, I can make only two suggestions:
  - add a NEWS entry, since this is a bug fix
  - I have a slight preference for the one-liner
      printf '%s\n' a a a a a a a
    rather than your 7-line here-document to generate that same
    output in the test case.

And a comment wording nit:

+# Before sed-4.3, this resulted in '\034d' .
+# now it should be rejected.

I prefer to say e.g.,

# Before sed-4.3, this resulted in '\034d'. Now, it is rejected.

Thank you!




