--- Begin Message ---
Subject: Re: bug#33793: sed bug with regular expressions
Date: Tue, 18 Dec 2018 12:23:16 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
tag 33793 notabug
thanks
On 12/18/18 6:50 AM, Uladzimir Panasiuk wrote:
Hi. I've found the bug using sed. There is how to reproduce:
1) Run bash
2) Exec command
echo weather -5.0 | sed 's/[^0-9\-\.]//g'
You used two range expressions in this regex (0-9 and \-\), but the
result is the same as if you had used this regex with only one range
expression:
's/[^0-9\.]//g'
Either way, you requested all characters except for the 10 digits, a
literal backslash, or a literal dot. Remember, the range expression \-\
inside [] starts and ends at backslash, so it selects only that single
character. Since '-' is not among the characters listed in the negated
[] expression, it is matched, and sed correctly strips it.
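A quick way to confirm that the two forms behave the same is to run both on
the sample input. This is only a sketch: it assumes GNU sed and a POSIX
shell, and the variable names are made up for illustration.

```shell
# Assumes GNU sed; variable names are illustrative only.
# Original form: ranges 0-9 and \-\ (backslash to backslash), plus a dot.
two_ranges=$(echo 'weather -5.0' | sed 's/[^0-9\-\.]//g')
# Shorter form: range 0-9, plus a literal backslash and a literal dot.
one_range=$(echo 'weather -5.0' | sed 's/[^0-9\.]//g')
printf '%s\n' "$two_ranges"   # 5.0
printf '%s\n' "$one_range"    # 5.0
```

In both cases the '-' is not in the list, so the negated expression matches
it and it is deleted.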
3) You will get "5.0". Expected output is "-5.0"
You might be remembering the behavior of perl regex, where \ inside []
is an escape character. But that's not how POSIX regex behaves - inside
[], \ is literal, and there are no escape characters.
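To see that \ really is literal inside [], the following sketch (again
assuming GNU sed; the sample string and variable name are invented for
illustration) deletes every character matched by [\.]:

```shell
# Assumes GNU sed. Inside [], the backslash is an ordinary list member,
# so [\.] matches either a backslash or a dot.
stripped=$(printf '%s\n' 'x\y.z' | sed 's/[\.]//g')
printf '%s\n' "$stripped"   # xyz
```

Both the backslash and the dot are removed, which would not happen if \
were merely escaping the dot.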
BUT
If you exec
echo weather -5.0 | sed 's/[^0-9\.\-]//g'
Here, your regex only has one range expression, but lists \ twice. The
repetition is harmless, but means that your expression is the same as
this shorter one:
's/[^0-9\.-]//g'
It is not obvious from your input whether you intended to be filtering
out literal backslash or not, but if not, you probably meant to write:
's/[^0-9.-]//g'
with no backslash, and with the - last (as that is one of the few places
that you can write - to be matched as itself rather than treated as a
range operator between neighboring characters).
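As a rough illustration of those placement rules (a sketch assuming GNU
sed; the variable names and the second sample string are invented here),
'-' is literal when it comes last or directly after the ^, and dropping \
from the list only matters when the input contains a backslash:

```shell
# Assumes GNU sed; variable names and sample inputs are illustrative.
# '-' written last in the list is literal.
dash_last=$(echo 'weather -5.0' | sed 's/[^0-9.-]//g')
# '-' written first (right after the ^) is literal as well.
dash_first=$(echo 'weather -5.0' | sed 's/[^-0-9.]//g')
# With \ in the list, a backslash in the input is kept ...
keeps_backslash=$(printf '%s\n' 'a\-5.0' | sed 's/[^0-9\.-]//g')
# ... and without it, the backslash is stripped like any other character.
drops_backslash=$(printf '%s\n' 'a\-5.0' | sed 's/[^0-9.-]//g')
printf '%s\n' "$dash_last"         # -5.0
printf '%s\n' "$dash_first"        # -5.0
printf '%s\n' "$keeps_backslash"   # \-5.0
printf '%s\n' "$drops_backslash"   # -5.0
```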
I'm closing this as not a bug, but feel free to reply with further
questions or comments.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
--- End Message ---