bug-sed

bug#37634: Non-charset characters are not recognized.


From: sur3
Subject: bug#37634: Non-charset characters are not recognized.
Date: Sat, 5 Oct 2019 16:26:28 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

sed does not match bytes that are invalid in the current character set
with . (dot), e.g. with LC_CTYPE=en_US.UTF-8:

# printf "ABCD\n" | sed 's/B.*C//'
Output: AD
# printf "AB\x8eCD\n" | sed 's/B.*C//'
Output: AB�CD

I also tried something like [^E]* instead of .* but that does not work
either. I think sed should recognize that \x8e is neither a C nor a
newline, even though it is not a valid character in the character set.
With

# printf "AB\x8eCD\n" | LC_CTYPE=C sed 's/B.*C//'
Output: AD

it works, but that is a bit counter-intuitive, because one normally wants
to keep the UTF-8 charset and still have sed work correctly. Or is there
another regex construct similar to . that can match such characters?
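
For reference, a minimal sketch of the C-locale approach restricted to the
single sed invocation, so the rest of the session keeps its UTF-8 locale.
It assumes byte-wise matching is acceptable for this one command. The
variable name result is only illustrative, and the stray byte is written
in octal (\216 is 0x8e) because \x escapes are not portable in printf.
LC_ALL is used instead of LC_CTYPE because LC_ALL takes precedence if it
happens to be set in the environment.

```shell
# In the C locale every byte is a single character, so . also matches
# the stray 0x8e byte. The assignment applies to this sed process only.
result=$(printf 'AB\216CD\n' | LC_ALL=C sed 's/B.*C//')
printf '%s\n' "$result"
```

The trade-off is that valid multibyte UTF-8 characters are then treated as
separate bytes too, so . matches one byte rather than one character.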




