From: GNU bug Tracking System
Subject: bug#56351: closed (LC_CTYPE=C.UTF-8 causes an matching error on Sed)
Date: Sat, 02 Jul 2022 22:58:01 +0000
Your message dated Sat, 2 Jul 2022 17:57:18 -0500
with message-id <a15320a0-1adb-eb49-c57f-064c14ab9131@cs.ucla.edu>
and subject line LC_CTYPE=C.UTF-8 causes an matching error on Sed
has caused the debbugs.gnu.org bug report #56351,
regarding LC_CTYPE=C.UTF-8 causes an matching error on Sed
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)

-- 
56351: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=56351
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: LC_CTYPE=C.UTF-8 causes an matching error on Sed
Date: Sat, 02 Jul 2022 14:03:10 +0900

Sed (and also Grep) cannot match a certain range of Korean (Hangul) characters when it operates under LC_CTYPE=C.UTF-8 (or any other locale with UTF-8 encoding, including en_US.UTF-8, ko_KR.UTF-8, ja_JP.UTF-8, etc.).

Reproducing the bug with Sed:

    $ export LC_CTYPE=C.UTF-8
    $ echo 폿 | sed -e 's/./a/'
    a        <-- matched and replaced without an issue
    $ echo 퐀 | sed -e 's/./a/'
    퐀       <-- FAILED to match, so nothing is replaced

In detail, a character in the range [가-폿] (<UAC00>~<UD3FF>) is matched without any issue, but a character in the range [퐀-힣] (<UD400>~<UD7A3>) CANNOT be matched, even though it IS SUPPOSED TO be.

Grep has the same issue with the period regex. Reproducing the bug with Grep:

    $ export LC_CTYPE=C.UTF-8
    $ echo 폿 | grep .
    폿       <-- matched successfully
    $ echo 퐀 | grep .
    $        <-- failed to match

I think it is related to <regex.h> or <iconv.h> in glibc, but I couldn't find a way to reproduce the bug with those, so I am reporting it against Sed instead. I am also reporting this issue on the bug-grep list.
--- End Message ---
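The boundary in the report (폿, U+D3FF, matches; 퐀, U+D400, does not) is easier to see at the byte level. The following is a minimal sketch, assuming a POSIX shell, od(1) and a UTF-8 encoded script file; utf8_bytes is a hypothetical helper written for this illustration and is not part of sed or grep. It prints the UTF-8 bytes of the four boundary characters.

```shell
#!/bin/sh
# Hypothetical helper (not part of sed or grep): print the UTF-8 bytes of a
# short string as space-separated lowercase hex. Assumes the script file itself
# is UTF-8 encoded; od wraps after 16 bytes, so keep inputs short.
utf8_bytes() {
  # -An drops the offset column; -tx1 prints one hex field per byte.
  # tr -s ' ' squeezes the padding between fields down to single blanks.
  hex=$(printf '%s' "$1" | od -An -tx1 | tr -s ' ')
  hex=${hex# }   # drop the leading blank in front of the first field
  hex=${hex% }   # drop a trailing blank, if this od emits one
  printf '%s\n' "$hex"
}

# Boundary characters from the report: 가 and 폿 were matched by '.',
# 퐀 and 힣 were not.
for ch in 가 폿 퐀 힣; do
  printf '%s %s\n' "$ch" "$(utf8_bytes "$ch")"
done
```

Run on a UTF-8 system, it should report ea b0 80 for 가, ed 8f bf for 폿, ed 90 80 for 퐀 and ed 9e a3 for 힣. Every character in the failing range [퐀-힣] therefore begins with 0xED followed by a second byte between 0x90 and 0x9E, while 폿, the last character that still matched, has 0x8F in that position. This is only an observation about the encoding; neither message in this thread identifies it as the cause.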
--- Begin Message ---
Subject: LC_CTYPE=C.UTF-8 causes an matching error on Sed
Date: Sat, 2 Jul 2022 17:57:18 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

Thanks for reporting that. This bug was introduced in Sed 4.8. I propagated the Gnulib fix into the Sed development tree, here:

https://git.savannah.gnu.org/cgit/sed.git/commit/?id=bfdc4d6ee4811c34d8756fcca7895f5d2eed6946
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=49c90357b9a07fc78904660f68c2e6acd236da9d

and the bug should be fixed in the next Sed release.
--- End Message ---