bug#24975: Matching issues with characters whose encoding ends in some o
From: Stephane Chazelas
Subject: bug#24975: Matching issues with characters whose encoding ends in some other character
Date: Sun, 20 Nov 2016 21:50:28 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

$ locale charmap
GB18030
$ printf '\uC9\n' | grep '.*7' | hd
00000000 81 30 87 37 0a |.0.7.|
00000005
U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030).
$ printf '\uC9\n' | grep '.*0'
fails.
$ printf '\uC9\n' | grep -o '.*7'
returns with a zero exit status but outputs nothing. It's as if
.*7 matched an empty string somewhere.
$ printf '\uC9\n' | grep '\(.*7\)\1'
fails.
So do:
grep 7
grep '7$'
grep '.7'
grep '[^x]*7'
printf 'x\uC9\n' | grep -E '.+7'
These match:
grep '.\{0,1\}7'
grep -E '.?7'
printf '\uC9x\n' | grep '.*7x' # still outputs nothing with -o
That's not confined to GB18030. You get similar issues with
BIG5-HKSCS, BIG5 or GBK.
$ locale charmap
BIG5-HKSCS
$ printf '\ue9\n' | grep '.*m' | hd
00000000 88 6d 0a |.m.|
00000003
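
The byte-level view behind both dumps can be checked without a multibyte locale. What follows is a minimal sketch, assuming only a POSIX printf and grep. The helper name count_byte_matches is hypothetical, and the octal escapes are simply the bytes shown in the hd output above.

```shell
# count_byte_matches OCTAL_BYTES PATTERN
# Hypothetical helper: emit the given bytes (written as printf octal
# escapes) plus a newline, then count matching lines with grep in the
# C locale, where every byte is treated as its own single-byte character.
count_byte_matches() {
    printf "$1\n" | LC_ALL=C grep -c "$2"
}

# U+00C9 in GB18030 is 81 30 87 37; the last byte is ASCII "7".
count_byte_matches '\201\060\207\067' '7'    # prints 1

# U+00E9 in BIG5-HKSCS is 88 6d; the last byte is ASCII "m".
count_byte_matches '\210\155' 'm'            # prints 1
```

Matching on raw bytes finds the trailing 7 and m. In a GB18030 or BIG5-HKSCS locale those bytes belong to a multibyte character, so presumably none of the patterns above should match on them.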
Reproduced with grep 2.25, 2.26 and the current git head on Ubuntu 16.04 amd64.
--
Stephane