bug-grep

[bug #28275] Ranges like [a-z] incorrectly match in UTF systems


From: Norihirio Tanaka
Subject: [bug #28275] Ranges like [a-z] incorrectly match in UTF systems
Date: Thu, 17 Dec 2009 01:30:22 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6

Follow-up Comment #6, bug #28275 (project grep):

The testcase gave me the following results.

% dd if=/dev/urandom bs=1024 count=1024 \
  | iconv -c -f ucs-2 -t utf-8 \
  | LANG=en_US.UTF8 grep -oha '[a-z]' \
  | hexdump -C \
  | sed -e 's/^[^ ]*//; s/|.*//; s/ 0a/\n/g' \
  | sed -e 's/^ *//; s/  */ /g; /^$/d'

c5 a3
c5 b7
c5 ad
77
c4 81
c4 89
c5 9b
68
c2 aa
c5 b5
c4 a7
c3
a8
c5 a9
c3 a6
c4 b8
c3 ae
78
c4 ab
c3 a4
c3 a3
c5 9b
c3 bd

I don't know which characters "c5 a3", "c5 b7", "c5 ad", etc. represent,
but this behavior is by design in glibc.
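
For readers who want to check which characters those byte pairs encode, here is a minimal sketch that decodes a UTF-8 byte sequence and prints its Unicode code point. It is not part of the original report. The helper names hex_to_bytes and codepoint are hypothetical, and the sketch assumes iconv and od are installed.

```sh
#!/bin/sh
# Hypothetical helper (not from the original report): turn hex byte
# pairs such as "c5 a3" into the raw bytes they describe.
hex_to_bytes() {
    for h in "$@"; do
        # The inner printf converts the hex value to 3-digit octal;
        # the outer printf emits the byte for that octal escape.
        printf "\\$(printf '%03o' "0x$h")"
    done
}

# Hypothetical helper: print the code point of one UTF-8 encoded
# character, obtained by re-encoding it as big-endian UTF-16 and
# dumping the resulting bytes as hex.
codepoint() {
    hex_to_bytes "$@" | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n'
    echo
}

# The first three sequences quoted in the comment above.
printf 'c5 a3 -> U+%s\n' "$(codepoint c5 a3)"
printf 'c5 b7 -> U+%s\n' "$(codepoint c5 b7)"
printf 'c5 ad -> U+%s\n' "$(codepoint c5 ad)"
```

Under those assumptions the three sequences decode to U+0163, U+0177 and U+016D. These are the Latin small letters t with cedilla, y with circumflex and u with breve, so each matched sequence is an accented lowercase letter rather than an ASCII one.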


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?28275>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




