bug-grep

[bug #18633] regular expressions case-insensitivity


From: Rob Hinks
Subject: [bug #18633] regular expressions case-insensitivity
Date: Sat, 30 Dec 2006 15:25:44 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9

Follow-up Comment #2, bug #18633 (project grep):

Yes, it does seem to be a locale problem. If I type grep '^[a-E]' filename,
then aAbBcCdDeE are included. When using LC_ALL=C grep '^[a-e]' filename,
only abcde are included. My friend, who has a different locale, has no problems
unless he types LC_ALL=en_GB.UTF-8 grep '^[a-e]' filename.

My locale is:
address@hidden:~/UnixProgramming$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

Output of type -a grep:
address@hidden:~/UnixProgramming$ type -a grep
grep is /usr/local/bin/grep
grep is /bin/grep


I've also tested LC_ALL=en_US.UTF-8, and it suffers from the same problem.
My friend has done a bit of investigation which might be of interest:

"Link to the page of the manual that I'm kind of talking about.
http://www.gnu.org/software/grep/doc/grep_8.html#IDX178

"It matches any single character that sorts between the two
characters, inclusive, using the locale's collating sequence and
character set"

So, I think this is implying that the character set itself has no
ordering, as it probably shouldn't, although you could infer one based upon
the actual encoding underneath (however, this is fairly nasty, as it's
indirectly exposing the underlying storage structures, which shouldn't have
to be sensibly ordered).

Which, if you've made it through that sentence, brings us to the
locale's collating sequence. This would be the bit that says how to sort the
letters. I'm guessing grep calls a function called getRange, taking 2
arguments, start and end. So, for a-c, it would call getRange(a,c), which
would return abc (or aAbBc), and then use that value in the [ any one from
here brackets ]."
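
To make that guess concrete, here is a minimal Python sketch of the idea. It is
not how grep is actually implemented: get_range and the COLLATION string are
hypothetical names, and the ordering "aAbB...zZ" is only an assumption chosen
to match the behaviour seen above under en_GB.UTF-8. With no collation order
given, the sketch falls back to raw code point order, which is roughly what
LC_ALL=C would give.

```python
import string

# ASSUMPTION: an illustrative collating sequence in which each lowercase
# letter sorts immediately before its uppercase form ("aAbBcC...zZ").
# This is not read from any real locale definition.
COLLATION = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def get_range(start, end, order=None):
    """Return the characters sorting between start and end, inclusive.

    Hypothetical helper mirroring the getRange(a, c) guess in the quote.
    If order is None, characters are ranked by code point (C-locale-like).
    Otherwise order is a string listing characters in collating order.
    """
    if order is None:
        return "".join(chr(c) for c in range(ord(start), ord(end) + 1))
    first = order.index(start)
    last = order.index(end)
    return order[first:last + 1]


if __name__ == "__main__":
    print(get_range("a", "c"))             # code point order
    print(get_range("a", "c", COLLATION))  # assumed locale order
    print(get_range("a", "E", COLLATION))  # the range from the report
```

Under these assumptions, get_range("a", "c") gives abc while
get_range("a", "c", COLLATION) gives aAbBc, which are the two results
suggested in the quote.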

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?18633>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




