Bug in GNU awk, RLENGTH fails in locale es_UY.UTF-8
From: Francisco Castro
Subject: Bug in GNU awk, RLENGTH fails in locale es_UY.UTF-8
Date: Tue, 20 Jan 2009 19:52:00 -0200
User-agent: KMail/1.9.9
I found a bug: in a UTF-8 locale, match() sets RLENGTH to a bogus value when the regex matches the empty string. I hope these examples help.
address@hidden:~% awk --version | sed q
GNU Awk 3.1.5
address@hidden:~% echo $LANG
es_UY.UTF-8
address@hidden:~% echo ae | awk '{print match($0, /e*/); print RSTART, RLENGTH}'
1
1 17
# It works with LANG=C
address@hidden:~% echo ae | LANG=C awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 0
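# A minimal sketch of a sanity check, not part of any gawk test suite.
# /e*/ can match the empty string at position 1, so the expected result is
# RSTART = 1 and RLENGTH = 0, and in any case RLENGTH can never exceed the
# length of the input. The helper name check_match is made up for illustration.

```shell
# check_match is a hypothetical helper, named here only for illustration.
# It runs the same match() call as the examples in this report and flags
# an RLENGTH longer than the input line, which no correct match can produce.
check_match() {
    printf '%s\n' "$1" | awk '{
        match($0, /e*/)
        verdict = (RLENGTH > length($0)) ? "BAD" : "ok"
        print verdict, RSTART, RLENGTH
    }'
}

# Baseline in the C locale, where this report shows correct behavior.
# The subshell keeps the locale change from leaking into the session.
( export LC_ALL=C; check_match ae )
```

# An unaffected awk should print "ok 1 0" here. Running check_match without
# the LC_ALL override, in the UTF-8 locale, is what would expose the bug.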
# It also works when RLENGTH should be nonzero:
address@hidden:~% echo ñ | awk '{match($0, /ñ*/); print RSTART, RLENGTH}'
1 1
# Some more examples:
address@hidden:~% echo ae | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 18
address@hidden:~% echo aee | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 18
address@hidden:~% echo aeee | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 26
address@hidden:~% echo hello | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 26
address@hidden:~% echo world. | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 34
# In UTF-8, ñ is encoded as the two bytes 0xC3 0xB1.
# "ñññ" and "world." give the same result, which suggests the bogus value
# depends on the size in bytes, not on the length in characters.
address@hidden:~% echo ñññ | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 34
address@hidden:~% echo -n ñ | od -t x1
0000000 c3 b1
0000002
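# The byte-size comparison above can be checked directly. This is only a
# sketch: byte_count is a hypothetical helper, and the ñ is built from its
# octal escapes so the snippet does not depend on the encoding of this mail.

```shell
# byte_count is a hypothetical helper: it prints how many bytes its
# argument occupies. wc -c counts bytes, not characters.
byte_count() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# \303\261 is the octal spelling of 0xC3 0xB1, the UTF-8 encoding of ñ.
enye=$(printf '\303\261')

byte_count "$enye$enye$enye"   # three characters, six bytes
byte_count "world."            # six characters, six bytes
```

# Both calls print 6, so the two inputs differ in character count but not in
# byte count, matching the identical RLENGTH values reported for them.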
address@hidden:~% echo aaaaaaaaaaaaaaaaaaa | awk '{match($0, /e*/); print RSTART, RLENGTH}'
1 82
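# An observation from the numbers in this report, offered as a guess and not
# as a statement about gawk internals: apart from the very first run (17),
# the bogus RLENGTH values for the short inputs all equal 10 + 8 * int(n / 2),
# where n is the input size in bytes. The byte counts below were entered by
# hand, and fit_table is a made-up name.

```shell
# Each input line: byte count of the input, then the RLENGTH reported for it.
# Rows, in order: ae, aee, aeee, hello, world., ñññ.
fit_table() {
    printf '%s\n' '2 18' '3 18' '4 26' '5 26' '6 34' '6 34' |
    awk '{
        guess = 10 + 8 * int($1 / 2)   # empirical fit, not a known formula
        print $1, $2, guess, (guess == $2 ? "match" : "differ")
    }'
}

fit_table
```

# All six rows come out as "match". The same expression gives 82 for an
# input of 18 or 19 bytes, which is in line with the long run of a's.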
address@hidden:~% cat /etc/issue
Debian GNU/Linux lenny/sid \n \l
address@hidden:~% md5sum `which awk`
423835ba1e46c652823021da7d41c4e1 /usr/bin/awk
address@hidden:~# LANG=C apt-get install gawk | grep gawk
gawk is already the newest version.
--
Francisco Castro