coreutils

Re: [coreutils] tr: case mapping anomaly


From: Eric Blake
Subject: Re: [coreutils] tr: case mapping anomaly
Date: Wed, 29 Sep 2010 05:59:03 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100921 Fedora/3.1.4-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.4

On 09/28/2010 06:23 PM, Pádraig Brady wrote:
> I found a few more issues:
>
> This valid translation spec aborted:
>    LANG=en_US tr '[:upper:]- ' '[:lower:]_'
> This misaligned conversion spec was allowed:
>    LANG=C tr 'A-Y[:lower:]' 'a-z[:upper:]'
> This misaligned spec was allowed by extending the class:
>    LANG=C tr '[:upper:] ' '[:lower:]'
>
> I'll apply the attached soon.
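Below is a minimal sketch for re-checking the three reported specs against a tr that includes the fix. It assumes GNU tr from coreutils 8.6 or later and looks only at exit statuses. It runs all three cases with LC_ALL=C. That is an assumption, because en_US may not be installed; the report ran the first case under LANG=en_US.

```sh
# Sketch: re-check the three reported specs, assuming GNU tr >= 8.6.
# LC_ALL=C (an assumption) fixes [:upper:] and [:lower:] at 26 characters.
fail=0

# 1. Valid: the case classes line up at offset 0.  string1 (28 chars) is
#    longer than string2 (27 chars), so string2 is padded with its last
#    character, '_'.
LC_ALL=C tr '[:upper:]- ' '[:lower:]_' < /dev/null || fail=1

# 2. Misaligned: 'A-Y' has 25 characters but 'a-z' has 26, so [:upper:]
#    in string2 does not start where [:lower:] starts in string1.
if LC_ALL=C tr 'A-Y[:lower:]' 'a-z[:upper:]' < /dev/null 2>/dev/null; then
  fail=1
fi

# 3. Misaligned by extension: string1 is one character longer, and string2
#    ends with a class, which must not be stretched to fill the gap.
if LC_ALL=C tr '[:upper:] ' '[:lower:]' < /dev/null 2>/dev/null; then
  fail=1
fi

echo "fail=$fail"
```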

> +      /* Note BSD allows extending of classes in string2.  For example:
> +           tr '[:upper:]0-9' '[:lower:]'
> +         That's not portable however, contradicts POSIX and is dependent
> +         on your collating sequence.  */

That's not portable, however; it contradicts POSIX and is dependent on your collating sequence.
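If a script needs the effect of that BSD extension, it can spell out the padding explicitly with the [c*] repeat notation in string2. The following sketch assumes GNU tr >= 8.6 and the C locale. Mapping the digits to 'z' is only an illustrative choice.

```sh
# Sketch, assuming GNU tr >= 8.6 and the C locale.
fail=0

# Relying on the trailing class being extended is rejected.
if echo 'AB09' | LC_ALL=C tr '[:upper:]0-9' '[:lower:]' 2>/dev/null; then
  fail=1
fi

# Explicit alternative: [z*] repeats 'z' until string2 is as long as
# string1, so the ten digits all map to 'z'.
out=$(echo 'AB09' | LC_ALL=C tr '[:upper:]0-9' '[:lower:][z*]') || fail=1
[ "$out" = 'abzz' ] || fail=1

echo "fail=$fail out=$out"
```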

> +
> +# Ensure we support translation of case classes with extension
> +echo '01234567899999999999999999' > exp
> +echo 'abcdefghijklmnopqrstuvwxyz' |
> +tr '[:lower:]' '0-9' > out || fail=1

I guess we're guaranteed that [:lower:] has a defined order in the C locale, so this one looks okay.
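The order of [:lower:] is only guaranteed in the C locale. A standalone version of this test could therefore pin the locale itself instead of relying on the test harness to do it. Whether the coreutils harness already does so is not shown here. In this sketch, `cmp -s` stands in for the suite's `compare` helper, and GNU tr's padding of a shorter string2 is assumed.

```sh
# Standalone sketch of the extension test, pinned to the C locale so that
# [:lower:] expands to exactly a-z in that order.
fail=0
tmp=$(mktemp -d) || exit 1

echo '01234567899999999999999999' > "$tmp/exp"
# string2 ('0-9', 10 chars) is shorter than string1 (26 chars), so GNU tr
# pads string2 by repeating its last character, '9'.
echo 'abcdefghijklmnopqrstuvwxyz' |
  LC_ALL=C tr '[:lower:]' '0-9' > "$tmp/out" || fail=1
cmp -s "$tmp/exp" "$tmp/out" || fail=1

rm -rf "$tmp"
echo "fail=$fail"
```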

> +tr '[:upper:][:lower:]' 'a-z[:upper:]' < /dev/null || fail=1
> +tr '[:upper:][:lower:]' '[:upper:]a-z' < /dev/null || fail=1

Likewise, these two are not required by POSIX, but since they have a defined order in the C locale, these look okay.
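As an illustration of the first of these two specs with real input, it swaps case in a single pass. This is a sketch that assumes GNU tr >= 8.6 in the C locale.

```sh
# Sketch, assuming GNU tr >= 8.6 in the C locale.
# 'a-z' is 26 characters, so [:upper:] in string2 starts at offset 26,
# exactly where [:lower:] starts in string1: the case classes are aligned.
swapped=$(echo 'Hello, World' | LC_ALL=C tr '[:upper:][:lower:]' 'a-z[:upper:]')
echo "$swapped"
```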

> +
> +# Before coreutils 8.6 the trailing space in string1
> +# caused the case class in string2 to be extended.
> +# However that was not portable, dependent on locale
> +# and in contravention of POSIX.

However, that was not portable, dependent on locale, and contrary to POSIX.
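A spec that relies on this extension can instead give the space an explicit counterpart in string2, so that both strings have the same length. The sketch assumes GNU tr >= 8.6 in the C locale. The choice of '_' is only illustrative.

```sh
# Sketch, assuming GNU tr >= 8.6 in the C locale.  Both strings are 27
# characters long, so no padding or class extension is involved.
out=$(echo 'HELLO WORLD' | LC_ALL=C tr '[:upper:] ' '[:lower:]_')
echo "$out"
```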

> +tr '[:upper:] ' '[:lower:]' < /dev/null 2>out && fail=1
> +echo 'tr: when translating with string1 longer than string2,
> +the latter string must not end with a character class' > exp
> +compare out exp || fail=1
> +
> +# Before coreutils 8.6 the disparat number of upper and lower

disparate

> +    # Ensure when there are a different number of elements
> +    # in each string, we validate the case mapping correctly
> +    tr 'ab[:lower:]' '0-1[:upper:]' < /dev/null || _fail=1

Nice test; 'ab' and '0-1' expand to the same number of characters even though the strings differ in length, so [:lower:] and [:upper:] are still aligned. However, it's only done in the en_US locale; you should probably also test this POSIX-required feature under the C locale.
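Here is a possible C-locale counterpart of that check. It runs the same spec with LC_ALL=C and also verifies the case mapping on sample input. The input deliberately avoids 'a' and 'b', because those two characters appear both in 'ab' and in [:lower:]. This is a sketch that assumes GNU tr >= 8.6.

```sh
# Sketch of a C-locale variant of the alignment test, assuming GNU tr >= 8.6.
# 'ab' and '0-1' both expand to two characters, so [:lower:] and [:upper:]
# start at the same offset (2) in each string.
fail=0
LC_ALL=C tr 'ab[:lower:]' '0-1[:upper:]' < /dev/null || fail=1

# Only the class mapping is exercised: 'hello' contains no 'a' or 'b'.
out=$(echo 'hello' | LC_ALL=C tr 'ab[:lower:]' '0-1[:upper:]') || fail=1
[ "$out" = 'HELLO' ] || fail=1
echo "fail=$fail out=$out"
```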

> +
> +    # Ensure we extend string2 appropriately
> +    tr '[:upper:]- ' '[:lower:]_' < /dev/null || _fail=1

It seems non-portable to have a '-' in the middle, even though here the left side is a character class instead of a byte. I think you'd better pick a character other than '-', or move the '-' to the end.
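Here is one way to apply that suggestion. With the '-' at the end of string1, it cannot be read as a range operator. Adding a second '_' to string2 also makes the two strings the same length, so nothing depends on padding. This is a sketch that assumes GNU tr >= 8.6 in the C locale.

```sh
# Sketch, assuming GNU tr >= 8.6 in the C locale.  string1 and string2 are
# both 28 characters long: ' ' and '-' each map to '_'.
out=$(echo 'Foo-Bar Baz' | LC_ALL=C tr '[:upper:] -' '[:lower:]__')
echo "$out"
```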

> +
> +    # Ensure the size of the case classes are accounted
> +    # for as a unit.
> +    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
> +    tr '[:upper:]A-B' '[:lower:]0' >out ||  _fail=1
> +    echo '00cdefghijklmnopqrstuvwxyz' > exp

Huh? A and B are both in [:upper:]; when a character is listed more than once in string1, it is only transliterated according to the first listing. I think the expected result should be 'abc...', not '00c...'.
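A spec that lists each character of string1 only once avoids the question of which occurrence an implementation honors when a character is repeated. Suppose, hypothetically, that the intended output really is '00c...'. The following sketch spells that mapping without any overlap, using ranges instead of classes and assuming the C locale.

```sh
# Sketch, C locale assumed: every character of string1 appears exactly once.
# A and B map to 0; C through Z map to c through z.
out=$(echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | LC_ALL=C tr 'ABC-Z' '00c-z')
echo "$out"
```

If the intended result is the 'abc...' described above, dropping the redundant 'A-B' and '0' leaves plain tr '[:upper:]' '[:lower:]'.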

> +    # Ensure the size of the case classes are accounted
> +    # for as a unit.
> +    echo 'a' |
> +    tr -t '[:lower:]a' '[:upper:]0' >out ||  _fail=1
> +    echo '0' > exp

Likewise, this should be 'A', not '0', since 'a' is part of [:lower:].

> +    # Ensure the size of the case classes are accounted
> +    # for as a unit.
> +    echo 'a' |
> +    tr -t '[:lower:][:lower:]a' '[:lower:][:upper:]0' >out ||  _fail=1
> +    echo '0' > exp

Here the expected result is 'a' rather than '0', since the leading [:lower:] in both strings leaves all lower-case letters unchanged.

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


