[coreutils] tr: case mapping anomaly

From: Pádraig Brady
Subject: [coreutils] tr: case mapping anomaly
Date: Fri, 24 Sep 2010 23:47:27 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20100227 Thunderbird/3.0.3

I was just looking at a bug reported to Fedora, where this abort()s:

 $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]'

It stems from the fact that there are 56 upper and 59 lower chars in iso-8859-1.
But I also noticed an anomaly which would affect the fix:
[:upper:] and [:lower:] are extended in string 2
when there are still characters to match in string 1.
I.e. 0xDE (the last upper char) is output from:

 $ echo "_ _" | LC_ALL=en_US ./src/tr '[:lower:] ' '[:upper:]'
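
For reference, the same padding can be seen with plain characters,
without any locale dependence. This is only a minimal sketch and it
assumes GNU tr; POSIX leaves the result unspecified when string 2 is
shorter than string 1, so other implementations may differ. The
variable name "out" is just for illustration.

```shell
# Assumes GNU tr: a short string2 is padded by repeating its last character.
# 'c' has no partner in string2, so it maps to the repeated 'y'.
out=$(printf 'abc\n' | LC_ALL=C tr 'abc' 'xy')
printf '%s\n' "$out"    # GNU tr prints: xyy
```

The class case above is padded in the same way, which is where the
trailing 0xDE comes from.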

That seems quite inconsistent given that other classes
are not allowed in string 2 when translating:

 $ echo "ab ." | LANG=en_US tr '[:digit:]' '[:alpha:]'
 tr: when translating, the only character classes that may appear in
 string2 are `upper' and `lower'

For consistency I think it better to keep the classes
in string 2 just for case mapping, and do something like:

 $ tr '[:upper:] ' '[:lower:]'
 tr: when not truncating set1, a character class can't be
 the last entity in string2

Note that BSD allows extending the above, but that's at least
consistent with any class being allowed in string2.
I.e. this is disallowed by coreutils but OK on BSD:

 $ echo "1 2" | LC_ALL=en_US.iso-8859-1 tr ' ' '[:alpha:]'

Is it OK to change tr like this?
I can't see anything depending on that.

