
Re: BUG? RFE? printf lacking unicode support in multiple areas


From: Linda Walsh
Subject: Re: BUG? RFE? printf lacking unicode support in multiple areas
Date: Fri, 20 May 2011 13:30:58 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Thunderbird/2.0.0.24 Mnenhy/0.7.6.666



Andreas Schwab wrote:
> Linda Walsh <bash@tlinx.org> writes:
>
>> 2) It doesn't handle the "%lc" conversion to print out wide
>> characters.  To demonstrate this I created a wide char for a
>> double exclamation mark U+203C, using a=$'0x3c\0x20' and then
>
> That's not a wide character, that's a four character string.
----
I don't know why I typed it in that way as it wasn't what I used in
my examples.   I often get distracted when typing in summaries and
don't type in my examples as created.   Will have to think about how
to compensate for my distractibility, but inherent in the process is
getting distracted away from using any compensation.  *sigh*

The 16-bit value I generated was done using:
   $'\x3c\x20'

That generates a 16-bit value:
echo -n $'\x3c\x20'|hexdump
0000000 203c
0000002

(hexdump's default output is like the "-x" param: it displays 16-bit values
in hex.)

I.e. it's showing me a 16-bit value, 0x203c, which I thought would be the
wide-char value for the double exclamation mark.  Going from the wchar
definition on NT, it is a 16-bit value.  Perhaps it is different under POSIX?
But 0x203c taken as 32 bits, with the 2 high bytes zero, would seem to specify
the same codepoint for the double exclamation mark.
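
For what it's worth, a byte-wise dump may make the layout clearer. The sketch
below assumes bash and a POSIX od; it prints the same two bytes one at a time
instead of folding them into a 16-bit word. The variable name "bytes" is only
for illustration.

```shell
# Dump the bytes of $'\x3c\x20' individually (no 16-bit grouping).
# od -An drops the offset column, -tx1 prints one hex byte per field.
bytes=$(printf '%s' $'\x3c\x20' | od -An -tx1 | tr -d ' \n')
echo "$bytes"    # 3c20
```

So the string is the byte 0x3c ('<') followed by the byte 0x20 (space); the
203c shown by hexdump is those two bytes read as one little-endian word.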

> Since there is no way to produce a word containing a NUL character it is
> impossible to support %lc in any useful way.
----
        That's annoying.   How can one print out unicode characters
that are supposed to be 1 char long?
This isn't just a bash problem, given how well most of the unix "character"
utils work with unicode.  That's something that really needs to be solved
if those character utils are going to continue to be _as useful_ in the future.
Sure, they will keep their current functionality, which is of use in many ways,
but for anyone not processing ASCII text it becomes a problem.  Still, this
isn't really a bash issue.

        That said, it was my impression that a wchar was 16 bits (at least it
is on MS).  Is it different under POSIX?  At 16 bits, 0x203c would fit, and
theoretically one could benefit if %lc worked.  I.e.:

b=$'\x3c\x20'
printf "%lc" "$b"

Though without some changes, it wouldn't work for chars with \00 in them, so
it would be of questionable use.
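
The \00 limitation can be seen directly. A minimal sketch, assuming bash
(other shells, zsh for example, may keep the NUL); the names "s" and "len"
are illustrative only:

```shell
# bash keeps variable values as C strings, so an embedded NUL ends the value.
s=$'a\0b'        # intended: 'a', NUL, 'b'
len=${#s}        # length of what bash actually stored
echo "$len"      # 1: everything from the NUL onward is gone
```

A 32-bit wide char holding 0x203c has two zero bytes in it, so it could not be
carried intact in a bash variable this way.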

Oh well...

Again, thanks to the previous person who pointed out the \u & \U enhancements...
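
As a rough sketch of getting the character out without %lc, assuming bash and
a terminal that expects UTF-8: U+203C encodes in UTF-8 as the three bytes
e2 80 bc, none of which is NUL, so it fits in an ordinary string. The names
"dbl_excl" and "hex" are illustrative only.

```shell
# UTF-8 encoding of U+203C (DOUBLE EXCLAMATION MARK) as explicit bytes.
dbl_excl=$'\xe2\x80\xbc'
# Confirm the stored bytes, one hex byte at a time.
hex=$(printf '%s' "$dbl_excl" | od -An -tx1 | tr -d ' \n')
echo "$hex"      # e280bc
```

With bash 4.2 or later in a UTF-8 locale, printf '\u203c' should emit the same
three bytes.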






