Re: BUG? RFE? printf lacking unicode support in multiple areas

bug-bash

From:	Linda Walsh
Subject:	Re: BUG? RFE? printf lacking unicode support in multiple areas
Date:	Fri, 20 May 2011 13:30:58 -0700
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Thunderbird/2.0.0.24 Mnenhy/0.7.6.666



Andreas Schwab wrote:
> Linda Walsh <bash@tlinx.org> writes:
>
>> 2) It doesn't handle the "%lc" conversion to print out wide
>> characters.  To demonstrate this I created a wide char for a
>> double exclamation mark U+203C, using a=$'0x3c\0x20' and then
>
> That's not a wide character, that's a four character string.

----
I don't know why I typed it that way; it isn't what I used in my
examples.  I often get distracted when typing up summaries and don't
type in my examples as I created them.  I'll have to think about how
to compensate for my distractibility, but inherent in the process is
getting distracted away from using any compensation.  *sigh*

I generated the 16-bit value using:
   $'\x3c\x20'

That generates what hexdump shows as a 16-bit value:

   echo -n $'\x3c\x20'|hexdump

   0000000 203c
   0000002


(The default for hexdump is essentially the "-x" format, which displays
16-bit values in hex.)

I.e., it's showing me a 16-bit value, 0x203c, which I thought would be the
wide-char value for the double exclamation mark.  Going by the wchar definition
on NT, a wide char is a 16-bit value.  Perhaps it is different under POSIX?  But
0x203c taken as 32 bits, with the two high bytes zero, would seem to specify
the same code point for the double exclamation mark.
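One way to see what those two bytes are is to compare them with `iconv` conversions of U+203C. The sketch below assumes bash, `od`, and an `iconv` that knows UTF-16LE and UTF-32LE (glibc's does); `show_bytes` and `dbl_excl` are hypothetical names introduced only for this illustration. It suggests that `$'\x3c\x20'` is exactly the UTF-16LE encoding of U+203C, i.e. the 16-bit `wchar_t` layout used on Windows, and that hexdump printed `203c` only because, on a little-endian machine such as x86, it reads each pair of bytes as one little-endian 16-bit word. On glibc-based systems `wchar_t` is 32 bits wide, and that form of the same code point contains two zero bytes.

```shell
#!/usr/bin/env bash
# show_bytes (hypothetical helper): print each byte read from stdin as two
# hex digits, separated by single spaces, in file order (no word grouping).
show_bytes() {
    od -An -v -tx1 | xargs
}

# U+203C DOUBLE EXCLAMATION MARK, written as raw UTF-8 bytes so the
# input is the same in any locale.
dbl_excl=$'\xe2\x80\xbc'

# The two bytes from the original post, in the order they are stored.
printf '%s' $'\x3c\x20' | show_bytes                                  # 3c 20

# U+203C as UTF-16LE (16-bit code units, low byte first).
printf '%s' "$dbl_excl" | iconv -f UTF-8 -t UTF-16LE | show_bytes    # 3c 20

# U+203C as UTF-32LE (32-bit code units): note the two zero bytes.
printf '%s' "$dbl_excl" | iconv -f UTF-8 -t UTF-32LE | show_bytes    # 3c 20 00 00
```

The first two lines match, which is why the hexdump output above reads as 0x203c; the last line is the layout a 32-bit wide character would have.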

> Since there is no way to produce a word containing a NUL character it is
> impossible to support %lc in any useful way.

----
        That's annoying.  How can one print out unicode characters
that are supposed to be 1 char long?
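In a UTF-8 locale a single character such as U+203C is not stored as a wide character at all but as a multibyte sequence, and UTF-8 never uses a zero byte for any code point other than U+0000 itself. So an ordinary bash string can hold it and `printf '%s'` can print it. A minimal sketch, assuming a UTF-8 terminal; `dbl_excl` is a hypothetical variable name, and raw byte escapes are used only so the stored bytes do not depend on the current locale:

```shell
#!/usr/bin/env bash
# U+203C DOUBLE EXCLAMATION MARK as its three UTF-8 bytes (E2 80 BC).
# Byte escapes are used instead of \u203c so the result does not depend
# on the current locale.
dbl_excl=$'\xe2\x80\xbc'

# On a UTF-8 terminal this displays a single double-exclamation glyph.
printf '%s\n' "$dbl_excl"

# The stored form is three bytes, none of them zero, so it fits in an
# ordinary bash string.
printf '%s' "$dbl_excl" | od -An -tx1
printf '%s' "$dbl_excl" | wc -c
```

With bash 4.2 or later in a UTF-8 locale, `printf '\u203c\n'` should produce the same three bytes via the `\u` and `\U` escapes.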

This isn't just a bash problem, given how well most of the unix "character"
utils work with unicode; it's something that really needs to be solved
if those character utils are going to continue to be _as useful_ in the future.
Sure, they will keep their current functionality, which is useful in many
ways, but for anyone not processing ASCII text it becomes a problem.  So this
isn't really a bash issue.

        That said, it was my impression that a wchar was 16 bits (at least it
is on MS Windows).  Is it different under POSIX?  At 16 bits, 0x203c would fit,
and could in theory benefit if %lc worked, i.e.:

b=$'\x3c\x20'
printf "%lc" "$b"

Though without some changes, it wouldn't work for characters with a \0 byte
in them, so it would be of questionable use.
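The NUL limitation is easy to observe, and it also explains why `$'0x3c\0x20'` came out as a four-character string: inside `$'...'`, `\0` yields a NUL byte, and bash strings end at the first NUL. A minimal sketch; `typo` and `wide` are arbitrary, hypothetical variable names:

```shell
#!/usr/bin/env bash
# \0 inside $'...' produces a NUL byte, which terminates the string, so only
# the characters before it survive.
typo=$'0x3c\0x20'
printf '%s\n' "$typo"        # prints 0x3c
printf '%s\n' "${#typo}"     # prints 4

# The same truncation hits a 32-bit little-endian code unit for U+203C
# (3c 20 00 00): only the two non-zero bytes remain in the variable.
wide=$'\x3c\x20\x00\x00'
printf '%s' "$wide" | od -An -tx1
```

This is why a 32-bit wide-character value such as 0x0000203c cannot reach printf intact as an argument, whereas the UTF-8 form can.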


Oh well...

Again, thanks to the previous person who pointed out the \u & \U enhancements...
