Re: BUG? RFE? printf lacking unicode support in multiple areas

From: Greg Wooledge
Subject: Re: BUG? RFE? printf lacking unicode support in multiple areas
Date: Mon, 23 May 2011 08:31:00 -0400
User-agent: Mutt/

On Fri, May 20, 2011 at 03:03:25PM -0700, Linda Walsh wrote:
> %lc uses wide chars 'wchar_t or wint_t'.   These are 16 bits on 
> Win&cygwin and 32 on with glib.

There is no such thing as a wchar_t or a wint_t in Bash.  There is no
such thing as a 16-bit or 32-bit integer type, neither big-endian nor
little-endian, in Bash.

In Bash, everything is a string.

In particular, you cannot just mash two bytes together and then expect
them to act like an integer type.  When you mash two bytes together,
you get a two-character string (except when those two bytes form a valid
multi-byte character in whichever locale you are using, in which case
you get a one-character string).
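A rough sketch of that behaviour, assuming the two bytes 0xC3 0xA9 (which form one character in a UTF-8 locale). The function name byte_len is made up for illustration. Whether the second length comes out as 1 or 2 depends on the locale you happen to be running in.

```shell
#!/usr/bin/env bash
# Two bytes "mashed together" into one string.
bytes=$'\xc3\xa9'

# Hypothetical helper: length in bytes, by forcing the C locale
# for the duration of the function only.
byte_len() {
    local LC_ALL=C
    printf '%s\n' "${#1}"
}

byte_len "$bytes"            # always counts bytes
printf '%s\n' "${#bytes}"    # counts characters in the current locale
```

Either way the result is a string, never an integer.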

> wchar_t is also defined as 'utf16' (as a type in the include header files
> on linux).   That means from the page you so graciously point to:

There is no utf16 in Bash, unless your system defines a locale using
a UTF-16 encoding, and you happen to be using that locale.

> one would use the UTF-16 value...which is..um...gee, lets see
> 0x203c.  Gosh, what'ya know!

Sounds like you're attempting to make Bash act like C.  It won't.

Why would one "use the UTF-16 value" if one is not on a UTF-16 locale?
Are there actually any implementations of UTF-16 outside of Microsoft
Windows?  I don't know of any.

> Gee, I dunno maybe because it wasn't in my bash

Then why did you report all of this as a *Bash bug*?  If you honestly
thought it was supposed to work, and it didn't, you should have included
enough information for someone to see what it is you were attempting,
what it produced, and how you thought the output was wrong.

> and when I did a man of
> printf,

Which printf manual page?  printf(1p)?  printf(1)?  printf(3)?

I'm guessing you read printf(3) which is the libc function.  Bash's
builtin printf command is a bit different, because Bash does not
support all of the data types that C supports.  The POSIX standard
for the printf command is also different from the Bash builtin, because
the Bash builtin provides many extensions.

If you got printf(1) that means you got some random vendor's manual page
describing a printf command -- which could be a POSIX implementation,
or which could be a link to bash-builtins, or who knows what....

If you got printf(1p) that typically means it's a manual page describing
the POSIX standard feature set for the printf command, which is not
necessarily what your own printf command implements (either Bash's
builtin, or /usr/bin/printf, or whatever).
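One way to check which printf you are actually running, and therefore which documentation applies, is to ask Bash itself. This is only a sketch; the variable name printf_kind is made up for illustration.

```shell
#!/usr/bin/env bash
# How does Bash resolve the name "printf"? (builtin, file, alias, function...)
printf_kind=$(type -t printf)
echo "printf is a: $printf_kind"

# Every printf Bash can find, in lookup order.
type -a printf
```

If the answer is "builtin", the relevant documentation is "help printf" or the Bash manual, not printf(1), printf(1p) or printf(3).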

> >What does \u have to do with %lc?
> ---
> Not much -- except that a a wide char of 0x203c output using %lc
> should output the same multi-byte char as \u203c.

Sounds like something that might work in C, but not in Bash.

There is no way to represent or store "the 16-bit integer 0x203c" in Bash.
Nor would there be any way for Bash to take an arbitrary 16-bit integer
and translate it into a character as defined by some arbitrary locale
that is not even in use.

The closest approximation you will get would be to write a string
consisting of \u followed by the 4 hex digits of the 16-bit integer
(in big-endian format) and pass that as the format specifier (first
argument) to Bash 4.2's printf command.

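  # Store the literal text \uXXXX (e.g. \u203c) in tmpvar.
  printf -v tmpvar '\\u%04x' "$integer"
  # Use it as the format string so printf expands the \u escape.
  printf "$tmpvar"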

If you're in an older version of Bash, then there is absolutely nothing
you can do (using only Bash builtins) to translate your 16-bit integer
into a Unicode character.  You'd have to look for some external utility
which can do it.
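A minimal sketch of the external-utility route, assuming iconv is installed and understands the encoding names UTF-16BE and UTF-8. The function name codepoint_to_utf8 is made up for illustration. It only handles code points that fit in 16 bits and are not surrogates, because for those the two big-endian bytes of the code point are also its UTF-16BE form. It always emits UTF-8, which may not be your locale's encoding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a 16-bit code point to UTF-8 using iconv.
codepoint_to_utf8() {
    local hex
    # Four hex digits, zero-padded, e.g. 203c.
    printf -v hex '%04x' "$1"
    # Emit the two bytes in big-endian order, then let iconv transcode them.
    printf "\\x${hex:0:2}\\x${hex:2:2}" | iconv -f UTF-16BE -t UTF-8
}

codepoint_to_utf8 0x203c
echo
```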

Note that the 16-bit integer here defines the Unicode code point, *not*
some UTF-* encoding.  If you're trying to define a character in your
own locale's encoding, you can just mash the bytes together.
