From: | Hermann Peifer |
Subject: | Re: /usr/bin/printf: invalid universal character name |
Date: | Sun, 11 May 2008 17:08:16 +0200 |
User-agent: | Thunderbird 2.0.0.12 (X11/20080227) |
Jim wrote:
> Hermann Peifer <address@hidden> wrote:
>> printf \uHHHH is expected to print Unicode chars. This work fine in
>> most cases, but some legal code points are reported as errors: values
>> in the ASCII range and C1 control chars, and values between
>> U+D800..U+DFFF
>>
>> I would say that this behaviour is rather a bug than a feature.
>
> Thanks for the report, but this is not some arbitrary restriction,
> but rather conformance to the standard (C99, ISO/IEC 10646) for
> "universal character name" syntax:
>
>   http://www.open-std.org/jtc1/sc22/wg14/www/docs/n717.htm
>
> Here's part of printf.c, with a comment that probably came
> from a version of N717:
>
>   /* A universal character name shall not specify a character short
>      identifier in the range 00000000 through 00000020, 0000007F through
>      0000009F, or 0000D800 through 0000DFFF inclusive. A universal
>      character name shall not designate a character in the required
>      character set. */
>   if ((uni_value <= 0x9f
>        && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
>       || (uni_value >= 0xd800 && uni_value <= 0xdfff))
>     error (EXIT_FAILURE, 0, _("invalid universal character name \\%c%0*x"),
>            esc_char, (esc_char == 'u' ? 4 : 8), uni_value);
>
>> /usr/bin/printf: invalid universal character name \u0000
>> /usr/bin/printf: invalid universal character name \u0001
> ...
>
> I can understand that you'd find the restriction surprising,
> but I wouldn't call it a bug.

Thanks for your swift reply. (BTW: are mails to address@hidden not copied to gnu.utils.bug?)
I do acknowledge that C0 and C1 control chars are something of a borderline case. It is true that the Unicode standard does not assign *normative names* to them but instead uses the placeholder "<control>" as a dummy name (btw, this was different in earlier versions of Unicode). However, all C0 and C1 *code points* are at least included in:
http://www.unicode.org/charts/PDF/U0000.pdf
http://www.unicode.org/charts/PDF/U0080.pdf
http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt

And I didn't expect /usr/bin/printf to worry about normative or non-normative names of Unicode chars, but rather print the chars themselves.
If we leave the control chars question aside, it is still hard to believe that it is not a bug that almost all ASCII chars 0020..007e lead to EXIT_FAILURE. This rule is peculiar, to say the least, and it is also inconsistent with its own comment:
  if ((uni_value <= 0x9f && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)

Only DOLLAR SIGN, COMMERCIAL AT and GRAVE ACCENT are legal in the range 0x00..0x9f?
I still think that these 92 cases are bugs, rather than anything else:

/usr/bin/printf: invalid universal character name \u0020
/usr/bin/printf: invalid universal character name \u0021
/usr/bin/printf: invalid universal character name \u0022
/usr/bin/printf: invalid universal character name \u0023
/usr/bin/printf: invalid universal character name \u0025
/usr/bin/printf: invalid universal character name \u0026
/usr/bin/printf: invalid universal character name \u0027
/usr/bin/printf: invalid universal character name \u0028
/usr/bin/printf: invalid universal character name \u0029
/usr/bin/printf: invalid universal character name \u002a
/usr/bin/printf: invalid universal character name \u002b
/usr/bin/printf: invalid universal character name \u002c
/usr/bin/printf: invalid universal character name \u002d
/usr/bin/printf: invalid universal character name \u002e
/usr/bin/printf: invalid universal character name \u002f
/usr/bin/printf: invalid universal character name \u0030
/usr/bin/printf: invalid universal character name \u0031
/usr/bin/printf: invalid universal character name \u0032
/usr/bin/printf: invalid universal character name \u0033
/usr/bin/printf: invalid universal character name \u0034
/usr/bin/printf: invalid universal character name \u0035
/usr/bin/printf: invalid universal character name \u0036
/usr/bin/printf: invalid universal character name \u0037
/usr/bin/printf: invalid universal character name \u0038
/usr/bin/printf: invalid universal character name \u0039
/usr/bin/printf: invalid universal character name \u003a
/usr/bin/printf: invalid universal character name \u003b
/usr/bin/printf: invalid universal character name \u003c
/usr/bin/printf: invalid universal character name \u003d
/usr/bin/printf: invalid universal character name \u003e
/usr/bin/printf: invalid universal character name \u003f
/usr/bin/printf: invalid universal character name \u0041
/usr/bin/printf: invalid universal character name \u0042
/usr/bin/printf: invalid universal character name \u0043
/usr/bin/printf: invalid universal character name \u0044
/usr/bin/printf: invalid universal character name \u0045
/usr/bin/printf: invalid universal character name \u0046
/usr/bin/printf: invalid universal character name \u0047
/usr/bin/printf: invalid universal character name \u0048
/usr/bin/printf: invalid universal character name \u0049
/usr/bin/printf: invalid universal character name \u004a
/usr/bin/printf: invalid universal character name \u004b
/usr/bin/printf: invalid universal character name \u004c
/usr/bin/printf: invalid universal character name \u004d
/usr/bin/printf: invalid universal character name \u004e
/usr/bin/printf: invalid universal character name \u004f
/usr/bin/printf: invalid universal character name \u0050
/usr/bin/printf: invalid universal character name \u0051
/usr/bin/printf: invalid universal character name \u0052
/usr/bin/printf: invalid universal character name \u0053
/usr/bin/printf: invalid universal character name \u0054
/usr/bin/printf: invalid universal character name \u0055
/usr/bin/printf: invalid universal character name \u0056
/usr/bin/printf: invalid universal character name \u0057
/usr/bin/printf: invalid universal character name \u0058
/usr/bin/printf: invalid universal character name \u0059
/usr/bin/printf: invalid universal character name \u005a
/usr/bin/printf: invalid universal character name \u005b
/usr/bin/printf: invalid universal character name \u005c
/usr/bin/printf: invalid universal character name \u005d
/usr/bin/printf: invalid universal character name \u005e
/usr/bin/printf: invalid universal character name \u005f
/usr/bin/printf: invalid universal character name \u0061
/usr/bin/printf: invalid universal character name \u0062
/usr/bin/printf: invalid universal character name \u0063
/usr/bin/printf: invalid universal character name \u0064
/usr/bin/printf: invalid universal character name \u0065
/usr/bin/printf: invalid universal character name \u0066
/usr/bin/printf: invalid universal character name \u0067
/usr/bin/printf: invalid universal character name \u0068
/usr/bin/printf: invalid universal character name \u0069
/usr/bin/printf: invalid universal character name \u006a
/usr/bin/printf: invalid universal character name \u006b
/usr/bin/printf: invalid universal character name \u006c
/usr/bin/printf: invalid universal character name \u006d
/usr/bin/printf: invalid universal character name \u006e
/usr/bin/printf: invalid universal character name \u006f
/usr/bin/printf: invalid universal character name \u0070
/usr/bin/printf: invalid universal character name \u0071
/usr/bin/printf: invalid universal character name \u0072
/usr/bin/printf: invalid universal character name \u0073
/usr/bin/printf: invalid universal character name \u0074
/usr/bin/printf: invalid universal character name \u0075
/usr/bin/printf: invalid universal character name \u0076
/usr/bin/printf: invalid universal character name \u0077
/usr/bin/printf: invalid universal character name \u0078
/usr/bin/printf: invalid universal character name \u0079
/usr/bin/printf: invalid universal character name \u007a
/usr/bin/printf: invalid universal character name \u007b
/usr/bin/printf: invalid universal character name \u007c
/usr/bin/printf: invalid universal character name \u007d
/usr/bin/printf: invalid universal character name \u007e

Regards,
Hermann