Whats the reason to suppress short unicode characters in printf?
From: Ingo Krabbe
Subject: Whats the reason to suppress short unicode characters in printf?
Date: Fri, 25 Mar 2016 10:20:26 +0100
Hey GNU wizards,
today I subscribed to this mailing list for a more or less philosophical
question, which is already the subject of this mail: What's the reason to
suppress short Unicode characters in printf?
But let's start with a bit of historical background on how I stumbled over the
following issue.
Like many Linux users I'm a terminal junkie, or rather I was a terminal junkie
until I discovered even better ways to edit and fire command lines, but that
would be part of another big story.
So to help me locate Unicode characters, I wrote a little script that does
its best to print any Unicode character table:
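#!/bin/bash
# Needs bash rather than plain sh: it uses brace expansion ({0..255}).
if [ $# -lt 1 ]
then
    echo "too few arguments. I need the table in hex form xxx, where" \
         "the lower byte will be replaced."
    exit 1
fi
# Table number: the argument is read as a hexadecimal value
t=$((0x$1))
echo TABLE $t
# Cells per row, assuming 16 terminal columns per cell
c=$(( $(tput cols) / 16 ))
ci=0
for i in {0..255}
do
    # Build the literal escape \U000xxxyy; the second printf expands it
    C=$(/usr/bin/printf "\\\\U000%03x%02x" $t $i)
    form="%03x%02x: $C\t"
    /usr/bin/printf "$form" $t $i
    ci=$((ci + 1))
    if [ $ci -ge $c ]
    then
        /usr/bin/printf "\n"
        ci=0
    fi
done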
This script might not be the best approach, but it is still quite useful. For
example
# unicode-table-yyy 001
gives
TABLE 1
00100: Ā 00101: ā 00102: Ă 00103: ă 00104: Ą
[…]
001fa: Ǻ 001fb: ǻ 001fc: Ǽ 001fd: ǽ 001fe: Ǿ
001ff: ǿ
I also use this output to copy and paste in tmux terminals when I need a
character from a selected range.
Of course it helps to have a full Unicode font installed.
But when I want to show table 000, I get errors such as
[…]
0003c: /usr/bin/printf: invalid universal character name \U0000003c
0003d: /usr/bin/printf: invalid universal character name \U0000003d
0003e: /usr/bin/printf: invalid universal character name \U0000003e
0003f: /usr/bin/printf: invalid universal character name \U0000003f
00040: @
[…]
although `man -s 1 printf` tells us to use
\UHHHHHHHH
Unicode character with hex value HHHHHHHH (8 digits)
This escape is implemented in a very complex way to support some rarely used
terminals with non-UTF-8 encodings (I hope non-UTF-8 encodings are rare these
days):
263       else if (*p == 'u' || *p == 'U')
264         {
265           char esc_char = *p;
266           unsigned int uni_value;
267
268           uni_value = 0;
269           for (esc_length = (esc_char == 'u' ? 4 : 8), ++p;
270                esc_length > 0;
271                --esc_length, ++p)
272             {
273               if (! isxdigit (to_uchar (*p)))
274                 error (EXIT_FAILURE, 0, _("missing hexadecimal number in escape"));
275               uni_value = uni_value * 16 + hextobin (*p);
276             }
277
278           /* A universal character name shall not specify a character short
279              identifier in the range 00000000 through 00000020, 0000007F through
280              0000009F, or 0000D800 through 0000DFFF inclusive. A universal
281              character name shall not designate a character in the required
282              character set. */
283           if ((uni_value <= 0x9f
284                && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
285               || (uni_value >= 0xd800 && uni_value <= 0xdfff))
286             error (EXIT_FAILURE, 0, _("invalid universal character name \\%c%0*x"),
287                    esc_char, (esc_char == 'u' ? 4 : 8), uni_value);
288
289           print_unicode_char (stdout, uni_value, 0);
290         }
-- from coreutils-8.23 src/printf.c
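
Taken out of the surrounding parser, the condition on lines 283 to 285 can be sketched as a standalone predicate. This is only an illustration: the function name `ucn_is_rejected` is my own and does not exist in coreutils.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of coreutils: returns true when the
   condition on lines 283-285 of src/printf.c would reject uni_value. */
bool
ucn_is_rejected (unsigned int uni_value)
{
  return (uni_value <= 0x9f
          && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
         || (uni_value >= 0xd800 && uni_value <= 0xdfff);
}
```

With this predicate, 0x3c is rejected while 0x40 and 0x100 pass, which matches the errors shown for table 000 above.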
The implementation of `print_unicode_char` would be yet another story, but my
concern is the if clause at lines 283 to 285, which rejects some Unicode values
in table 000 and all of tables 0d8 through 0df.
What's the reason for this exception?
The argument against this exception is clear: when you fail for some values of
a set C, anyone who uses your program with input from set C has to implement
these exceptions too. Such spikes in the input set carry through the whole
IO chain: anyone who uses your `printf` program has to implement these
exceptions to get an error-free algorithm on input characters from set C,
making the complex world of programming computers even more complex than it
intrinsically is anyway.
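
To illustrate the kind of exception handling a caller ends up writing, here is a minimal sketch of a helper that builds an escape sequence which printf(1) accepts for every code point. It is an assumption-laden example, not anything from coreutils: the name `format_escape` is hypothetical, and the octal fallback for 0080 through 009f assumes the output is meant to be UTF-8.

```c
#include <stdio.h>

/* Hypothetical helper: writes into buf an escape sequence for code
   point cp that printf(1) accepts, assuming UTF-8 output.  Returns
   the number of characters written, or -1 if cp is a surrogate, is
   above 10FFFF, or does not fit into buf. */
int
format_escape (char *buf, size_t size, unsigned int cp)
{
  int n;

  if ((cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
    return -1;                  /* nothing sensible to print */
  if (cp < 0x80)
    /* ASCII: \U is refused for most of these, so use one octal byte */
    n = snprintf (buf, size, "\\%03o", cp);
  else if (cp <= 0x9f)
    /* 0080-009f: \U is refused, so spell out both UTF-8 bytes */
    n = snprintf (buf, size, "\\%03o\\%03o",
                  0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
  else
    /* everything else: the documented \UHHHHHHHH form */
    n = snprintf (buf, size, "\\U%08x", cp);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

For 0x3c this yields `\074`, and for 0x100 it yields `\U00000100`. Three branches for what should be a single rule is exactly the spike in the input set I mean.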
Looking forward to an interesting discussion about complexity, and with kind
regards,
Ingo Krabbe
--
Liberty for the Modules!
--
https://medium.com/@azerbike/i-ve-just-liberated-my-modules-9045c06be67c