Re: Unicode string literals
From: Bruno Haible
Subject: Re: Unicode string literals
Date: Thu, 30 Apr 2020 12:06:10 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-177-generic; KDE/5.18.0; x86_64; ; )
Hi Marc,
Marc Nieper-Wißkirchen wrote:
> On a system that supports at least C11, I can create an UTF8-encoded
> literal string through:
>
> (uint8_t const *) u8"..."
>
> Could Gnulib abstract this into a macro so that substitutes for
> systems that do not have u8 string literals can be provided.
>
> On a C11 system, we would have
>
> #define UTF8(s) ((uint8_t const *) u8 ## s)
>
> and similar definitions for UTF16 and UTF32.
Unfortunately, we cannot provide such macros. The reason is that the
translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must
be done by the compiler, if you want to be able to write

    static uint8_t my_string[] = u8"Wißkirchen";
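
To make the limitation concrete, here is a rough sketch of what such a macro could look like. The pre-C11 branch is a hypothetical fallback, not something Gnulib provides: it can only pass the literal through unchanged, so the bytes are UTF-8 only if the compiler's execution character set already is UTF-8. The name utf8_macro_demo is made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
/* C11 or newer: the compiler converts the literal from the source
   encoding to UTF-8.  */
# define UTF8(s) ((uint8_t const *) u8 ## s)
#else
/* Hypothetical fallback: the preprocessor cannot transcode anything.
   The literal stays in the execution character set, whatever that is.  */
# define UTF8(s) ((uint8_t const *) s)
#endif

/* ASCII-only text comes out the same in both branches on ASCII-based
   systems; only non-ASCII characters expose the difference.  */
static void
utf8_macro_demo (void)
{
  assert (strlen ((char const *) UTF8 ("abc")) == 3);
}
```

With a C11 compiler, a literal containing ß yields the two bytes 0xC3 0x9F regardless of the source file's encoding. In the fallback branch the result depends on how the compiler was invoked, which is exactly the part a macro cannot fix.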
Your best bet is:

  1) For UTF-8 encoded strings, ensure that your source code is UTF-8
     encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.
  2) For UTF-16 encoded strings, which you'll need only on Windows,
     write L"Wißkirchen". Or use hex codes, like in
     gnulib/tests/uniwidth/test-u16-width.c.
  3) Don't use UTF-32 encoded strings. Or use hex codes, like in
     gnulib/tests/uniwidth/test-u32-width.c.
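
As an illustration of the escape and hex-code approach, the following sketch spells out the same string in all three encodings without relying on u8, u or U literals. The variable names and the check function are made up for this example and are not taken from the Gnulib tests. It assumes an ASCII-based execution character set, so that 'W' has the value 0x57.

```c
#include <assert.h>
#include <stdint.h>

/* "Wißkirchen" in UTF-8; ß (U+00DF) is the byte pair 0xC3 0x9F.
   The literal is split after the escape so that a following hex digit
   can never be absorbed into it.  */
static const uint8_t name_u8[] = "Wi\xC3\x9F" "kirchen";

/* The same string as UTF-16 and UTF-32 code units, NUL-terminated.  */
static const uint16_t name_u16[] =
  { 'W', 'i', 0x00DF, 'k', 'i', 'r', 'c', 'h', 'e', 'n', 0 };
static const uint32_t name_u32[] =
  { 'W', 'i', 0x00DF, 'k', 'i', 'r', 'c', 'h', 'e', 'n', 0 };

static void
check_literal_sizes (void)
{
  /* 11 UTF-8 bytes plus NUL; 10 code units plus NUL for the others.  */
  assert (sizeof name_u8 == 12);
  assert (sizeof name_u16 / sizeof name_u16[0] == 11);
  assert (sizeof name_u32 / sizeof name_u32[0] == 11);
  assert (name_u8[2] == 0xC3 && name_u8[3] == 0x9F);
}
```

These definitions compile the same way whatever encoding the source file is saved in, because no non-ASCII character appears outside of comments.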
> Similarly, something like
>
> #define ASCII(s) (u8 ## s [0])
>
> for pre-C2x systems would be nice so that ASCII("c") expands into the
> ASCII code point of the character `c'.
What's the point of this one? Why not just write 'c'?
Bruno