Re: Unicode string literals
From: Marc Nieper-Wißkirchen
Subject: Re: Unicode string literals
Date: Thu, 30 Apr 2020 12:23:27 +0200
Hi Bruno,
thank you very much for your reply.
On Thu, 30 Apr 2020 at 12:06, Bruno Haible <address@hidden> wrote:
[...]
> Unfortunately, we cannot provide such macros. The reason is that the
> translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must
> be done by the compiler, if you want to be able to write
> static uint8_t my_string[] = u8"Wißkirchen";
A compiler that supports the "u8" prefix, which is defined by C11,
should do the translation from the source file encoding to UTF-8
itself. I was hoping that compilers without sufficient C11 support
would offer some other way to translate from the source file encoding
to UTF-8, which Gnulib could exploit.
> Your best bet is
> 1) For UTF-8 encoded strings, ensure that your source code is UTF-8
> encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.
If I use escapes for the non-ASCII characters, this will work whenever
the execution character set of the compiler is compatible with ASCII,
right?
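To make sure I understand the escape approach, here is a minimal sketch of what I have in mind. The helper name my_string_length is only for illustration, and I am assuming that uint8_t is unsigned char and that the execution character set is ASCII-compatible, so that the plain letters already come out as their UTF-8 bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* "Wißkirchen" spelled with hex escapes, so the bytes do not depend on the
   source file's encoding.  U+00DF (ß) is 0xC3 0x9F in UTF-8.  The literal is
   split after the escapes because a hex escape absorbs every hex digit that
   follows it; 'k' is not one, but the split is a safe habit.
   Assumption: the execution character set is ASCII-compatible, so the
   unescaped letters are already valid UTF-8. */
static const uint8_t my_string[] = "Wi\xC3\x9F" "kirchen";

/* Hypothetical helper: length in bytes (UTF-8 code units), excluding the
   terminating NUL. */
static size_t my_string_length(void)
{
  return sizeof my_string - 1;
}
```

Here my_string_length () should be 11, not 10, because ß takes two bytes in UTF-8.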
> 2) For UTF-16 encoded strings, which you'll need only on Windows,
> write L"Wißkirchen". Or use hex codes, like in
> gnulib/tests/uniwidth/test-u16-width.c.
> 3) Don't use UTF-32 encoded strings. Or use hex codes, like in
> gnulib/tests/uniwidth/test-u32-width.c.
These two are less important for me; I mentioned them to have a full
set of macros.
>
> > Similarly, something like
> >
> > #define ASCII(s) (u8 ## s [0])
> >
> > for pre-C2x systems would be nice so that ASCII("c") expands into the
> > ASCII code point of the character `c'.
>
> What's the point of this one? Why not just write 'c'?
I was thinking of a system whose execution character set is not
compatible with ASCII. Or are those excluded in general by Gnulib?
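To illustrate the difference I mean, here is a small sketch. It assumes a compiler that already accepts the C11 "u8" prefix, and the two function names are made up for this example:

```c
/* The macro from my earlier mail: paste the u8 prefix onto the string
   literal and take its first code unit.  Requires C11 or later. */
#define ASCII(s) (u8 ## s [0])

/* Hypothetical helper: the ASCII (UTF-8) code of 'c', which is 0x63
   regardless of the execution character set. */
static int ascii_code_of_c(void)
{
  return ASCII("c");
}

/* Hypothetical helper: the code of 'c' in the execution character set.
   This is 0x63 on an ASCII-compatible system but 0x83 under EBCDIC. */
static int exec_code_of_c(void)
{
  return 'c';
}
```

On an ASCII-compatible system both functions return the same value, so the macro only matters if such systems are not the only ones supported.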
Thanks again,
Marc