Re: Unicode string literals

bug-gnulib

From:	Marc Nieper-Wißkirchen
Subject:	Re: Unicode string literals
Date:	Thu, 30 Apr 2020 12:23:27 +0200

Hi Bruno,

Thank you very much for your reply.

On Thu, 30 Apr 2020 at 12:06, Bruno Haible <address@hidden> wrote:

[...]

> Unfortunately, we cannot provide such macros. The reason is that the
> translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must
> be done by the compiler, if you want to be able to write
>   static uint8_t my_string[] = u8"Wißkirchen";

For a compiler that supports the "u8" prefix, which is defined by C11,
the compiler should do the translation from the source file encoding
to UTF-8.  I was hoping that compilers not supporting enough of C11
would have some other way to translate from the source file encoding
to UTF-8, which could be exploited by Gnulib.
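As a rough sketch of the C11 case described above: the compiler produces
the UTF-8 bytes itself. The names my_string_length and the #error guard
are illustrative assumptions and not part of Gnulib; the universal
character name \u00DF is used so that the result does not depend on how
the source file happens to be encoded.

```c
#include <string.h>

#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
/* C11 or later: the compiler converts the literal to UTF-8.
   \u00DF names U+00DF (ß) independently of the source file encoding. */
static const unsigned char my_string[] = u8"Wi\u00DFkirchen";
#else
/* Assumption of this sketch: no fallback is attempted for older compilers. */
# error "this sketch assumes a compiler with C11 u8 string literals"
#endif

/* Hypothetical helper: number of UTF-8 code units, excluding the
   terminating NUL.  ß occupies two code units (0xC3 0x9F). */
static size_t
my_string_length (void)
{
  return strlen ((const char *) my_string);
}
```

With a conforming C11 compiler, the ten characters of the name should
occupy eleven bytes, because ß takes two.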

> Your best bet is
>   1) For UTF-8 encoded strings, ensure that your source code is UTF-8
>      encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.

If I use escapes for the non-ASCII characters, this will work whenever
the execution character set of the compiler is compatible with ASCII,
right?
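To make the question concrete, here is a minimal sketch of the escape
approach without any "u8" prefix. The identifiers escaped_name,
expected_utf8 and escaped_name_is_utf8 are made up for illustration. The
escaped bytes for ß are fixed by the source text, whereas the plain
letters come out as ASCII code units only if the execution character set
is ASCII-compatible, which is the assumption being asked about.

```c
#include <string.h>

/* "Wißkirchen" with U+00DF (ß) spelled as its UTF-8 bytes C3 9F.
   The remaining letters are ordinary characters in the execution
   character set. */
static const unsigned char escaped_name[] = "Wi\xc3\x9fkirchen";

/* The same string written out as explicit UTF-8 code units,
   including the terminating NUL. */
static const unsigned char expected_utf8[] = {
  0x57, 0x69, 0xC3, 0x9F, 0x6B, 0x69, 0x72, 0x63, 0x68, 0x65, 0x6E, 0x00
};

/* Hypothetical check: returns 1 if the literal above came out as valid
   UTF-8 for "Wißkirchen", which is expected only when the execution
   character set agrees with ASCII for the plain letters. */
static int
escaped_name_is_utf8 (void)
{
  return sizeof escaped_name == sizeof expected_utf8
         && memcmp (escaped_name, expected_utf8, sizeof expected_utf8) == 0;
}
```

On an EBCDIC-based execution character set the letters would presumably
get different values and the check would fail, while the two escaped
bytes would stay the same.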

>   2) For UTF-16 encoded strings, which you'll need only on Windows,
>      write L"Wißkirchen". Or use hex codes, like in
>      gnulib/tests/uniwidth/test-u16-width.c.
>   3) Don't use UTF-32 encoded strings. Or use hex codes, like in
>      gnulib/tests/uniwidth/test-u32-width.c.

These two are less important for me; I mentioned them to have a full
set of macros.

>
> > Similarly, something like
> >
> > #define ASCII(s) (u8 ## s [0])
> >
> > for pre-C2x systems would be nice so that ASCII("c") expands into the
> > ASCII code point of the character `c'.
>
> What's the point of this one? Why not just write 'c'?

I was thinking of a system whose execution character set is not
compatible with ASCII. Or are those excluded in general by Gnulib?
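A small sketch of the difference, assuming a compiler that accepts C11
"u8" string literals. The ASCII macro is the one quoted above; the
function exec_charset_matches_ascii is a hypothetical helper added only
to show where 'c' and ASCII("c") could disagree.

```c
/* The macro from the quoted message.  The ## operator pastes the u8
   prefix onto the string literal argument, so ASCII("c") expands to
   (u8"c" [0]), the first UTF-8 code unit of the literal. */
#define ASCII(s) (u8 ## s [0])

/* Hypothetical helper: returns 1 if the execution character set agrees
   with ASCII for a few sample characters.  'c' is encoded in the
   execution character set, ASCII("c") always in UTF-8. */
static int
exec_charset_matches_ascii (void)
{
  return 'c' == ASCII ("c")
         && 'A' == ASCII ("A")
         && '0' == ASCII ("0");
}
```

One limitation of this sketch: indexing a string literal is not an
integer constant expression in C, so ASCII("c") cannot be used in a case
label or a static initializer, unlike 'c'.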

Thanks again,

Marc
