Re: Unicode string literals
From: Bruno Haible
Subject: Re: Unicode string literals
Date: Fri, 01 May 2020 11:01:48 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-177-generic; KDE/5.18.0; x86_64; ; )
Hi Paul,
> >> Could we have a macro to be used only in source code encoded via UTF-8?
> >> Presumably the older compilers would process the bytes of the string as if
> >> they were individual 8-bit characters and would pass them through unchanged,
> >> so the run-time string would be UTF-8 too.
>
> > This would allow writing a macro that prefixes "u8" to strings in
> > compilers supporting enough of C11, skipping the prefix in compilers
> > that pass UTF-8 encoded bytes in strings unchanged
>
> Yes, that was the idea.
Did you mean (1) that the programmer should define a macro indicating that
their source code is UTF-8 encoded?
Or did you mean (2) that gnulib should define a macro that _assumes_ the
source code is UTF-8 encoded and therefore expands to u8"xyz" instead of "xyz"?
Recall that programmers do not usually tell GCC, through command-line
options, what the source encoding is. GCC has the options -finput-charset and
-fexec-charset [1], but I have never seen them used.
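-fexec-charset selects the execution character set, i.e. the encoding into which the compiler converts narrow string literals; GCC's default is UTF-8. As a minimal sketch (the function name exec_charset_is_utf8 is hypothetical, not a gnulib API), a program can check which execution character set it was compiled with by comparing a universal character name against its UTF-8 bytes:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if narrow string literals in this
   translation unit were encoded in UTF-8, else 0.
   The universal character name \u00e9 (LATIN SMALL LETTER E WITH ACUTE)
   is converted to the execution character set at compile time; in UTF-8
   it becomes the two bytes 0xC3 0xA9.  */
int
exec_charset_is_utf8 (void)
{
  return strcmp ("\u00e9", "\xc3\xa9") == 0;
}
```

With GCC's defaults this returns 1; compiling with an option such as -fexec-charset=ISO-8859-1 would be expected to encode "\u00e9" as the single byte 0xE9 and make it return 0.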
Also, UTF-8 is the de facto standard now: 99% of web pages are in UTF-8,
and likely more than 95% of source code as well.
And on z/OS, users do not use GCC but the vendor compiler, which, as I
said, does not have compiler support that could reasonably be used here.
For (1) to work, this macro would need to be defined in each source file,
after the #include statements, since the included header files, possibly
from other packages, can be in a different source encoding. Few programmers
will want to do this.
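A minimal sketch of what option (1) might look like in a single source file, assuming a C11 compiler for the u8 prefix. The names THIS_FILE_IS_UTF8, UTF8_STR and greeting_length are hypothetical, not gnulib macros or functions. The opt-in macro is defined only after the #include lines, so it makes no claim about the encoding of the included headers:

```c
#include <stddef.h>   /* Headers, possibly from other packages: their     */
#include <string.h>   /* source encoding is outside this file's control.  */

/* Hypothetical per-file opt-in: the programmer asserts that THIS file is
   UTF-8 encoded.  Defined after all #include lines on purpose.  */
#define THIS_FILE_IS_UTF8 1

/* Hypothetical literal macro: with the opt-in and a C11 compiler, paste the
   u8 prefix onto the literal (u8 ## "x" yields the single token u8"x");
   otherwise pass the literal through and rely on the compiler copying the
   UTF-8 bytes unchanged.  */
#if THIS_FILE_IS_UTF8 && defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
# define UTF8_STR(s) u8 ## s
#else
# define UTF8_STR(s) s
#endif

/* The "é" below is written directly in UTF-8 (bytes 0xC3 0xA9).  */
static const char greeting[] = UTF8_STR ("café");

size_t
greeting_length (void)
{
  return strlen (greeting);
}
```

Every file that wants the u8 branch has to repeat the opt-in after its own includes; a file that forgets it silently falls back to the plain-literal branch.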
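For (2): what's the point? Once you assume that the source code is UTF-8
encoded, ISO C11 section 6.4.5 says that u8"xyz" and "xyz" are the same:
literals of type 'char *' (arrays of char that decay to 'char *'). With a
UTF-8 execution character set, GCC's default, they also contain the same bytes.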
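A small check of this point, as a sketch. The _Generic test assumes C11 or C17 (C23 later changed u8 literals to arrays of char8_t, i.e. unsigned char, so the test is skipped there), and the byte comparison assumes a UTF-8 source file and a UTF-8 execution character set, as with GCC's defaults. The names IS_CHAR_PTR and u8_and_plain_agree are hypothetical:

```c
#include <string.h>

#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L \
    && __STDC_VERSION__ <= 201710L
/* C11/C17, 6.4.5: the elements of both kinds of literal have type char,
   so in _Generic (where the array decays) both select the char * branch.  */
# define IS_CHAR_PTR(x) _Generic ((x), char *: 1, default: 0)
_Static_assert (IS_CHAR_PTR (u8"xyz"), "u8\"xyz\" decays to char *");
_Static_assert (IS_CHAR_PTR ("xyz"), "\"xyz\" decays to char *");
#endif

/* Returns 1 if u8"café" and "café" hold identical bytes.
   Requires at least C11 for the u8 prefix.  */
int
u8_and_plain_agree (void)
{
  static const char a[] = u8"café";
  static const char b[] = "café";
  return sizeof a == sizeof b && memcmp (a, b, sizeof a) == 0;
}
```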
Bruno
[1] https://gcc.gnu.org/onlinedocs/gcc-9.3.0/gcc/Preprocessor-Options.html