Re: Unicode string literals

bug-gnulib

From:	Bruno Haible
Subject:	Re: Unicode string literals
Date:	Fri, 01 May 2020 11:01:48 +0200
User-agent:	KMail/5.1.3 (Linux/4.4.0-177-generic; KDE/5.18.0; x86_64; ; )

Hi Paul,

> >> Could we have a macro to be used only in source code encoded via UTF-8?
> >> Presumably the older compilers would process the bytes of the string as if
> >> they were individual 8-bit characters and would pass them through
> >> unchanged, so the run-time string would be UTF-8 too.
> 
> > This would allow writing a macro that prefixes "u8" to strings in
> > compilers supporting enough of C11, skipping the prefix in compilers
> > that pass UTF-8 encoded bytes in strings unchanged
> 
> Yes, that was the idea.

Did you mean (1) that the programmer shall define a macro that indicates that
their source code is UTF-8 encoded?

Or did you mean (2) that gnulib shall define a macro that shall _assume_ that
the source code is UTF-8 encoded, and then expand to u8"xyz" instead of "xyz"?

Recall that the programmer usually does not tell GCC through command-line
options what the source encoding is. GCC has the options -finput-charset and
-fexec-charset [1], but I have never seen them being used.
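
To see what these options would change, a small sketch may help. It assumes
a C11 compiler, and the function names are made up for illustration. The
plain literal is encoded in the execution character set, while the u8
literal is always UTF-8:

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of gnulib.  */

/* Bytes (without the NUL) of U+00E9 in the execution character set.
   With GCC's default UTF-8 execution charset this is 2; with
   -fexec-charset=ISO-8859-1 it should become 1 (assumption based on
   the GCC documentation, not measured here).  */
size_t
plain_e_acute_length (void)
{
  return sizeof "\u00e9" - 1;
}

/* Bytes (without the NUL) of U+00E9 in a u8 literal.
   UTF-8 encodes it as 0xC3 0xA9, whatever -fexec-charset says.  */
size_t
utf8_e_acute_length (void)
{
  return sizeof u8"\u00e9" - 1;
}
```

The universal character name \u00e9 keeps the sketch itself pure ASCII, so
it does not depend on -finput-charset.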

Also, UTF-8 is the de facto standard now: 99% of web pages are in UTF-8,
and likely more than 95% of source code as well.

And on z/OS, users are not using GCC but the vendor compiler, which - as I
said - does not have compiler support that could reasonably be used.

For (1) to work, this macro would need to be defined in each source file,
after the #include statements - since the included header files, possibly
from other packages, can be in a different source encoding. Few programmers
will want to do this.
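
A rough sketch of what (1) would ask of each source file, assuming made-up
names SOURCE_IS_UTF8 and U8 (neither exists in gnulib):

```c
#include <stddef.h>   /* All #include lines come first.  */

/* Hypothetical per-file declaration: "this file is UTF-8 encoded".
   It has to follow the includes, since those may be encoded differently.  */
#define SOURCE_IS_UTF8 1

/* Hypothetical macro: use the u8 prefix where C11 is available, otherwise
   rely on the compiler passing the bytes through unchanged.  */
#if SOURCE_IS_UTF8 && defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
# define U8(s) u8"" s
#else
# define U8(s) s
#endif

/* Length in bytes of "naive" spelled with U+00EF, which takes two bytes
   in UTF-8.  The character is written as a universal character name so
   that the sketch itself stays ASCII.  */
size_t
naive_length (void)
{
  return sizeof U8 ("na\u00efve") - 1;
}
```

The macro relies on adjacent string literal concatenation, which C11 allows
between a u8 literal and an unprefixed one.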

For (2): what's the point? Once you assume that the source code is UTF-8
encoded, ISO C11 section 6.4.5 says that u8"xyz" and "xyz" are the same:
literals of type 'char *'.
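
A minimal check of this claim, assuming a C11 compiler and an
ASCII-compatible execution charset (the function name is made up; on an
EBCDIC system such as z/OS the result may well differ):

```c
#include <string.h>

/* Hypothetical check, not from gnulib: returns 1 if u8"xyz" and "xyz"
   occupy the same number of bytes and hold the same byte values.  */
int
u8_and_plain_literals_match (void)
{
  return sizeof u8"xyz" == sizeof "xyz"
         && memcmp (u8"xyz", "xyz", sizeof "xyz") == 0;
}
```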

Bruno

[1] https://gcc.gnu.org/onlinedocs/gcc-9.3.0/gcc/Preprocessor-Options.html
