Re: Using Perl's cc


From: Eli Zaretskii
Subject: Re: Using Perl's cc
Date: Sun, 05 Jul 2015 17:30:10 +0300

> Date: Sat, 04 Jul 2015 21:19:51 +0300
> From: Eli Zaretskii <address@hidden>
> Cc: address@hidden
> 
> > It relies on a UTF-8 codeset being in the locale to be able to use the
> > C standard library functions to operate on UTF-8 data, like mbrtowc.
> > The UTF-8 data is coming from the Perl instance. (Perl strings have
> > two possible internal encodings: one is UTF-8, the other is either
> > Latin-1 or "native". The second's not reliable so I forced the UTF-8
> > representation.) If we can't do that, then it shouldn't be a big
> > problem to write or copy from elsewhere code to process UTF-8 data,
> > because the encoding isn't that complicated.
> 
> Can we use wchar_t instead?  Windows does support that out of the box,
> and a few functions that are absent, like wcwidth, can be easily
> written or emulated.  What's important, Windows' wchar_t type uses
> UTF-16 encoded Unicode codepoints, so all that's needed is conversion
> from and to UTF-8.  On GNU/Linux, wchar_t is a 32-bit data type that
> carries the Unicode codepoints themselves, so again just two-way
> conversion will be needed.
> 
> > Another problem would be the use of functions that operate on wide
> > characters: iswupper, iswspace, and wcwidth. It'd be too much to
> > replicate these completely.
> 
> See above: Windows already has most of them.  Just try to avoid using
> those that are not defined by ANSI C, they might be unavailable (but
> could be provided if really needed).

On second thought, and after reading xspara.c, I take back what I
wrote above.  Moving to wchar_t would probably be hard, and is not
really needed: xspara.c calls only a handful of functions that need
to support UTF-8, and those can easily be implemented for Windows.
The only non-trivial one is wcwidth, but we can use Markus Kuhn's
implementation (which might be a good idea for other platforms as
well, since I don't believe many non-glibc platforms have an
implementation that supports the entire Unicode range).

So where do you want me to put the Windows implementations of mbrtowc,
mbrlen, iswspace, iswupper, and wcwidth?  Should I make a separate C
file and #include it in xspara.c, like we do with pcterm.c?
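
To illustrate the scale of the work, the core of an mbrtowc
replacement could look roughly like the sketch below.  It is only a
sketch: the name utf8_decode and its return convention are made up
for this message and are not taken from xspara.c or any existing
library.  It decodes one UTF-8 sequence into a code point and rejects
truncated, overlong, and surrogate encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of xspara.c): decode one UTF-8
   sequence from S, looking at no more than N bytes, and store the
   code point in *CP.  Return the number of bytes consumed, 0 if N is
   zero, or (size_t) -1 if the sequence is invalid or truncated.  */
static size_t
utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  size_t len, i;
  uint32_t c, min;

  if (n == 0)
    return 0;

  if (s[0] < 0x80)
    {
      /* Plain ASCII: one byte, value is the code point.  */
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { len = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0)
    { len = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0)
    { len = 4; c = s[0] & 0x07; min = 0x10000; }
  else
    return (size_t) -1;         /* stray continuation or invalid lead byte */

  if (n < len)
    return (size_t) -1;         /* truncated sequence */

  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return (size_t) -1;     /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, values past U+10FFFF, and surrogates.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return (size_t) -1;

  *cp = c;
  return len;
}
```

The result is a 32-bit code point rather than a wchar_t, because
wchar_t is only 16 bits wide on Windows; a real mbrtowc wrapper would
have to decide what to do with code points above U+FFFF.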

Thanks.


