Re: [groff] devutf8 on Windows


From: Eli Zaretskii
Subject: Re: [groff] devutf8 on Windows
Date: Mon, 25 Feb 2019 17:58:02 +0200

> From: Jeff Conrad <address@hidden>
> CC: "address@hidden" <address@hidden>
> Date: Mon, 25 Feb 2019 12:12:39 +0000
> 
> On Monday, February 25, 2019 4:03 AM, Eli Zaretskii:
> 
> > The only explanation I could come up with regarding your simple program is
> > that VS linked it against static libraries,
> 
> I’m sure I’m linking statically.

The question is rather what VS is actually doing...

You can verify the results with a dependency walker:

  http://www.dependencywalker.com/

Or, if you have GNU Binutils, you can do this:

  objdump -x PROGRAM.exe | fgrep "DLL Name:"

where PROGRAM.exe is your test program.
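If you want to do the same check from a script, here is a minimal
Python sketch.  It assumes objdump from GNU Binutils is on PATH; the
function names dll_names and imported_dlls are made up for this
illustration.

```python
import re
import subprocess


def dll_names(objdump_output):
    """Return the names from the "DLL Name:" lines of `objdump -x` output."""
    return re.findall(r"^[ \t]*DLL Name:[ \t]*(\S+)", objdump_output,
                      flags=re.MULTILINE)


def imported_dlls(exe_path):
    """Run `objdump -x` on exe_path and list the DLLs it imports.

    Assumes objdump (GNU Binutils) is installed and on PATH.
    """
    result = subprocess.run(["objdump", "-x", exe_path],
                            capture_output=True, text=True, check=True)
    return dll_names(result.stdout)
```

Calling imported_dlls("PROGRAM.exe") gives the same names as the
pipeline above.  If no C runtime DLL shows up in the list, the runtime
was presumably linked statically.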

> > In any case, the conclusion remains that UTF-8 console output on Windows is
> > unreliable, perhaps apart from Windows 10.
> 
> And though Win 10 seems usable, it’s hardly great with UTF-8.  Would it
> then make sense to include a devcp1252, which—though it may make
> purists nauseous—would at least give reasonable, reliable output for a
> high percentage of documents?

Maybe (I don't steer Groff development, so my opinion doesn't matter
much).

However, I must say that such a driver would be of somewhat limited
utility, for several reasons.  First, cp1252 is only used in Latin
locales; and although the quote characters have the same codepoints in
all of the 125X series of codepages, the other characters do vary, and
will not display properly when the console codepage is different.  So
users will have to invoke chcp, and make sure they have a console font
that covers the characters.
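To illustrate the point about the 125X codepages, here is a small
Python sketch that uses Python's own codec tables (not any Windows
API) to decode the same bytes under several of them.  The helper name
decode_everywhere and the choice of codepages are mine, picked only
for the illustration.

```python
WINDOWS_CODEPAGES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254"]


def decode_everywhere(raw):
    """Decode the same bytes under each codepage in WINDOWS_CODEPAGES."""
    return {cp: raw.decode(cp) for cp in WINDOWS_CODEPAGES}


def codepoints(text):
    """Format text as U+XXXX codepoints, so the output is pure ASCII."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


quotes = decode_everywhere(b"\x93\x94")  # the double quote characters
letter = decode_everywhere(b"\xe9")      # e-acute in cp1252

for cp in WINDOWS_CODEPAGES:
    print(cp, "quotes:", codepoints(quotes[cp]),
          "0xE9:", codepoints(letter[cp]))
```

The quote bytes come out as U+201C U+201D in every row, while byte
0xE9 is U+00E9 in cp1252 but U+0439 (a Cyrillic letter) in cp1251 and
U+03B9 (a Greek letter) in cp1253.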

The second reason is that Windows CMD consoles, at least in
single-byte locales, are by default set up for the old DOS codepages,
not for the Windows codepages, so in Latin locales the default console
codepage is 437, not 1252.  And the console can also be set to use
different codepages for input and output.  It's quite a mess, from the
POV of a POSIX application.
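A rough Python sketch of what this mismatch does to cp1252 output on
a console left at codepage 437.  It only simulates the effect with
Python's codecs; the names as_seen_by_console and console_codepages
are hypothetical.  The second function asks a real console for its
two codepages, and returns None when not running on Windows.

```python
import sys


def as_seen_by_console(text, written_as="cp1252", console_cp="cp437"):
    """Encode text in one codepage and decode the bytes in another."""
    return text.encode(written_as).decode(console_cp)


def console_codepages():
    """Return the (input, output) console codepages, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.windll.kernel32
    # GetConsoleCP is the input codepage, GetConsoleOutputCP the output one.
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


garbled = as_seen_by_console("\u201ccaf\u00e9\u201d")
print(garbled.encode("unicode_escape").decode("ascii"))
```

Here the quoted word "café", written with curly quotes, is shown with
U+00F4 and U+00F6 in place of the quotes and U+0398 (Greek capital
theta) in place of the e-acute.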

Last, but not least, Windows APIs are not agnostic to encoding: in
many cases they don't support treating text as just a byte stream.
Instead, they _interpret_ text using the encoding they assume for the
text.  This is because many APIs call internal Windows functions,
which all work in UTF-16, so they need to convert text into UTF-16,
and that requires knowing the original encoding (and, when converting
back, also the target encoding).  This is why you get question marks
instead of characters that Windows thinks don't belong to your current
codepage.
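The question marks can be imitated in Python, as a loose analogy only:
Python's "replace" error handler stands in for the substitution
Windows performs, and the real Windows conversion functions may behave
differently in detail (for example, by picking a similar-looking
character instead).  The function name through_codepage is made up
for this sketch.

```python
def through_codepage(text, codepage):
    """Round-trip Unicode text through a single-byte codepage.

    Characters the codepage cannot represent come back as '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


cyrillic = "\u201c\u041f\u0440\u0438\u0432\u0435\u0442\u201d"
result = through_codepage(cyrillic, "cp1252")
print(result.encode("unicode_escape").decode("ascii"))
```

The curly quotes survive the trip through cp1252, but each of the six
Cyrillic letters comes back as a question mark.  With cp437 it is the
other way around for the quotes, since that codepage doesn't have them
at all.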

So bottom line, having a cp1252 tty device driver would be even less
useful on Windows than the latin1 device driver.  Somewhat useful, but
not too much, IMO.


