Re: [groff] devutf8 on Windows

From: Jeff Conrad
Subject: Re: [groff] devutf8 on Windows
Date: Tue, 26 Feb 2019 00:55:45 +0000

On Monday, February 25, 2019 7:58 AM, Eli Zaretskii wrote:
> You can verify the results with a dependency walker:
> Or, if you have GNU Binutils, you do this:
>   objdump -x PROGRAM.exe | fgrep "DLL Name:"
> where PROGRAM.exe is your test program.

As I suspected, I don’t have it.  MSVC won’t even let me link it (or I’m
not doing it right).

We could probably pin it down among DLLs, compiler, and Windows version
if I sent you my executable.  If it doesn’t work properly on your
system, perhaps they actually improved something in Windows 10.

> However, I must say that such a driver will be of somewhat limited
> utility, for several reasons.  First, cp1252 is only used for Latin
> locales; and although the quote characters have the same codepoint in
> all the 125X codepage series, the other characters do vary, and will
> not display properly when the console codepage is different.  So users
> will have to invoke chcp, and make sure they have a console font to
> cover the characters.

How is this any worse than with latin1?
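(Eli's point about the 125X series can be checked from Python, which ships the Windows codepage tables. A quick sketch, purely illustrative and not groff-specific: the curly quotes share codepoints across cp1250/1251/1252, but other characters do not.)

```python
# The single curly quotes occupy the same bytes (0x91/0x92) across
# the Windows 125X codepage family...
for cp in ("cp1250", "cp1251", "cp1252"):
    assert "\u2018\u2019".encode(cp) == b"\x91\x92"

# ...but other characters vary: a-umlaut exists in cp1252 yet has
# no slot in the Cyrillic cp1251, so encoding it there fails.
print("\u00e4".encode("cp1252"))          # b'\xe4'
try:
    "\u00e4".encode("cp1251")
except UnicodeEncodeError as e:
    print("not in cp1251:", e.reason)
```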

> The second reason is that Windows CMD consoles, at least in
> single-byte locales, are by default set up for old DOS codepages, not
> for Windows codepages, so in Latin locales the default console
> codepage is 437, not 1252.  And the console can also be set for
> different codepages for input and output.  It's quite a mess, from the
> POV of a Posix application.

No disagreement; I default to CP437.  My shell startup files set CP1252;
I only use cmd when I’m trying to sort things out (like here).

> Last, but not least, Windows APIs are not agnostic to encoding, they
> in many cases don't support treating text as just a byte stream.
> Instead, they _interpret_ text using the encoding they assume for the
> text.  This is because many APIs call internal Windows functions,
> which all work in UTF-16, so they need to convert text into UTF-16,
> and that requires to know the original encoding (and also the target
> encoding).  This is why you get question marks instead of characters
> Windows didn't think belong to your current codepage.

CP1252 seems to work fine for me.  The only code page mismatches I’ve
had have resulted from messing around with CP65001, so if I quit doing
that, I should be OK.  I would agree that sending the output to a file
and sending that file to someone else would probably be a bad idea.
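(The question-mark substitution Eli describes is the usual replacement behavior when text is converted into a codepage that lacks the character. Python's `errors="replace"` is only a rough model of what Windows does, not the API itself, but it shows the effect:)

```python
# A character with no slot in the target codepage gets substituted;
# Windows typically emits '?', which errors="replace" mimics here.
# U+2010 (the hyphen groff's devutf8 output uses) is absent from
# both CP437 and CP1252.
text = "hyphen: \u2010"
print(text.encode("cp437", errors="replace"))   # U+2010 becomes b'?'
print(text.encode("cp1252", errors="replace"))  # likewise replaced
```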

Anyway, my implementation is available if anyone wants it.

> So bottom line, having a cp1252 tty device driver would be even less
> useful on Windows than the latin1 device driver.  Somewhat useful, but
> not too much, IMO.

As above, I’m not seeing why it’s less useful than latin1.  One needs to
worry about the code page in either case.  And what about those who need
other Latin code pages?  With CP1252, at least one gets the extra
characters from the C1 area, which probably cover 95% of my needs.
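(The "extra characters" in question are the ones CP1252 assigns to bytes 0x80–0x9F, a range latin1 reserves for C1 control codes. A small Python comparison, again just to illustrate:)

```python
# Bytes 0x80-0x9F decode to C1 controls in latin-1, but cp1252
# assigns printable characters to most of them.
for b in (b"\x91", b"\x92", b"\x93", b"\x94", b"\x96", b"\x97"):
    print(b.hex(), repr(b.decode("latin-1")), repr(b.decode("cp1252")))
# e.g. 0x92 is the control U+0092 in latin-1 but a right single
# quote (U+2019) in cp1252; 0x97 is U+0097 in latin-1 but an
# em dash (U+2014) in cp1252.
```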

Ultimately, though, what’s the alternative? “Yer screwed”?  ASCII, with
its weird quotes (fixed on my system years ago, but still a typewriter)?
Like I had in 1987 ... somethin’ doesn’t seem right.

There are other problems with UTF-8 on Windows.  The Lucida Console font
doesn’t include the hyphen (U+2010), so a lot of man pages look weird
with boxes at the end of lines.  There may be a few other characters
missing that I haven’t yet noticed.

You go to war with the army you have, not the army you want or might
wish to have at a later time.  Or, given the history, the army you well
may _never_ have.

It seems like it just shouldn’t be this difficult ... silly me.

Thanks again for all the help with this.

