MS-Windows Pseudo Console: a mixed blessing?
From: Eli Zaretskii
Subject: MS-Windows Pseudo Console: a mixed blessing?
Date: Sun, 14 Feb 2021 19:43:20 +0200
I took a look at this new feature in Windows 10, with the intent of
teaching Emacs to use it, so that we could finally have emulated PTYs
on MS-Windows and use them for communicating with subprocesses, as we
do on Posix systems. That would allow us to fix the problems with
buffering of the subprocesses' standard streams, and with subprocesses
behaving differently than when invoked interactively because their
standard streams are connected to pipes, not to a terminal device.
However, after a few days of tinkering, coding, and reading about
this, I conclude that this hyped feature is flawed when used as a
communication channel vis-à-vis subprocesses. Two main reasons:
. The parent process reading from a Pseudo Console must deal with
console commands received as escape sequences.
The stuff the parent process reads from the subprocess includes
escape sequences that the subprocess never sent. They are inserted
by the Pseudo Console itself. In the simplest case of an
application that just writes lines of text, the stuff the parent
process reads includes console commands to clear the screen, switch
to default colors, and home the cursor, and to display the name of
the application on the title bar, before it sees the first line of
text actually output by the application. It also includes the
commands to turn the cursor off and on at the end of each line
output by the application.
The parent process will need to recognize and filter out these
escape sequences.
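To make the filtering concrete, here is a minimal Python sketch of stripping such sequences from bytes read from a Pseudo Console. It assumes the inserted sequences are CSI sequences (ESC [ ... final byte, e.g. clear screen, home cursor, hide/show cursor), OSC sequences such as the window title (ESC ] ... terminated by BEL or ESC \), and other two-byte ESC sequences. The function name strip_vt_sequences and the sample input are hypothetical, made up for illustration and not captured from a real Pseudo Console.

```python
import re

# Hypothetical filter, assuming the Pseudo Console inserts only these kinds
# of VT sequences:
#   CSI: ESC [ <params 0x30-0x3F> <intermediates 0x20-0x2F> <final 0x40-0x7E>
#        e.g. ESC[2J (clear screen), ESC[H (home cursor), ESC[?25l (hide cursor)
#   OSC: ESC ] ... terminated by BEL or by ESC \  (e.g. set the window title)
#   other two-byte ESC sequences whose second byte is in 0x40-0x5F
_VT_SEQ = re.compile(
    rb"\x1b\[[0-?]*[ -/]*[@-~]"
    rb"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
    rb"|\x1b[@-_]"
)


def strip_vt_sequences(data: bytes) -> bytes:
    """Remove VT escape sequences from one chunk of Pseudo Console output.

    Note: this cannot tell inserted sequences from ones the subprocess
    really wrote, and it does not handle a sequence split across chunks.
    """
    return _VT_SEQ.sub(b"", data)


if __name__ == "__main__":
    # Made-up sample in the spirit of the description above.
    raw = (b"\x1b[2J\x1b[m\x1b[H\x1b]0;app.exe\x07"
           b"\x1b[?25lhello\r\n\x1b[?25h")
    print(strip_vt_sequences(raw))  # b'hello\r\n'
```

A real filter would also have to buffer an incomplete sequence at the end of one read until the next read arrives, and it would still remove sequences that the subprocess emitted on purpose, since the two look identical to the parent.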
. The text read from the Pseudo Console is always encoded in UTF-8.
This in itself is already a complication, because, when reading
process output, we will need to ignore the coding-systems set for
the subprocess, but still apply the coding-systems when encoding
the command line for the process.
However, a much more serious problem is with _how_ the Pseudo
Console converts the actual output of the subprocess to UTF-8: it
does that by _assuming_ the subprocess uses the current console
codepage. Which means that programs that use a different encoding
will have their output converted into UTF-8 sequences that express
the wrong characters. While the program running in a subprocess
can tell the console which encoding it will be using, there's no
way I could find for the _parent_ process to do that before it
invokes the subprocess. So in the typical use case, where an Emacs
program knows which encoding to expect from a subprocess (think of
Git, which uses UTF-8 by default) and sets up to decode its output
accordingly, Emacs will have no way of telling the Pseudo Console
that this subprocess will output UTF-8, and thus no way of preventing
the Pseudo Console from interpreting UTF-8 as some codepage.
This issue could perhaps be dealt with by decoding as UTF-8, then
encoding using the default console codepage, then decoding back
using the correct encoding. But this double decoding will make I/O
more expensive, and in some applications the performance penalty
could be tangible enough to force applications to opt for pipes as
a lesser evil. And in any case, coding this won't be fun.
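The round trip described above can be sketched in Python, assuming cp437 as the console codepage and UTF-8 as the encoding the subprocess really uses. Here simulate_pseudo_console is a hypothetical model of the conversion described in this message, not the actual Windows implementation, and recover_child_text is a hypothetical helper name.

```python
# Assumptions for this sketch: the console codepage is cp437, and the
# subprocess (e.g. Git) writes UTF-8.  Both helper names are hypothetical.
CONSOLE_CP = "cp437"
CHILD_ENCODING = "utf-8"


def simulate_pseudo_console(child_bytes: bytes, console_cp: str = CONSOLE_CP) -> bytes:
    """Model of the conversion described above: the child's bytes are
    assumed to be in the console codepage and re-encoded as UTF-8."""
    return child_bytes.decode(console_cp).encode("utf-8")


def recover_child_text(pty_bytes: bytes,
                       console_cp: str = CONSOLE_CP,
                       child_encoding: str = CHILD_ENCODING) -> str:
    # 1. Decode what the Pseudo Console delivered; it is always UTF-8.
    misread = pty_bytes.decode("utf-8")
    # 2. Undo the wrong assumption: encoding with the console codepage
    #    yields the bytes the child originally wrote.
    child_bytes = misread.encode(console_cp)
    # 3. Decode those bytes with the encoding the child really used.
    return child_bytes.decode(child_encoding)


if __name__ == "__main__":
    original = "naïve café"
    from_pty = simulate_pseudo_console(original.encode(CHILD_ENCODING))
    print(from_pty.decode("utf-8"))      # garbled text
    print(recover_child_text(from_pty))  # naïve café
```

cp437 is chosen because every byte value is defined in it, so this model is lossless; with a codepage that leaves some byte values undefined, the conversion could lose information and the trick would not recover the original text. Each chunk is decoded twice and encoded once, which is the extra cost mentioned above, and a real filter would also need incremental decoders, since a multibyte UTF-8 sequence can be split across reads.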
A similar problem exists with non-ASCII text sent to the
subprocess: it needs to be encoded in UTF-8, but the Pseudo Console
will then convert it to the wrong encoding. So the same double
encoding trick will have to be used here.
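For the input direction, the same kind of sketch applies in reverse, under the same assumptions: cp437 console codepage, a subprocess that expects UTF-8, and hypothetical helper names. simulate_pseudo_console_input models the conversion described above, not the real Windows code.

```python
CONSOLE_CP = "cp437"      # assumption: console codepage
CHILD_ENCODING = "utf-8"  # assumption: what the subprocess expects to read


def simulate_pseudo_console_input(pty_bytes: bytes, console_cp: str = CONSOLE_CP) -> bytes:
    """Model: the Pseudo Console decodes what we send as UTF-8 and hands
    the child the text encoded in the console codepage."""
    return pty_bytes.decode("utf-8").encode(console_cp)


def prepare_input(text: str,
                  console_cp: str = CONSOLE_CP,
                  child_encoding: str = CHILD_ENCODING) -> bytes:
    # 1. Encode the text the way the child wants to receive it.
    child_bytes = text.encode(child_encoding)
    # 2. Reinterpret those bytes as console-codepage characters...
    disguised = child_bytes.decode(console_cp)
    # 3. ...and send them as UTF-8, which is what the Pseudo Console expects.
    return disguised.encode("utf-8")


if __name__ == "__main__":
    to_send = prepare_input("café\n")
    received = simulate_pseudo_console_input(to_send)
    print(received == "café\n".encode(CHILD_ENCODING))  # True
```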
I sincerely hope that someone knowledgeable will point out something
that I missed, something that allows us to solve these issues cleanly
and efficiently. If not, I'm afraid that this feature will be much
less attractive for us than I originally hoped.