emacs-devel

MS-Windows Pseudo Console: a mixed blessing?


From: Eli Zaretskii
Subject: MS-Windows Pseudo Console: a mixed blessing?
Date: Sun, 14 Feb 2021 19:43:20 +0200

I took a look at this new feature in Windows 10, with the intent of
teaching Emacs to use it, so we could finally have emulated PTYs on
MS-Windows, and make communications with sub-processes use them, as we
do on Posix systems.  That would allow us to fix the problems with
buffering of the sub-processes' standard streams and with
sub-processes behaving differently from interactive invocations
because their standard streams are connected to pipes, not to a
terminal device.

However, after a few days of tinkering, coding, and reading on this, I
conclude that this hyped feature is flawed, if used as a communication
channel vis-à-vis subprocesses.  Two main reasons:

 . The parent process reading from a Pseudo Console must deal with
   console commands received as escape sequences.

   The stuff the parent process reads from the subprocess includes
   escape sequences that the subprocess never sent.  They are inserted
   by the Pseudo Console itself.  In the simplest case of an
   application that just writes lines of text, the stuff the parent
   process reads includes console commands to clear the screen, switch
   to default colors, and home the cursor, and to display the name of
   the application on the title bar, before it sees the first line of
   text actually output by the application.  It also includes the
   commands to turn the cursor off and on at the end of each line
   output by the application.

   The parent process will need to recognize and filter out these
   escape sequences.

 . The text read from the Pseudo Console is always encoded in UTF-8.

   This in itself is already a complication, because, when reading
   process output, we will need to ignore the coding-systems set for
   the subprocess, but still apply the coding-systems when encoding
   the command line for the process.

   However, a much more serious problem is with _how_ the Pseudo
   Console converts the actual output of the subprocess to UTF-8: it
   does that by _assuming_ the subprocess uses the current console
   codepage.  Which means that programs that use a different encoding
   will have their output converted into UTF-8 sequences that express
   the wrong characters.  While the program running in a subprocess
   can tell the console which encoding it will be using, there's no
   way I could find for the _parent_ process to do that before it
   invokes the subprocess.  So in the typical use case, where an Emacs
   program knows which encoding to expect from a subprocess (think Git
   as an example, which uses UTF-8 by default) and sets up to decode
   its output accordingly, Emacs will have no way of telling the
   Pseudo Console that this subprocess will output UTF-8, and thus no
   way of preventing the Pseudo Console from interpreting UTF-8 as
   some codepage.

   This issue could perhaps be dealt with by decoding as UTF-8, then
   encoding using the default console codepage, then decoding back
   using the correct encoding.  But this double decoding will make I/O
   more expensive, and in some applications the performance penalty
   could be tangible enough to force applications to opt for pipes as
   a lesser evil.  And in any case, coding this won't be fun.

   A similar problem exists with non-ASCII text sent to the
   subprocess: it needs to be encoded in UTF-8, but Pseudo Console
   will convert it to the wrong encoding.  So the same double encoding
   trick will have to be used here.
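
To make the double-conversion idea from the second point concrete, here is a
minimal Python sketch.  It is only a model: `simulate_pseudo_console`,
`recover_output` and `prepare_input` are hypothetical names, the Pseudo
Console is imitated by a plain re-encoding step (no Windows API is called),
and cp437 stands in for "the current console codepage".  It also assumes the
conversion is reversible for that codepage, which may not hold in general.

```python
def simulate_pseudo_console(child_bytes: bytes, console_codepage: str) -> bytes:
    """Model of the behaviour described above (an assumption, not the real
    API): bytes from the child are interpreted in the console codepage and
    re-emitted as UTF-8."""
    return child_bytes.decode(console_codepage).encode("utf-8")


def recover_output(raw: bytes, console_codepage: str, actual_encoding: str) -> str:
    """Undo the wrong conversion on data read from the Pseudo Console."""
    mangled = raw.decode("utf-8")                     # 1. decode as UTF-8
    original_bytes = mangled.encode(console_codepage) # 2. re-encode in the console codepage
    return original_bytes.decode(actual_encoding)     # 3. decode with the correct encoding


def prepare_input(text: str, console_codepage: str, actual_encoding: str) -> bytes:
    """Same trick in the other direction: pre-distort the text so that the
    console's UTF-8 -> codepage conversion yields the bytes the child expects."""
    wanted_bytes = text.encode(actual_encoding)
    return wanted_bytes.decode(console_codepage).encode("utf-8")


text = "naïve café"
seen_by_parent = simulate_pseudo_console(text.encode("utf-8"), "cp437")
recovered = recover_output(seen_by_parent, "cp437", "utf-8")
```

In this model `seen_by_parent` decodes to the wrong characters, and
`recovered` equals the original text again.  The extra encode and decode on
every chunk of I/O is exactly the cost mentioned above.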

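For the first point, the filtering the parent process would have to do can be
sketched roughly as follows, again in Python.  This is an illustration under
assumptions, not a complete terminal parser: it handles only CSI sequences
(ESC [ ... final byte) and OSC sequences (ESC ] ... terminated by BEL or
ESC \), the name `strip_console_escapes` is hypothetical, the sample bytes are
made up to resemble the commands listed above, and it assumes each buffer
holds whole sequences, whereas a real reader must cope with a sequence split
across two reads.

```python
import re

# CSI: ESC [ parameter bytes, intermediate bytes, one final byte.
# OSC: ESC ] payload, terminated by BEL or by ESC \ (string terminator).
_ESCAPES = re.compile(
    rb"\x1b\[[0-?]*[ -/]*[@-~]"
    rb"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
)


def strip_console_escapes(data: bytes) -> bytes:
    """Remove CSI and OSC escape sequences, keeping everything else."""
    return _ESCAPES.sub(b"", data)


# Hypothetical sample: clear screen, default colors, home cursor, set the
# window title, show cursor, one line of real output, hide cursor.
sample = (
    b"\x1b[2J\x1b[m\x1b[H"
    b"\x1b]0;C:\\prog.exe\x07"
    b"\x1b[?25h"
    b"hello\r\n"
    b"\x1b[?25l"
)
cleaned = strip_console_escapes(sample)
```

Here `cleaned` contains only the line the application itself wrote.
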
I sincerely hope that someone knowledgeable will point out something
that I missed and that allows these issues to be solved cleanly and
efficiently.  If not, I'm afraid that this feature will be much less
attractive for us than I originally hoped.


