emacs-devel

Re: Emacs Lisp's future


From: Stephen J. Turnbull
Subject: Re: Emacs Lisp's future
Date: Sun, 12 Oct 2014 10:35:36 +0900

Mark H Weaver writes:
 > Eli Zaretskii <address@hidden> writes:

 > > Specify, and then drag it all the way down the encoding/decoding
 > > machinery.
 > 
 > The strictness flag should conceptually be part of the encoding, and
 > thus associated with the I/O port.

This is the way Emacs works already.

However, I think the Python system, where strictness is part of the
I/O port, not the encoding, and the encodings are designed to error
and then hand the invalid raw bytes to the error handler if desired,
is a better API.  I don't know how easy it would be to provide this in
Emacs (XEmacs streams are quite different from Emacs'), but it's
probably not too hard since the rawbytes facility is already present.
It would be nice to extend that to EOL handling as well IMO, but
that's not as big an issue.
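To make the Python model concrete: the `errors` parameter travels with
each call or stream while the codec itself stays strictness-agnostic,
and a handler registered via `codecs.register_error` receives the
exception object, which carries the invalid raw bytes.  A minimal
sketch (the "keep-raw" handler name is my own invention, not stdlib):

```python
import codecs

# The handler receives the UnicodeDecodeError, which carries the raw
# bytes; it returns a replacement string and the position at which to
# resume decoding.  "keep-raw" is a hypothetical handler name.
def keep_raw_bytes(exc):
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]      # the invalid raw bytes
        return ("<%s>" % bad.hex(), exc.end)     # substitute and continue
    raise exc

codecs.register_error("keep-raw", keep_raw_bytes)

data = b"abc\xffdef"                             # 0xFF is invalid in UTF-8
print(data.decode("utf-8", errors="keep-raw"))   # abc<ff>def
# Same codec, strict for this call only:
#   data.decode("utf-8", errors="strict")  would raise UnicodeDecodeError
```

The same `errors=` parameter is accepted by `open()`, which is what
makes strictness a property of the port rather than of the codec.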

 > This would obviate the need to propagate it down through layers of
 > code.

It's not so easy, because the layers of code referred to are not the
encoding/decoding machinery in the sense of the coding system (ISTR
you use "codec"; Emacs calls them "coding systems" to be more like ISO
2022 "coding extensions", IIRC).  Rather, it's the mechanism for
determining exactly which coding system is to be used, and the
difficulties are really in the UI more than in the API.

In Emacs Lisp there's a tradition of embedding parameters that are
normally specified as constants directly into a symbol's name.  (This
issue has already been referred to in different terms.)  So instead of

    ;; these IO functions are all imaginary
    (let ((s (open-file "foo")))
      (set-stream-coding-system s 'utf-8)
      (set-stream-eol s 'unix)                ; EOL is LF
      (set-stream-invalid-coding-handler s 'strict)
      ;; now we can do I/O, signaling errors on invalid coding
      (read-stream-into-buffer s))
    ;; and now we're ready to edit, assuming valid coding!

Emacs does

    (find-file "foo" 'utf-8-unix-strict)  ; or is it utf-8-strict-unix? arghh!
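In other words, the convention packs what would be separate stream
parameters into a single name.  A sketch of unpacking such a name
(`utf-8-unix` is a real Emacs coding-system name; the `strict`
component is the hypothetical extension under discussion):

```python
# Unpack a coding-system-style name into (base codec, EOL convention,
# strict flag).  The "strict" component is hypothetical.
def parse_coding_name(name):
    parts = name.split("-")
    strict = "strict" in parts
    eol = next((p for p in parts if p in ("unix", "dos", "mac")), None)
    base = "-".join(p for p in parts
                    if p not in ("unix", "dos", "mac", "strict"))
    return base, eol, strict

print(parse_coding_name("utf-8-unix-strict"))    # ('utf-8', 'unix', True)
```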

Things are further complicated by the fact that Emacs has an extremely
complex system for specifying the encoding and the newline convention
used, and either or both might be automatically detected.  All of the
parameters can be tweaked at any stage in the specification routines,
and there are about 5 levels of configurability for files
(configuration is done by setting or binding dynamic variables) and
more than one for network and process streams (which are different).
Adding specification of the error handling convention will make the
*user interface* yet more complicated -- and it has to be possible for
all this to be done separately for every stream (you might trust files
on your host but not the network).  And then there's the "auto" coding
system, which guesses the appropriate coding system by analyzing the
input.
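The core idea of that guessing can be approximated as trying candidate
coding systems in priority order and taking the first strict decode
that succeeds.  This is not Emacs's actual algorithm (which also
weighs coding categories and user-set priorities), just a sketch:

```python
# Try candidates in priority order; return the first that decodes
# strictly.  latin-1 accepts any byte sequence, so it acts as a
# last-resort fallback here.
def guess_coding(data, candidates=("utf-8", "shift_jis", "latin-1")):
    for name in candidates:
        try:
            data.decode(name, errors="strict")
            return name
        except UnicodeDecodeError:
            continue
    return None

print(guess_coding("こんにちは".encode("utf-8")))  # utf-8
print(guess_coding(b"\x82\xa0"))  # shift_jis (valid Shift_JIS, invalid UTF-8)
```

Note that latin-1 never fails, which is exactly the "is this Latin-1
or Latin-9?" problem: for single-byte encodings the bit patterns don't
distinguish themselves the way the major Japanese encodings do.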

I have always thought that the Emacs developers' emphasis on having
Emacs "DWIM" so much in this area is somewhat misplaced[1], but that is
the way things are and have been since the late 1980s (Emacs proper
installed these features in 1998 or so, but patches adding them were
universally used for Asian languages from the late 1980s), and there
will be a lot of resistance from users and developers to any changes
that require them to do things differently.


Footnotes: 
[1]  Historically, these features were developed by Japanese
developers, who have to deal with an insane environment where even
today you will encounter at least 5 major encodings on a daily basis
(cheating a little, since UTF-16 is usually visible only inside MSFT
file formats and in Java programming), and most of those have
innumerable private variants (most large corporations in Japan have
private sets of Chinese characters that are in Unicode but were
historically not in the Japanese national standards).  It's easy to
see why Japanese developers would want a good guessing facility!  Most of the
rest of us either don't have to deal with it (95% of what we see is in
one particular encoding), or have an extremely difficult problem in
distinguishing the ones common in our environment (is this Latin-1 or
Latin-9? vs. the Japanese case where the bit patterns of the major
encodings are very distinctive).

This is not to say that guessing is a bad idea where it can be done
accurately, just that the Emacs facilities are way too complex for the
benefit they provide over a much simpler system.



