

Re: Dynamic loading progress

From: Eli Zaretskii
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 21:20:21 +0200

> From: Philipp Stephani <address@hidden>
> Date: Sun, 22 Nov 2015 18:19:29 +0000
> Cc: address@hidden, address@hidden, address@hidden
>     I already suggested what we should say in the documentation: that
>     these interfaces accept and produce UTF-8 encoded non-ASCII text.
> If the interface accepts UTF-8, then it must signal an error for invalid
> sequences; the Unicode standard mandates this.

The Unicode standard cannot mandate anything for Emacs, because Emacs
is not subject to Unicode standardization.

> If the interface produces UTF-8, then it must only ever produce valid
> sequences

As I explained, this would violate a basic expectation of a text
editing program.

> That's why I propose to not encode raw bytes as bytes, but as the Emacs
> integer codes used to represent them.

If we do that, no external code will be able to do anything useful
with such "bytes".  Module authors will have to write their own
replacements for library functions.  This will never be accepted by
our users.

> If any byte sequence is accepted, then the behavior becomes more complex. We
> need to exhaustively describe the behavior for any possible byte sequence,
> otherwise module authors cannot make any assumption.

We say that we accept valid UTF-8 encoded strings; anything else
might produce invalid UTF-8 on output.

> No matter what we expect or tolerate, we need to state that.

No, we don't.  When the callers violate the contract, they cannot
expect to know in detail what will happen.  If they want to know, they
will have to read the source.

> Module authors are not end users.

They are users like anyone who writes Lisp.  They came to expect that
Emacs behaves in certain ways, and modules should follow suit.

> I agree that end users should not see errors on decoding failure,
> but modules use only programmatic access, where we can be more
> strict.

You cannot be more strict, unless you rewrite the whole
encoding/decoding machinery, or write specialized code to detect and
reject invalid UTF-8 before it is passed to a decoder.  There are no
good reasons to do either, so let's not.

> An Emacs string is a sequence of integers.

No, it's a sequence of bytes.

> I agree that we shouldn't add such limitations. But I disagree that we should
> leave the behavior undocumented in such cases.

OK, so let's agree to disagree.  If that disagreement gets in your way
of fixing the issues related to this discussion, please say so, and I
will fix them myself.

