Re: Dynamic loading progress

From: Eli Zaretskii
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 15:23:37 +0200

> From: Philipp Stephani <address@hidden>
> Date: Sat, 21 Nov 2015 12:11:45 +0000
> Cc: address@hidden, address@hidden, address@hidden
>     No, we cannot, or rather should not. It is unreasonable to expect
>     external modules to know the intricacies of the internal
>     representation. Most Emacs hackers don't.
> Fine with me, but how would we then represent Emacs strings that are not valid
> Unicode strings? Just raise an error?

No need to raise an error.  Strings that are returned to modules
should be encoded into UTF-8.  That encoding already takes care of
these situations: it either produces the UTF-8 encoding of the
equivalent Unicode characters, or outputs raw bytes.

We are using this all the time when we save files or send stuff over
the network.
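
A module that cares about the difference between the two cases could check the bytes it gets back. The following is only a minimal sketch in C, not part of Emacs or of the module API: `is_strict_utf8` is a hypothetical helper name, and "strict" here is assumed to mean no overlong forms, no surrogates, and nothing above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not an Emacs
   function): return true if the LEN bytes at S are strict UTF-8.  */
bool is_strict_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded value and smallest legal value */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation byte or 0xF8..0xFF */

        if (len - i - 1 < need) return false;            /* truncated */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return false;       /* not 10xxxxxx */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return false;                      /* overlong form */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        i += need + 1;
    }
    return true;
}
```

If this returns false for a buffer a module received, the buffer presumably contains raw bytes that had no Unicode equivalent; the module can then decide for itself whether to pass them through or reject them.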

>     No, I meant strict UTF-8, not its Emacs extension.
> That would be possible and provide a clean interface. However, Emacs strings
> are extended, so we'd need to specify how they interact with UTF-8 strings.
> * If a module passes a char sequence that's not a valid UTF-8 string, but a
>   valid Emacs multibyte string, what should happen? Error, undefined behavior,
>   silently accepted?

We are quite capable of quietly accepting such strings, so that is
what I would suggest.  Doing so would be in line with what Emacs does
when such invalid sequences come from other sources, like files.
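
To illustrate what "quietly accepting" amounts to, here is a rough sketch in C. It is not Emacs's actual decoder, and the names `decode_one`, `lenient_decode` and `RAW_BYTE_BASE` are made up for the example. It assumes the convention that a raw byte B in 0x80..0xFF is kept as the character code 0x3FFF00 + B, so nothing is lost and nothing signals an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: raw bytes 0x80..0xFF are represented as
   character codes 0x3FFF80..0x3FFFFF.  */
#define RAW_BYTE_BASE 0x3FFF00u

/* Try to decode one strict UTF-8 sequence at S with AVAIL bytes left.
   Return the number of bytes consumed, or 0 if the sequence is invalid.  */
static size_t decode_one(const unsigned char *s, size_t avail, uint32_t *cp)
{
    unsigned char c = s[0];
    size_t need;
    uint32_t v, min;

    if (c < 0x80) { *cp = c; return 1; }
    if ((c & 0xE0) == 0xC0)      { need = 1; v = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { need = 2; v = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { need = 3; v = c & 0x07; min = 0x10000; }
    else return 0;

    if (avail - 1 < need) return 0;
    for (size_t k = 1; k <= need; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0;
        v = (v << 6) | (s[k] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) return 0;
    *cp = v;
    return need + 1;
}

/* Decode LEN bytes at S into OUT, which must have room for LEN entries.
   Bytes that do not start a valid sequence are kept as raw-byte codes
   instead of raising an error.  Return the number of codes written.  */
size_t lenient_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    assert(out != NULL && (s != NULL || len == 0));
    size_t n = 0;
    for (size_t i = 0; i < len; ) {
        uint32_t cp;
        size_t used = decode_one(s + i, len - i, &cp);
        if (used == 0) {            /* invalid here: keep the byte as is */
            cp = RAW_BYTE_BASE + s[i];
            used = 1;
        }
        out[n++] = cp;
        i += used;
    }
    return n;
}
```

With the four input bytes 'a', 0xFF, 0xC3, 0xA9, this sketch yields three codes: 0x61, 0x3FFFFF (the raw byte) and 0xE9.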

> * If copy_string_contents is passed an Emacs string that is not a valid
>   Unicode string, what should happen?

How can that happen?  The Emacs string comes from the Emacs bowels, so
it must be a "valid" string by Emacs standards.  Or maybe I don't
understand what you mean by "invalid Unicode string".

In any case, we already deal with any such problems when we save a
buffer to a file, or send it over the network.  This isn't some new
problem we need to cope with.

> OK, then we can use that, of course. The question of handling invalid UTF-8
> strings is still open, though, as make_multibyte_string doesn't enforce valid
> UTF-8.

It doesn't enforce valid UTF-8 because it can handle invalid UTF-8 as
well.  That's by design.

> If it's the contract of make_multibyte_string that it will always accept
> UTF-8, then that should be added as a comment to that function. Currently
> I don't see it documented anywhere.

That part of the documentation is only revealed to veteran Emacs
hackers, subject to swearing not to reveal that to the uninitiated and
to some blood-letting that seals the oath ;-)
