Eli Zaretskii <address@hidden> wrote on Sat, 21 Nov 2015 at 10:30:
> From: Philipp Stephani <address@hidden>
> Date: Sat, 21 Nov 2015 09:01:12 +0000
> Cc: address@hidden, address@hidden, address@hidden
> Let me summarize the issues I see: The internal Emacs encoding can change
> between versions (comment in mule-conf.el), therefore we shouldn't use it in
> the module API. IIUC this rules out make_multibyte_string: it only accepts the
> internal encoding. Therefore I proposed to always have users specify the
> encoding explicitly and then use code_convert_string_norecord to create the
> Lisp string objects. Would that work? (We probably then need another set of
> functions for unibyte strings.)
I'm not sure I'm following, so let's take a step back, okay?
My comments were about using build_string and make_string in 2
functions defined in emacs-module.c: module_make_function and
module_make_string. Both of these emacs-module.c functions produce
strings for consumption by Emacs, AFAIU: the former produces a doc
string of a function defined by a module, which will be used by
various documentation-related functions and commands within Emacs, the
latter produces a string to be passed to Emacs Lisp code for use as
any other Lisp string. Do you agree so far?
If you agree, then in both cases the strings these functions return
should be in the internal representation of strings used by Emacs, not
in some encoding like UTF-8 or ISO-8859-1. (We could also use encoded
strings, but that would require Lisp programs using module functions
to always decode any strings they receive, which is less efficient and
Yes. Just to check my understanding: there are two types of strings, unibyte (just a sequence of chars) and multibyte (a sequence of chars interpreted in the internal Emacs encoding), right?
(Btw, I don't think we should worry about changing the internal
representation of characters in Emacs, because make_multibyte_string
will be updated as needed.)
This is a crucial point. If the internal encoding never changes, then we can declare that those string parameters are expected to be in the internal encoding. But see the discussion in https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in mule-conf.el seems to indicate that the internal encoding is not stable.
This is what my comments were about. I think that you, by contrast,
are talking about the encoding of the _input_ strings, in this case
the 'documentation' argument to module_make_function and 'str'
argument to module_make_string. My assumption was that these
arguments will always have to be in UTF-8 encoding; if that assumption
is true, then no decoding via code_convert_string_norecord is
necessary, since make_multibyte_string will DTRT. We can (and
probably should) document the fact that all non-ASCII strings must be
UTF-8 encoded as a requirement of the emacs-module interface.
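
To illustrate what a "must be UTF-8" requirement could mean in practice, here is a minimal sketch of a check a module author might run on a buffer before handing it over. The helper utf8_validate is hypothetical and is not part of emacs-module.c; it only assumes the standard UTF-8 rules (no overlong forms, no surrogates, nothing above U+10FFFF). It also reports the character count, since a multibyte string is described by both a character count and a byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of emacs-module.c.
   Return true if the LEN bytes at S are well-formed UTF-8, and in
   that case store the number of encoded characters in *NCHARS.  */
static bool
utf8_validate (const unsigned char *s, size_t len, size_t *nchars)
{
  assert (nchars != NULL);
  size_t i = 0, count = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;        /* Number of continuation bytes expected.  */
      unsigned long cp;   /* Code point being decoded.  */
      unsigned long min;  /* Smallest code point valid for this length.  */

      if (c < 0x80)
        { need = 0; cp = c; min = 0; }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* Stray continuation byte or invalid lead.  */

      if (len - i - 1 < need)
        return false;     /* Sequence is truncated.  */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char t = s[i + k];
          if ((t & 0xC0) != 0x80)
            return false; /* Not a continuation byte.  */
          cp = (cp << 6) | (t & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* Overlong form, out of range, or surrogate.  */

      i += need + 1;
      count++;
    }

  *nchars = count;
  return true;
}
```

With this sketch, the five bytes of "café" in UTF-8 validate as four characters, whereas the same word in ISO-8859-1 (four bytes ending in 0xE9) is rejected.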
If you are thinking about accepting strings encoded in other
encodings, I'd consider this an extension, to be added later if
needed. After all, a module can easily convert to UTF-8 by itself,
using facilities such as iconv.
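
As a rough sketch of a module doing that conversion itself, the following handles only the simplest case, ISO-8859-1 to UTF-8, by hand. It is a stand-in for a general facility such as iconv, not a use of it, and the helper name latin1_to_utf8 is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only.
   Convert the LEN bytes of ISO-8859-1 text at SRC into a newly
   allocated, NUL-terminated UTF-8 string.  Store the UTF-8 byte
   length (excluding the NUL) in *OUT_LEN.  Return NULL if memory
   cannot be allocated.  The caller must free the result.  */
static char *
latin1_to_utf8 (const unsigned char *src, size_t len, size_t *out_len)
{
  assert (out_len != NULL);
  assert (src != NULL || len == 0);

  /* Each Latin-1 byte becomes at most two UTF-8 bytes.  */
  if (len > (SIZE_MAX - 1) / 2)
    return NULL;
  char *dst = malloc (2 * len + 1);
  if (dst == NULL)
    return NULL;

  size_t j = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c < 0x80)
        dst[j++] = (char) c;                      /* ASCII is unchanged.  */
      else
        {
          dst[j++] = (char) (0xC0 | (c >> 6));    /* Lead byte.  */
          dst[j++] = (char) (0x80 | (c & 0x3F));  /* Continuation byte.  */
        }
    }
  dst[j] = '\0';
  *out_len = j;
  return dst;
}
```

The resulting buffer and its byte length are what the module would then pass on as its UTF-8 input; any other source encoding would need a real conversion library.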
Yes, provided the internal Emacs encoding is stable.
In any case, code_convert_string_norecord cannot be the complete
solution, because it accepts Lisp string objects, not C strings. You
still need to create a Lisp string (but this time using
make_unibyte_string). The point is to always use either
make_unibyte_string or make_multibyte_string, and never build_string
or make_string; the latter 2 should only be used for fixed ASCII-only
Yes, that's fine, the question is about whether the internal encoding is stable. If it's stable, we can use make_multibyte_string; if not, we can only use make_unibyte_string.