Re: Dynamic loading progress

Eli Zaretskii <address@hidden> schrieb am Fr., 20. Nov. 2015 um 10:54 Uhr:

> From: Philipp Stephani <address@hidden>
> Date: Thu, 19 Nov 2015 22:41:09 +0000
> Cc: address@hidden
>
> Thanks for the thorough review; I'll prepare patches for all of these issues.

Thank you.

> This comment in module.c:
>
> /* If checking is enabled, abort if the current thread is not the
> Emacs main thread. */
> static void check_main_thread (void);
>
> confused me, because a later comment says:
>
> void module_init (void)
> {
> /* It is not guaranteed that dynamic initializers run in the main thread,
> therefore we detect the main thread here. */
>
> If dynamic initializers might run in a thread different from the Emacs
> main thread, then the code in module_init will record that other
> thread, not the Emacs main thread, right?
>
> No, because module_init is guaranteed to be called from the main thread because
> main calls it explicitly.

So you are saying that module_init runs in the main thread, but
dynamic initializers might run in another thread? How's that
possible?

According to http://stackoverflow.com/a/29776416 it is not guaranteed that dynamic initializers run in the main thread.

> Also, another comment:
>
> /* On Windows, we store both a handle to the main thread and the
> thread ID because the latter can be reused when a thread
> terminates. */
>
> seems to imply that 'main_thread' here is not the Emacs's main thread,
> because that thread never terminates as long as the Emacs session is
> alive.
>
> So what's the deal here? what does this thread checking supposed to
> detect?
>
> This guards against the Emacs main thread having exited while module code in
> some other thread is still running and attempting to call Emacs functions.

On what OS can this happen? It cannot happen on Windows, AFAIK,

http://blogs.msdn.com/b/oldnewthing/archive/2010/08/27/10054832.aspx seems to indicate that it is possible that other threads continue running after the main thread has exited.

so
complicating the code by using a handle is not necessary on Windows,
because the thread ID of the main Emacs thread will never be reused as
long as the Emacs session is alive. Moreover, we already have that
thread's ID in a global variable called dwMainThreadId, computed
during the session startup (albeit after module_init, so it would need
to be moved to an earlier stage), so we could simply compare against
its value in check_main_thread.

My initial code didn't use a thread identifier at all because it could have been reused after the main thread ended, introducing a subtle bug.

> This is undefined behavior, but we included an explicit check if
> checking is enabled because that case is somewhat subtle.

I'm surprised this could happen on some OS. Can you point to any
details about this?

It's not ruled out by the C standard or the APIs of operating systems. If Emacs doesn't call ExitThread or manipulates the C runtime, then indeed it can't happen (at least on Windows), but I think it's brittle to rely on such subtleties without explicitly checking for them (somebody might introduce a call to ExitThread in the future without looking at module code).

> In this code from module.c:
>
> Lisp_Object value = HASH_VALUE (h, i);
> eassert (NATNUMP (value));
> const EMACS_UINT refcount = XFASTINT (value);
> if (refcount >= MOST_POSITIVE_FIXNUM)
> {
> module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
> return NULL;
> }
>
> how can the 'if' clause ever be true? refcount is an Emacs integer,
> as you have just verified, no? And if this somehow can happen, then
> why isn't there a similar check in the other functions?
>
> refcount can be MOST_POSITIVE_FIXNUM because that's an inclusive bound. It's
> important to check that case because later refcount is incremented by one, and
> if it's equal to MOST_POSITIVE_FIXNUM it would be outside the allowed range
> afterwards. No other function increments numbers, thus no other functions need
> this check.

OK, but then either there should have been a comment explaining this,
or, better, the test should have been after the addition. (Which, by
some strange luck -- or maybe something else -- is just what Paul
already did ;-)

If the test gets moved after the addition, then we should have a verify(MAX_EMACS_UINT - 1 > MOST_POSITIVE_FIXNUM) or use __builtin_add_overflow to make sure the addition doesn't cause UB.

> Re this fragment from module.c:
>
> Lisp_Object ret = list4 (Qlambda,
> list2 (Qand_rest, Qargs),
> documentation ? build_string (documentation) : Qnil,
> list3 (module_call_func,
> envobj,
> Qargs));
>
> Thou shalt not use build_string, except when you _know_ the argument
> will always be a pure-ASCII string. Practically, this means the
> argument must be a constant ASCII string. See these messages (and the
> preceding discussion, if you are interested) for the gory details:
>
> http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00955.html
> http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00976.html
> http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00979.html
>
> The above should call make_multibyte_string instead.
>
>
> We had a discussion about encodings in
> https://github.com/aaptel/emacs-dynamic-module/issues/37. Sorry that this
> didn't get resolved earlier; it's an important point. My suggestion would be to
> always mandate specifying an encoding whenever a char* is passed, and limit
> that to two or three functions dealing with creating strings and accessing
> string contents. Would that address your concerns?

No, this is not about encoding at all. This is about calling
build_string: you should only call it when the argument is a fixed
ASCII string. If the argument is not fixed or might include non-ASCII
characters, you should call make_multibyte_string instead. That's
because build_string might decide on its own whether to produce a
unibyte or a multibyte string, out of your control, whereas we always
want a multibyte string in this context.

build_string doesn't encode or decode its argument, it creates a Lisp
object whose text is taken from the argument. It's similar to
make_number in that respect.

I'll send a patch with my changes to explain my intention.

> Again in module.c:
>
> /*
> * Emacs internal encoding is more-or-less UTF8, let's assume utf8
> * encoded emacs string are the same byte size.
> */
>
> if (!buffer || length == 0 || *length-1 < raw_size)
> {
> *length = raw_size + 1;
> return false;
> }
>
> I don't understand why you need this assumption. You are going to
> encode the string in a moment, so why not test 'length' against the
> size you actually obtain there? (The above test will misfire when the
> original string includes characters above the Unicode codespace, which
> require 5 bytes internally, but their encoding maps them to Unicode
> codepoints which cannot take more than 4 bytes. So you might reject
> perfectly valid calls.)
>
> In module_make_string you have:
>
> /* Assume STR is utf8 encoded */
> return lisp_to_value (env, make_string (str, length));
>
> The discussion I pointed to above concluded that <quote>make_string is
> a bug</quote>. So please use make_multibyte_string here instead.
>
>
> See above; my suggestion would be to change the string handling code by
> limiting encoding and decoding to a small set of functions where the encoding
> would have to be specified explicitly.

Once again, that's not the issue. AFAIU, you have already specified
(albeit implicitly) that all strings passed as arguments to and from
the module functions are UTF-8 encoded. And that's fine by me, that's
not what these comments are about.

The first comment is about a minor issue in
module_copy_string_contents. Consider this fragment from that
function:

ptrdiff_t raw_size = SBYTES (lisp_str);

/* Emacs internal encoding is more-or-less UTF8, let's assume utf8
encoded emacs string are the same byte size. */

if (!buffer || length == 0 || *length-1 < raw_size)
{
*length = raw_size + 1;
return false;
}

Lisp_Object lisp_str_utf8 = ENCODE_UTF_8 (lisp_str);
eassert (raw_size == SBYTES (lisp_str_utf8));
*length = raw_size + 1;
memcpy (buffer, SDATA (lisp_str_utf8), SBYTES (lisp_str_utf8));
buffer[raw_size] = 0;

The comment and the 'if' clause after it in effect tell that you want
to check whether 'buffer' has enough space to hold the encoded string,
but you don't know how many bytes will be needed for that. So you
make some assumption (which, as I pointed out, could be wrong in rare
cases), and reject the strings that fail your test. What I'm sating
is that after the call to ENCODE_UTF_8, you can compute the exact
length in bytes of the UTF-8 encoded string, so if you move the test
there, you won't have to make any assumptions and you won't err.

The second comment is again about calling make_string (which
build_string calls internally): use make_multibyte_string instead.

> static void module_set_user_finalizer (emacs_env *env,
> emacs_value uptr,
> emacs_finalizer_function fin)
> {
> check_main_thread ();
> eassert (module_non_local_exit_check (env) == emacs_funcall_exit_return);
> const Lisp_Object lisp = value_to_lisp (uptr);
> if (! USER_PTRP (lisp)) module_wrong_type (env, Quser_ptr, lisp);
> XUSER_PTR (lisp)->finalizer = fin; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> }
>
> No validity checks on 'fin'?
>
> How should it be validated? In C an arbitrary (invalid) pointer could be
> passed. I think we just have to accept that this is UB.

Given the extensive checks elsewhere in the file, this surprised me,
that's all. How to test it? calling valid_pointer_p would be one
thing I'd suggest. Maybe there are other (perhaps platform-dependent)
checks possible here, I would have to look around. For example, it
would be good to make sure this points into executable code.

You can't validate pointers in C. For example, is the following pointer valid?

long a;

double *p = (double *)(&a);

The answer is no; this is an aliasing violation, and dereferencing p is UB, but you won't detect this by trying to read the process's address space. See also http://blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx. The best way to detect such issues is running under asan.

> if (len > INT_MAX || len < envptr->min_arity || (envptr->max_arity >= 0 &&
> len > envptr->max_arity))
> xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (envptr),
> make_number (len));
>
> Why the test against INT_MAX? EMACS_INT can legitimately be a 64-bit
> data type with values far exceeding INT_MAX. Why impose this
> limitation here?
>
> Because the nargs argument in the module interface is an int.

Shouldn't it be EMACS_INT instead?

EMACS_INT is an internal type that isn't exposed to module authors. It should be either int or int64.

> If you think functions with more than INT_MAX arguments should be supported,
> the type for nargs should be changed to int64.

I thought your design decision was already to use int64 for any
integer argument, no?

Daniel's initial design (https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00960.html) had int, but I guess he wouldn't mind changing this to int64.

Anyway, something that comes from Lisp should allow EMACS_INT, IMO.

It can't use EMACS_INT because that's an internal type defined in lisp.h, and emacs-module.h can't include lisp.h. Furthermore is should be a type with a static size to allow for ABI stability. Mismatches should be dealt with by explicit bounds checks.

> allocate_emacs_value calls malloc; shouldn't it call xmalloc instead,
> or at least conform to the XMALLOC_BLOCK_INPUT_CHECK protocol?
>
> If xmalloc is called, then we need to make sure that no signals (longjmps) can
> escape to module code. If appropriate setjmps are in place that should be
> doable, but I need to check whether there are edge cases.

Paul already addressed that: xmalloc in Emacs does double or even
triple duty, and it would be a shame to lose that here. So I think we
should try to find a way to modify xmalloc such that it satisfies the
module code as well.

Could xmalloc grow a variant that is guaranteed not to call longjmp?

> Btw, I wonder whether we should provide a more friendly capabilities
> for retrieving the function name and its module. dladdr is not
> portable enough, and we have all the information at our fingertips
> in the module's init function, we just need to keep it instead of
> throwing it away. I envision debugging module-related problems will
> not be a very rare situation, so we need any help we can get. WDYT?
>
> Hmm, I don't know whether we have access to the function name without using
> dladdr.

I thought about recording their names when you bind them to Lisp
symbols.

That would be nice, however it would require set_symbol_function, Fdefalias, etc. to explicitly check for module functions in their RHS. Not sure whether that's worth it.

> About the tests: The Makefile in mod-test is Unix-specific: it uses a
> literal .so extension. I also think the Python script should be
> rewritten in Emacs Lisp, so that Python installation is not required.
> Finally, all of the module tests and associated files should be moved
> into test/, preferably even test/automated/ and made part of the "make
> check" run.
>
> Yes, tracked in https://github.com/aaptel/emacs-dynamic-module/issues/34

You will see that I put there some preliminary stuff, so at least
these tests can be run manually on Windows, by overriding a single
Make variable from the command line.

Thanks.

From:	Philipp Stephani
Subject:	Re: Dynamic loading progress
Date:	Fri, 20 Nov 2015 19:58:25 +0000