Re: Dynamic loading progress

Eli Zaretskii <address@hidden> schrieb am Fr., 20. Nov. 2015 um 22:56 Uhr:

> From: Philipp Stephani <address@hidden>
> Date: Fri, 20 Nov 2015 19:58:25 +0000
> Cc: address@hidden, address@hidden, address@hidden
>
> So you are saying that module_init runs in the main thread, but
> dynamic initializers might run in another thread? How's that
> possible?
>
> According to http://stackoverflow.com/a/29776416 it is not guaranteed that
> dynamic initializers run in the main thread.

Ah, you are talking about C++ dynamic initializers! So the model is
that someone writes a module in C++, starts a thread there, and then
inside some dynamic initializer calls emacs-module interface
functions, is that it? If that's the situation, I'd suggest a
prominent commentary describing this in the source.

The SO post talks about C++ but the issue is the same in C. AFAIK with C11 and C++11 the execution models are harmonized.

It seems the comment is overly confusing. It is supposed to warn about the following. Naively, if you wanted to test whether you are in the main thread, you would do (module types and naming):

static thread_id main_thread = get_current_thread();

bool in_main_thread() { return get_current_thread() == main_thread; }

The dynamic initializer here is the first "get_current_thread()"; it is not guaranteed to run in the main thread, so "main_thread" is not guaranteed to contain the main thread ID. Therefore you have to do:

static thread_id main_thread; // initialized later

int main() {

// guaranteed to be in the main thread

main_thread = get_current_thread();

}

That's all. I'm not aware of any runtime that would run dynamic initializers outside of the main thread, but it's not impossible and easy to protect against.

> On what OS can this happen? It cannot happen on Windows, AFAIK,
>
> http://blogs.msdn.com/b/oldnewthing/archive/2010/08/27/10054832.aspx seems to
> indicate that it is possible that other threads continue running after the main
> thread has exited.

Not in Emacs: we use the C runtime, and never call ExitThread, let
alone ExitProcess. Doing that would be a silly bug. There's no need
to make a single component of Emacs, and not the most important one,
protected to such extreme degree against calamities that are only
possible if Emacs developers will lose their senses.

Fair enough. We can replace this with a comment ("assume no more threads are running after main has exited") if you prefer that.

> My initial code didn't use a thread identifier at all because it could have
> been reused after the main thread ended, introducing a subtle bug.

No, it cannot be reused, because we hold a handle on the main thread,
see w32term.c around line 6985. As long as a single handle is open on
a thread, it still exists from the OS POV, and its ID cannot be
reused.

> > This is undefined behavior, but we included an explicit check if
> > checking is enabled because that case is somewhat subtle.
>
> I'm surprised this could happen on some OS. Can you point to any
> details about this?
>
>
> It's not ruled out by the C standard or the APIs of operating systems. If Emacs
> doesn't call ExitThread or manipulates the C runtime, then indeed it can't
> happen (at least on Windows), but I think it's brittle to rely on such
> subtleties without explicitly checking for them (somebody might introduce a
> call to ExitThread in the future without looking at module code).

I think these precautions are unnecessary.

Anyway, thanks for explaining this, I now know how to change the code
to DTRT on MS-Windows wrt to the thread checks.

This is unfortunately all surprisingly subtle and vaguely defined. See e.g. http://stackoverflow.com/q/19744250/178761 (apparently the standards are vague about what happens to detached threads after main has exited).

> OK, but then either there should have been a comment explaining this,
> or, better, the test should have been after the addition. (Which, by
> some strange luck -- or maybe something else -- is just what Paul
> already did ;-)
>
> If the test gets moved after the addition, then we should have a verify
> (MAX_EMACS_UINT - 1 > MOST_POSITIVE_FIXNUM) or use __builtin_add_overflow to
> make sure the addition doesn't cause UB.

I don't think so. Please take a look at the code Paul wrote there, I
don't see a problem with it.

There is no problem with the code, but reading it requires knowledge that there are tag bits, which prevent a possible signed overflow. A verify would be a compiler-checked comment for that.

> > No validity checks on 'fin'?
> >
> > How should it be validated? In C an arbitrary (invalid) pointer could be
> > passed. I think we just have to accept that this is UB.
>
> Given the extensive checks elsewhere in the file, this surprised me,
> that's all. How to test it? calling valid_pointer_p would be one
> thing I'd suggest. Maybe there are other (perhaps platform-dependent)
> checks possible here, I would have to look around. For example, it
> would be good to make sure this points into executable code.
>
> You can't validate pointers in C. For example, is the following pointer valid?
>
> long a;
> double *p = (double *)(&a);
>
> The answer is no; this is an aliasing violation, and dereferencing p is UB, but
> you won't detect this by trying to read the process's address space.

The function valid_pointer_p validates a pointer in the sense that it
points into the program's address space. That's not a full
validation, and won't catch unaligned pointers or pointers into
non-executable portions of memory, but it's better than nothing. I'd
certainly consider it under "#ifdef ENABLE_CHECKING" at least.

We can add that for consistency, though I'm not sure whether such checks don't do more harm than good.

> See also
> http://blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx.

We don't use IsBadWritePtr on Windows to check this, see
w32_valid_pointer_p for how this is actually implemented.

Much of this applies generally.

> "But what should I do, then, if somebody passes me a bad pointer?"

> You should crash.

Anyway, I'm surprised by this extreme POV: even if we cannot validate
a pointer 100%, surely it doesn't mean we cannot or shouldn't do some
partial job? Why this "all or nothing" approach?

We can check whether it's NULL. Apart from that, everything else is outside of the C standard.

> > if (len > INT_MAX || len < envptr->min_arity || (envptr->max_arity >= 0
> &&
> > len > envptr->max_arity))
> > xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (envptr),
> > make_number (len));
> >
> > Why the test against INT_MAX? EMACS_INT can legitimately be a 64-bit
> > data type with values far exceeding INT_MAX. Why impose this
> > limitation here?
> >
> > Because the nargs argument in the module interface is an int.
>
> Shouldn't it be EMACS_INT instead?
>
> EMACS_INT is an internal type that isn't exposed to module authors.

OK, but the type of nargs is ptrdiff_t, so the test should be against
the maximum of that type. On 64-bit hosts, ptrdiff_t is a 64-bit
type, while an int is still 32-bit wide.

Yes, the type is *now* ptrdiff_t. It was int when I wrote the code :-)

> Could xmalloc grow a variant that is guaranteed not to call longjmp?

We need to devise a way for it to detect that it was called from
emacs-module.c, the rest is simple, I think.

Hmm, why does it need to detect anything? Can't it just be a different function that doesn't signal, similar to push_handler and push_handler_nosignal?

From:	Philipp Stephani
Subject:	Re: Dynamic loading progress
Date:	Fri, 20 Nov 2015 23:22:49 +0000