Re: [PATCH] Support threads in modules

Eli Zaretskii <address@hidden> schrieb am Sa., 10. Juni 2017 um 14:38 Uhr:

> From: Philipp Stephani <address@hidden>
> Date: Sat, 10 Jun 2017 11:38:32 +0000
> Cc: address@hidden, address@hidden
>
> Eli Zaretskii <address@hidden> schrieb am Mi., 26. Apr. 2017 um 07:32 Uhr:

[It's very hard to have a discussion with 1.5 months between messages.]

Yes, sorry for the huge delay, I wanted to follow up here for a while :-/

> > > > - Using objects across threads requires careful synchronization.
> > >
> > > Not sure what synchronization you have in mind. There's only one
> > > running thread at any given time, and a thread cannot be preempted
> > > while it runs Lisp, so synchronization seems unnecessary, as a thread
> > > cannot modify a Lisp object that another thread is modifying.
> > >
> > > However, that's just an implementation detail, and the mapping between
> > Emacs and OS threads is
> > > unspecified (for good reason).
> >
> > It isn't unspecified: the ELisp manual explicitly documents that only
> > one Lisp thread can run at any given time.
>
> That is true for a *Lisp* thread, not for an *OS* thread. On the OS thread
> level we might theoretically have multiple threads per Lisp thread, or miss
> some synchronization barriers, etc.

You were AFAIU talking about accesses to Lisp objects, and these are
only possible via Lisp, which can only be run in a single thread at a
time. So if the non-Lisp threads of any kind can enter this picture,
in a way that module functions will be run by those threads and access
Lisp objects, please describe such a use case, so we all are on the
same page regarding the situations we are considering.

Such a use case is what these assertions should guard against. Calling environment functions from non-Lisp threads is undefined behavior, and these assertions are meant to protect module developers against them.

> > > To be forward-compatible, either we have to formally document this and
> > then
> > > never ever change the implementation, or document that module authors
> > can't rely on it and have to perform
> > > synchronization themselves.
> >
> > Since it is documented already, module authors can rely on that.
>
> It's not documented because it can't be documented without talking about
> the behavior of the C module API, and that is itself not documented yet.

If what you mean is that we don't say explicitly in the doc that only
one Lisp thread can be active at any given time, we can add that
statement right now. Would that be enough to settle this particular
argument.

There is no actual argument :-)

It's just a matter of how much documentation we need. I think the guaranteed synchronization should be documented in the to-be-written modules manual as well, if we choose to make the guarantee.

> If
> we choose to guarantee that unsynchronized accesses to module functions
> don't introduce data races, then we need to say that explicitly in the
> module documentation, and stick to it. (I.e. once that decision has been
> made, there's no going back.)

I think that ship sailed long ago: the current implementation assumes
implicitly and explicitly that at most only one Lisp thread runs at
any given time. If we will ever want to have an implementation that
violates this constraint, we will have to completely rewrite
everything that is related to threads, including emacs-modules.c.

Yes, but if we do decide to rewrite, then the new code might need additional synchronization if we guarantee thread safety which would otherwise not be required.

> > > I meant this as a general statement: if an API is more restrictive (i.e.
> > has stronger preconditions), its
> > > implementation can be more flexible because it has to deal with fewer
> > corner cases.
> >
> > I don't think it's right for us to favor our flexibility over that of
> > the Lisp programmers who will want to use these features.
>
> That's a matter for debate.

I'm saying that as long as this is debatable, we shouldn't make life
for Lisp programmers so much harder based on assumptions that are not
universally approved by the project as a whole. My opinion is that
the Lisp programmers' flexibility should be preferable to that of
developers.

The two options are:

1. Guarantee thread safety of module objects.

2. Don't guarantee thread safety of module objects.

You clearly advocate for (1), which is fine. What I'm trying to point out is that if we picked (2) now, we can easily switch to (1) later, but not the other way round. Once we decide to do (1), there's no way to go to (2), even if in some future implementation (2) might become simpler or more efficient. That's the tradeoff.

> > I think we only care that module functions run in the context of some
> > Lisp thread, and we don't care which Lisp thread is that. So I
> > proposed a simplified implementation of the test, which I hope you
> > will agree with.
>
> If we want to guarantee that environment functions can be called from
> arbitrary threads without introducing data races, then the only assertion
> that's necessary is whether (current_thread->thread_id ==
> GetCurrentThreadId ()); without that undefined behavior would be almost
> guaranteed.

Sorry, I don't understand what you are saying here. Especially since
your proposed condition is almost the one I proposed in

http://lists.gnu.org/archive/html/emacs-devel/2017-04/msg00720.html

Does this mean that you agree with my reasoning and the code I
proposed there?

Yes, almost, if we pick option (1) now.

Once again: there _are_ legitimate situations in Emacs when for a
short time current_thread is NULL. If you assume that these
situations don't happen, your code will sooner or later crash and
burn. I'm saying this because I once thought current_thread should be
non-NULL at all times, and my code which assumed that did crash.

I've reviewed the threads code, and I'm quite sure that current_thread can never be NULL during check_thread. current_thread is only NULL between exiting a Lisp thread and resuming execution in another thread after reacquiring the GIL. The evaluator doesn't run during that phase, and therefore no module functions can run.

current_thread could only be NULL during check_thread if called from a thread that is not a Lisp thread (i.e. an OS thread created by the module). That's exactly one of the undefined behavior situations that the assertions are meant to prevent.

From:	Philipp Stephani
Subject:	Re: [PATCH] Support threads in modules
Date:	Sat, 10 Jun 2017 19:58:40 +0000