Re: thread safe functions
Sat, 28 Aug 2010 21:33:25 -0400
On Aug 28, 2010, at 15:20, Andy Wingo wrote:
>> Unfortunately this applies to some internals of the implementation too.
>> For example, "set-object-property!" and friends use a hash table and
>> assoc lists internally.
> Fixed, thanks.
>> scm_c_issue_deprecation_warning and scm_c_register_extension don't look
>> thread-safe to me (in the sense described above).
> Fixed also.
Though... looking at deprecation.c, I'd suggest not printing stuff while the
lock is held; just keep track of whether it needs to be printed, and do the
printing after releasing it. Partly because locks should be held only briefly,
and printing can block; but also, if the error port can call back into other
Scheme code, which may call a deprecated function, then you could get an error
trying to acquire the lock (which we won't notice, because we never check), or
a deadlock.
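A sketch of that pattern (the names here are hypothetical stand-ins, not
Guile's actual deprecation code): decide and record under the lock, print
only after releasing it.

  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>

  static pthread_mutex_t warned_lock = PTHREAD_MUTEX_INITIALIZER;
  static char warned[64][128];   /* messages already issued */
  static int n_warned = 0;

  void
  issue_deprecation_warning (const char *msg)
  {
    int need_print = 1;

    pthread_mutex_lock (&warned_lock);
    for (int i = 0; i < n_warned; i++)
      if (strcmp (warned[i], msg) == 0)
        need_print = 0;                 /* already warned once */
    if (need_print && n_warned < 64)
      strncpy (warned[n_warned++], msg, 127);
    pthread_mutex_unlock (&warned_lock);

    /* Printing may block, or re-enter Scheme code that acquires this
       same lock; do it outside the critical section.  */
    if (need_print)
      fprintf (stderr, "%s\n", msg);
  }

The lock then protects only the bookkeeping, and a callback from the error
port can safely call issue_deprecation_warning again.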
I was starting to look through the port code for this, but got distracted by
scm_ptobs, a pointer which appears to be dynamically reallocated as needed when
scm_make_port_type is called. Which means every function that reads it should
be locking the same mutex used when updating it; in this case, that appears to
be the generic "critical section" stuff, and it looks like we're not doing that.
So, hmm, where else do we use hash tables internally?
foreign.c: pointer_weak_refs is a weakly keyed hash table, which we add things
to without locking.
gc.c: scm_protects is protected; good.
instructions.c: Can scm_lookup_instruction_by_name be called the "first" time
from two threads concurrently? It uses a hash table stored in a static
variable, and the variable is updated before the hash table is filled in.
(Even if you simply assign to the variable afterwards, there's no requirement
that the compiler or CPU not reorder those assignments.) But, as an aside, do
we need all that in C, or could instruction-list just export a big Scheme list
that could be examined (or turned into a hash table) by Scheme versions of the
lookup routines?
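One conventional fix for that initialization race is pthread_once, which both
serializes the first call and guarantees that later callers observe the init
function's side effects. A sketch, with stand-in names (a toy int array in
place of the instruction hash table):

  #include <pthread.h>
  #include <stdlib.h>

  /* Racy version: another thread can see 'table' non-NULL before
     fill_table() has finished, since nothing orders the stores:

       static struct table *table = NULL;
       if (table == NULL) { table = make_table (); fill_table (table); }
  */

  /* Safe version: */
  static pthread_once_t table_once = PTHREAD_ONCE_INIT;
  static int *table;

  static void
  init_table (void)
  {
    table = malloc (256 * sizeof *table);
    for (int i = 0; i < 256; i++)
      table[i] = i;             /* stand-in for filling the hash table */
  }

  int *
  get_table (void)
  {
    pthread_once (&table_once, init_table);
    return table;
  }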
keywords.c: A critical section protects keyword_obarray; good.
modules.c: scm_pre_modules_obarray doesn't look protected.
ports.c: scm_i_port_weak_hash is protected in some places, others I haven't
investigated further. But in scm_c_port_for_each, it looks to me as though, if
ports are being created while the function is setting up, you could end up
raising an exception if scm_internal_hash_fold iterates over more ports than
were in the hash table when the count was initially taken (and the vector
sized accordingly).
procprop.c: There's a mutex to protect overrides, but it looks like
set-procedure-property! doesn't use it correctly; it should look more like
the code paths that do take the mutex.
properties.c: I'm not familiar with this code at all, but I'd guess
properties_whash should be protected. Though I wonder if the call-out from
primitive-property-ref could invoke other primitive-property functions, causing
a deadlock if it isn't done carefully.
smob.c: I don't think tramp_weak_map is adequately protected, but I'm not
sure.
srcprop.c: scm_source_whash isn't protected; it also appears to be exposed by
the name "source-whash" to Scheme code, which wouldn't be able to use a mutex
defined and used purely in the C code.
(What's that, four different kinds of property-list mappings, all implemented
separately?)
struct.c: Even calling struct-vtable-name can cause a hash table entry to be
created, it appears. So that's not thread-safe, never mind the call to
actually change the name.
symbols.c: I don't think 'symbols' is handled safely. But this code is all
starting to run together in my mind. :-)
weaks.c: This only creates things for Scheme code, where presumably locking is
the caller's problem.
It would be a bit easier if hash tables themselves were thread-safe. A
read/write lock might make more sense than a simple mutex though, if reading is
more common than writing. In my experience they sometimes perform worse than
simple mutexes, so they're a win mainly for concurrent reading.
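For example, with POSIX read/write locks (a toy table standing in for a Guile
hash table; names are hypothetical):

  #include <pthread.h>

  static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
  static int table[256];

  int
  table_ref (int key)
  {
    pthread_rwlock_rdlock (&table_lock);  /* many readers may hold this */
    int v = table[key & 255];
    pthread_rwlock_unlock (&table_lock);
    return v;
  }

  void
  table_set (int key, int v)
  {
    pthread_rwlock_wrlock (&table_lock);  /* writers get exclusive access */
    table[key & 255] = v;
    pthread_rwlock_unlock (&table_lock);
  }

Readers never block each other, only writers do; whether that's a win depends
on the read/write ratio and the rwlock implementation's overhead.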
Of course, hash tables aren't the only places where we can have problems, and
I'm not even touching on hash tables that might be created by Scheme code (but
exposed to users with APIs that don't scream "unsafe hash table used here!"),
but I've already spent my afternoon and evening on this.
>> That's just from a spot check; I'm not even trying to be thorough.
> I await your more thorough check :)
Yeah, in my copious free time... :-( Maybe some more next weekend, if I'm not
doing the stuff I was supposed to get done last weekend.
>> And don't get me going on memory ordering models....
> I'm interested!
There's a lot to read out there, and I'm no expert, but the main idea is,
depending on the specific architecture, memory loads and stores can be almost
arbitrarily reordered between CPUs unless you do something to prevent it.
So, if you do something like:
(preconditions: z is a pair visible to both threads; SCM_CAR(z) is a pair)

  thread 1:                           thread 2:
    allocate cons cell x
    SCM_SETCAR (x, v1);
    SCM_SETCDR (x, v2);
    SCM_SETCAR (z, x);                  t = SCM_CAR (z);

Assuming that reads and writes of SCM values are atomic, thread 2 will read t
as either the old value or x. If it's x, though, SCM_CAR(x) might *not* be
read back as v1, but as whatever garbage was in there when (or before) the
cell was allocated. And instead of a CAR slot, the location could also hold
some bits in a non-immediate value that indicate a subtype, causing the
reading thread to call the wrong routines to manipulate it.
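Restated in later C11 terms (this discussion predates C11; at the time it
meant explicit barrier instructions), the publishing store needs release
semantics and the reading load acquire semantics. A minimal sketch with
stand-in types:

  #include <stdatomic.h>

  struct pair { int car, cdr; };

  /* Plays the role of the CAR slot of z above.  */
  static _Atomic (struct pair *) shared_slot;

  /* Thread 1: initialize the cell fully, then publish it with a release
     store, so the contents become visible no later than the pointer.  */
  void
  publish (struct pair *x, int v1, int v2)
  {
    x->car = v1;
    x->cdr = v2;
    atomic_store_explicit (&shared_slot, x, memory_order_release);
  }

  /* Thread 2: an acquire load guarantees that if we see x, we also see
     the car/cdr stores that preceded the release.  */
  struct pair *
  try_read (void)
  {
    return atomic_load_explicit (&shared_slot, memory_order_acquire);
  }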
Basically, any time you make an object handle visible to other threads, you
need to make sure the object's contents have been stored first, and the other
processor hasn't loaded that location yet -- or, if it might be cached, it
knows to flush that entry. (Multiple threads can run on one processor, or on
multiple cores sharing a cache, in which case it's probably not a problem. A
thread can also be moved between cores, but the OS is responsible for keeping
things in sync in that case.) Changing an object already visible to other
threads probably requires at least as much caution. But given that malloc/free
are likely to reuse storage, it's not even reasonable to assume that a location
in a freshly allocated cell isn't cached by some other processor from back when
it was part of some other object; so I think the two problems are actually
largely equivalent, and a visible live object might merely be more likely to be
in another processor's cache.
According to the Wikipedia article on memory ordering, the Alpha has (some of
them are still out there) a really interesting property: reordering dependent
loads, i.e., loading data before loading the pointer to the data. It isn't
described in detail, but I assume it's related to speculative loading or the
cache organization.
Mutexes encapsulate "something to prevent it" pretty well; hacks like
"volatile" don't, though they can get the compiler somewhat out of the way.
Of course, adding a mutex to every non-immediate value is far too expensive in
memory use, even if the almost complete lack of contention means the locking
can probably be done with few context switches into the kernel. Using one
global mutex can work, but it basically means only one thread can be examining
or manipulating Guile objects at a time. For I/O or DNS lookups, that's
probably fine, but we do want to be able to manipulate data structures from
multiple threads at once.
There are lower-level facilities generally available, like memory-barrier
instructions and atomic operations (usually compare-and-swap, or
load-link/store-conditional), which could probably be used more efficiently;
there are even lock-free algorithms for doing some simple data structures
safely using these primitives. But the details vary, and doing more
synchronization than you're actually required to can hurt performance, so you
really need to tune it well. This is the sort of thing that kernel and
C-library developers sometimes spend a lot of effort trying to get right, so
that the kernel and libc facilities (e.g., mutexes) will be both correct and
fast.
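A classic instance is a lock-free (Treiber) stack built on compare-and-swap.
A sketch in C11 atomics (a hypothetical illustration; note that the pop side
glosses over the ABA and memory-reclamation problems that make real
implementations hard):

  #include <stdatomic.h>

  struct node { int value; struct node *next; };

  static _Atomic (struct node *) top;

  void
  push (struct node *n)
  {
    struct node *old = atomic_load_explicit (&top, memory_order_relaxed);
    do
      n->next = old;            /* link to the current top */
    while (!atomic_compare_exchange_weak_explicit
             (&top, &old, n,
              memory_order_release, memory_order_relaxed));
  }

  struct node *
  pop (void)
  {
    struct node *old = atomic_load_explicit (&top, memory_order_acquire);
    while (old
           && !atomic_compare_exchange_weak_explicit
                (&top, &old, old->next,
                 memory_order_acquire, memory_order_relaxed))
      ;                         /* CAS failed: 'old' was reloaded; retry */
    return old;
  }

No thread ever blocks, but getting the orderings minimal-yet-sufficient is
exactly the kind of tuning described above.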
To make it worse, the compiler can reorder memory access instructions if it can
show there aren't aliasing problems (i.e., that the locations being accessed
cannot be the same assuming the source code is following the rules of the
language -- and that last clause may let a compiler reorder accesses via "int"
and "double" types that happen to map to the same location!), and it can even
write (or not write) intermediate values to an output location.
So, for example, in:

  register SCM x = ...;  /* 'register' so it can't be aliased by memory accesses */
  SCM_SETCAR (x, v1);
  SCM_SETCDR (x, v2);

the compiler could change the order of the writes, since it knows the locations
are different (because there's a fixed nonzero offset between them), and under
the specs of the C language there's no way to stop in between the two
statements and examine the values stored, so the two versions are the same
under the "as if" rule that says only the overall side effects matter. The
original order will often be retained to help with debugging, because no
optimization benefit was found in reordering, etc., but you can't make your
program's semantics depend on that.
Similarly,

  register int a, b, c;
  n = foo () + a*b*c;   /* store to mem */

could -- at least theoretically -- be implemented as:

  n = foo ();
  tmp = a*b;
  tmp *= c;
  n += tmp;

though offhand I'm having a hard time thinking of a realistic case in which a
compiler would be likely to do that.
This is where "volatile" would help a bit. But simply making all non-immediate
objects volatile would probably be overkill.
Now, how *much* is this a problem for user-mode programs in practice? I don't
know. I've read a few papers, but haven't actually worked on code at this
level or researched the architecture details of many processors and memory
subsystems. (The multiprocessing stuff I've done all used mutexes, if not
always very efficiently.) A couple of web pages I've run across have suggested
that some published processor errata have pointed out memory ordering problems,
so the actual requirements for a given processor family may be stricter than
advertised by the architecture manuals. And sometimes updates to the
architecture manuals can change these specs. I think we'd need someone with
more low-level experience to help answer that. I suspect many programs can run
just fine for a long time without race conditions in them actually causing
visible problems; then, the next day, maybe they'll crash.
I recommend the "Memory ordering" page at wikipedia as a good starting point
for reading up on it, and the first couple of references at least. Also
Boehm's paper "Threads Cannot be Implemented as a Library" is an interesting
read about the compiler interactions, though if we're trying to bypass having
to use locks altogether, the problems we face are probably larger than the
ones he describes.
> I think we agree, but I prefer to paint this in a more optimistic
> light -- that things are mostly there, but bugs are possible. Bug fixes
> are also possible :)
You might guess from the above that I don't think they're mostly there. That
bugs are possible, yes, that I'll grant you. ;-)
A real review should check:
* any non-const static-storage variables defined in the library, and how
they're protected;
* any functions modifying objects, to determine the effect of concurrent calls;
* any uses of those functions or objects from our C or Scheme code, to see if
we're presenting the correct semantics in the face of concurrent calls;
* that we actually specify the semantics, or say when results are undefined;
* anything dealing with OS interfaces (or other library APIs) that aren't
thread-safe.
And if those aren't going to take long enough already:
* getting *real* answers to the memory ordering issues;
* possibly reviewing every function manipulating memory for ordering issues;
* probably more.
Then, I might have some confidence that "things are mostly there." I don't
think it's going to happen at all for 2.0, and I don't expect to have time
myself to do anything more than little bits of work on the first part of the
list, if even that much, in the near future.
My work today didn't do any of this; I just started with looking over the
patches you checked in, and then ran "grep hash_ta *.[ch]" in libguile, to see
where else we've got hash tables, since we've already seen that they're a
problem in at least one other place. But even with just that, I found quite a
few things that I think need to be looked at.