guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Locks and threads


From: Neil Jerram
Subject: Re: Locks and threads
Date: Thu, 26 Mar 2009 22:01:26 +0000
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> Hi Neil,

Hi Ludovic,

Many thanks for your detailed comments!

> Neil Jerram <address@hidden> writes:
>
>> - it runs for a duration specified by environment variable
>>   GUILE_TEST_DEFINE_RACE_DURATION
>>
>> - it defaults to just 10s if that variable isn't defined.
>
> Even 10s is going to be slightly annoying.  Another option would be to
> create tens of guile-definer threads instead of just 4; it might trigger
> the problem earlier.

That's an interesting idea, but to be honest I'd prefer to finish this
work and move onto something else.  Would it be OK just to reduce the
default runtime to 2 seconds?  (It wouldn't make any practical
difference, because 10 seconds isn't normally enough to see a define
race anyway.  It'll just be enough to catch any obvious bitrotting,
and people who want to run a real test can use
GUILE_TEST_DEFINE_RACE_DURATION.)

>> #2 makes the symbols hash thread-safe, and it appears that this
>> completely fixes the define-race problem.  I don't understand why we
>> apparently don't need patch #3 as well - but that's what my results
>> indicate.
>
> Roughly #2 is a specialized version of #3 as it addresses only the case
> of the symbol hash table.  So it should be enough for this very problem.

Actually #2 and #3 are a bit different (and probably I should have
explained more about this in my previous email).

The key points about #2 are that it uses a straight pthreads mutex to
protect the symbol hash, and that the symbol hash is a weak hash
table.  Using a pthreads mutex means that we have to take care not to
do any allocations (or anything else that could block) while the mutex
is locked.  The weakness means that as well as the obvious lookup and
intern operations, we have to allow for the hash table being accessed
and resized from an after-GC hook.

#3, on the other hand, uses a fat mutex, and can be applied to any
non-weak hash table.  Using a fat mutex means that it's fine to
allocate or block while the mutex is locked, and means that the code
can use scm_dynwind_lock_mutex, which is quite neat.  On the other
hand, it would cause a problem if the hash table concerned could be
accessed during GC, because

- if the GC didn't try to lock the fat mutex itself, it would be
  modifying the hash table under the feet of the thread that has the
  mutex locked

- if the GC does try to lock the fat mutex, it won't be able to and so
  will block... deadlock!

Hence the restriction of #3 to non-weak hash tables.

> #3 is probably the way to go in the longer run, but it changes the
> API/ABI so it can't go in 1.8.

If #3 was needed, I wouldn't see any problem with its API/ABI changes.
The API change is addition of one new primitive, and the ABI change is
to a structure that isn't intended for user code to access.  Am I
missing something?

On the other hand, if it isn't needed - as appears to be true -
there's no case for adding it to 1.8.

>> * test-suite/standalone/test-define-race.c: New test.
>
> Could you run `indent' on this file?

Sure, will do.

>> +TESTS += test-define-race
>
> It needs to be conditionalized on `--with-threads'.

It's already inside an "if BUILD_PTHREAD_SUPPORT".  Do I need to do
more than that?

>>  scm_i_rehash (SCM table,
>>            unsigned long (*hash_fn)(),
>>            void *closure,
>> -          const char* func_name)
>> +          const char* func_name,
>> +          scm_i_pthread_mutex_t *mutex)
>
> This function assumes that MUTEX is locked.  I would either write it
> prominently in a comment above or change the semantics so that MUTEX is
> acquired in `scm_i_rehash ()', not in the caller.

I think I prefer the prominent comment.  My first thought was for
scm_i_rehash () to acquire the mutex.  I changed to the current code
because then all of the lock..unlock pairs are in symbols.c - which
seemed more consistent to me.

> The latter might be achieved by integrating tests about whether TABLE
> needs to be rehashed into `scm_i_rehash ()' itself.  I.e., this:
>
>   if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table)
>       || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table))
>     scm_i_rehash (...);
>
> would become:
>
>   scm_i_rehash (...);
>
> I haven't reviewed all uses to see whether it's actually possible and
> whether it would lead to race conditions, though.

I think that could work, but it would add another lock/unlock to the
mainline.  (We need to hold the lock even just to evaluate the
rehashing condition.)  The change doesn't feel compelling enough to me
to justify that.

>> -static SCM symbols;
>> +SCM scm_i_symbols;
>
> I don't think this is needed, is it?

Yes, so that rehash_after_gc () (in hashtab.c) can identify the symbol
hash and call scm_i_rehash_symbols_after_gc () for it.

>> +static SCM
>> +intern_symbol (SCM symbol, const char *name, size_t len, size_t raw_hash)
>
> Since all users of `intern_symbol ()' perform a
> `lookup_interned_symbol ()' right before, I'd rather change users so
> that they lock before `lookup' and unlock after `intern':
>
>   lock (&symbols_mutex);
>
>   sym = lookup_interned_symbol (name);
>   if (scm_is_false (sym))
>     {
>       sym = scm_i_c_make_symbol (...);
>       intern_symbol (sym);
>     }
>
>   unlock (&symbols_mutex);
>
> Is it doable?

The problem with this is allocation (scm_i_c_make_symbol) while
holding the mutex.  The overall arrangement of the symbol code is

  lookup - allocate - intern

and I assume that is so as to avoid allocation in the case where the
symbol is already interned; otherwise the arrangement could be

  allocate - lookup/intern

with the lookup/intern being implemented as a single operation.

I've made my changes so as to leave this overall pattern the same.
The code certainly could be simpler if we always allocated upfront,
but I think it is better (for performance) not to do that.

On the other hand, I suspect the code could still be simplified to
remove the duplication between lookup_interned_symbol and
intern_symbol.  I'll have a look at that.

Thanks again for the review!

Regards,
        Neil




reply via email to

[Prev in Thread] Current Thread [Next in Thread]