

Re: string-map arg order

From: Dirk Herrmann
Subject: Re: string-map arg order
Date: Wed, 5 Sep 2001 00:58:16 +0200 (MEST)

On 4 Sep 2001, Gary Houston wrote:

> > From: Dirk Herrmann <address@hidden>
> > Date: Mon, 3 Sep 2001 22:09:45 +0200 (MEST)
> > 
> > However, I fully agree with you about the problems of multiple-width
> > encodings:  In case of multiple threads, every access to one of a string's
> > characters needs to recompute the memory location of that character,
> > because some other thread might have changed the string and even replaced
> > some characters of different encoding widths.
> > 
> > I find it hard to believe that variable width encodings could
> > work in practice in the context of multithreading at all.
> With the posix-style multithreading I don't think even the simple
> strings we have now could be safely modified from multiple threads
> without the user taking precautions such as use of mutexes.  And it
> probably wouldn't be feasible to build the thread safety into every
> string by default because performance would become abysmal (or is it
> only when you start getting serious and flush each processor memory
> cache whenever it may be needed, that performance becomes abysmal?)

We have to distinguish two levels of thread safety: library level and
user level.  If a user executes two different threads that access and
potentially modify the same string (using the current simple string
representation), it is possible to get wrong results if the two threads
don't communicate properly via mutexes.  However, this is a problem the
user can solve and it won't crash the library.
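To make the user-level side concrete, here is a minimal sketch assuming POSIX threads, with a plain `char` array standing in for a string. The names `shared`, `writer`, `checker` and `run_locked_demo` are hypothetical and not Guile API. Two writers repeatedly overwrite the whole string with their own letter while a checker inspects it, all under one mutex supplied by the user, so the checker never sees a half-written mix of letters. Without the lock calls the checker could observe such mixed strings: a wrong result at the user level, and one that only the user is in a position to prevent.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define LEN 64
#define ROUNDS 10000

/* Hypothetical shared string plus the user-level lock that guards it. */
static char shared[LEN + 1];
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static long mixed_snapshots;  /* written only by the checker thread */

/* Writer: repeatedly overwrite the whole string with one letter, under the lock. */
static void *writer (void *arg)
{
  char letter = *(const char *) arg;
  for (int round = 0; round < ROUNDS; round++)
    {
      pthread_mutex_lock (&shared_lock);
      memset (shared, letter, LEN);
      pthread_mutex_unlock (&shared_lock);
    }
  return NULL;
}

/* Checker: under the same lock, count snapshots whose characters differ. */
static void *checker (void *unused)
{
  (void) unused;
  for (int round = 0; round < ROUNDS; round++)
    {
      pthread_mutex_lock (&shared_lock);
      for (int i = 1; i < LEN; i++)
        if (shared[i] != shared[0])
          {
            mixed_snapshots++;
            break;
          }
      pthread_mutex_unlock (&shared_lock);
    }
  return NULL;
}

/* Run two writers and one checker; return how many mixed snapshots were seen.
   Error handling for pthread_create is omitted for brevity. */
static long run_locked_demo (void)
{
  static const char a = 'a', b = 'b';
  pthread_t w1, w2, c;
  memset (shared, 'x', LEN);
  shared[LEN] = '\0';
  mixed_snapshots = 0;
  pthread_create (&w1, NULL, writer, (void *) &a);
  pthread_create (&w2, NULL, writer, (void *) &b);
  pthread_create (&c, NULL, checker, NULL);
  pthread_join (w1, NULL);
  pthread_join (w2, NULL);
  pthread_join (c, NULL);
  assert (shared[LEN] == '\0');  /* writers never touch the terminator */
  return mixed_snapshots;
}
```

With the locks in place, `run_locked_demo ()` is expected to return 0.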

In contrast, the function string-ref is atomic from the user's
perspective. That means the handling of strings within string-ref has to
be 'bulletproof' with respect to threading. Now assume a string
representation where the width of the characters or their position in
memory could change when a string is mutated. (This is not the case with
Guile's current strings, but it would be with variable width characters,
and also with fixed width characters if the width starts at one byte by
default and may later be increased.) That means that _within_ the
function scm_string_ref precautions have to be taken to deal with the
fact that the position of the characters may change. Example:

  SCM
  scm_string_ref (SCM str, SCM idx)
  {
    char *c = SCM_STRING_CHARS (str);
    char *p = go_to_position_idx (c, idx);  /* hypothetical helper */
    char ret = *p;                          /* dangerous */
    return SCM_MAKE_CHAR (ret);
  }
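The fragment leaves go_to_position_idx abstract. A minimal sketch of the two strategies for it, using hypothetical names `fixed_width_position` and `utf8_position`, taking a plain byte pointer and a `size_t` index instead of Guile's `SCM` values, and using UTF-8 purely as an example of a variable width encoding:

```c
#include <assert.h>
#include <stddef.h>

/* Fixed width encoding: the address is a simple computation. */
static const unsigned char *
fixed_width_position (const unsigned char *chars, size_t width, size_t idx)
{
  assert (width == 1 || width == 2 || width == 4);
  return chars + idx * width;
}

/* Variable width encoding (UTF-8 as an example): the address can only be
   found by a linear scan from the start of the string.  Continuation bytes
   have the bit pattern 10xxxxxx. */
static const unsigned char *
utf8_position (const unsigned char *chars, size_t nbytes, size_t idx)
{
  size_t i = 0;
  while (i < nbytes)
    {
      if (idx == 0)
        return chars + i;
      idx--;
      i++;                                   /* skip the lead byte... */
      while (i < nbytes && (chars[i] & 0xC0) == 0x80)
        i++;                                 /* ...and its continuation bytes */
    }
  return NULL;  /* idx is past the end of the string */
}
```

Either way the result is a raw pointer into the string's current buffer, and it stays valid only as long as that buffer is neither moved nor freed.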

Independent of how go_to_position_idx is implemented (a simple computation
with a fixed width encoding, a linear search with a variable width
encoding), with preemptive threading the first thread could be preempted
immediately after 'p' has been computed, and a second thread could run.
If that other thread changes the string 'str' in a way that requires
moving it to a different memory region, the access to '*p' is dangerous,
because the old memory region may already have been freed when the first
thread regains control. This problem has to be fixed within
scm_string_ref, because we can't rely on the user to do the necessary
locking.
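One way to fix this inside the library is a lock per string that is held both while a character's position is computed and read, and while the buffer is reallocated. A minimal sketch, assuming POSIX threads and a hypothetical `lstring` type whose characters start one byte wide and are widened to four bytes by moving them into a new buffer; `lstring_ref` plays the role of scm_string_ref here:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mutable string whose buffer is reallocated when the
   character width grows; the library keeps a lock inside each string. */
typedef struct
{
  pthread_mutex_t lock;
  size_t len;
  size_t width;            /* 1 or 4 bytes per character */
  unsigned char *data;
} lstring;

/* Library-level string-ref: the position is computed and the character is
   read while the lock is held, so no other thread can move the buffer in
   between. */
static uint32_t lstring_ref (lstring *s, size_t idx)
{
  uint32_t c;
  pthread_mutex_lock (&s->lock);
  assert (idx < s->len);
  const unsigned char *p = s->data + idx * s->width;
  if (s->width == 1)
    c = *p;
  else
    memcpy (&c, p, 4);
  pthread_mutex_unlock (&s->lock);
  return c;
}

/* Widening moves every character to a new buffer, so it takes the same lock. */
static int lstring_widen (lstring *s)
{
  int rc = 0;
  pthread_mutex_lock (&s->lock);
  if (s->width == 1)
    {
      unsigned char *wide = malloc ((s->len + 1) * 4);
      if (wide == NULL)
        rc = -1;
      else
        {
          for (size_t i = 0; i < s->len; i++)
            {
              uint32_t c = s->data[i];
              memcpy (wide + i * 4, &c, 4);
            }
          free (s->data);  /* readers only hold pointers while locked */
          s->data = wide;
          s->width = 4;
        }
    }
  pthread_mutex_unlock (&s->lock);
  return rc;
}

static int lstring_init (lstring *s, const char *ascii)
{
  s->len = strlen (ascii);
  s->width = 1;
  s->data = malloc (s->len + 1);
  if (s->data == NULL)
    return -1;
  memcpy (s->data, ascii, s->len);
  if (pthread_mutex_init (&s->lock, NULL) != 0)
    {
      free (s->data);
      return -1;
    }
  return 0;
}

static void lstring_destroy (lstring *s)
{
  pthread_mutex_destroy (&s->lock);
  free (s->data);
}
```

The cost of this design is a lock and an unlock on every single character access.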

BTW: There are solutions to this kind of problem that don't require
locking a mutex on every access to a string: you just have to make sure
that the old memory region remains alive. Then the access to '*p' will
never touch freed memory. In this case, however, the data found at '*p'
may be 'outdated', but since there are no guarantees about the order in
which threads get executed, it can be acceptable that '*p' reads the old
content - it would have done so anyway if the preemption had happened a
couple of microseconds later... Some time ago I explained such a concept
in the context of implementing implicitly shared substrings.
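A minimal sketch of the keep-alive idea, written for this discussion and not the substring scheme it refers to. It assumes C11 atomics and uses copy-on-write: a published buffer is never modified, every change produces a new buffer, and the replaced buffer goes on a `retired` list instead of being freed. In Guile a garbage collector could presumably take over the reclamation; here the list is only released by `kstring_destroy`. All names are hypothetical, and writers are assumed to be serialized among themselves (for example by a separate lock); only readers go without a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable buffer: once published, its characters never change. */
typedef struct buffer
{
  size_t len;
  struct buffer *retired_next;   /* link in the list of replaced buffers */
  char chars[];                  /* flexible array member */
} buffer;

typedef struct
{
  _Atomic (buffer *) current;    /* the buffer readers see */
  buffer *retired;               /* old buffers, kept alive, not freed early */
} kstring;

static buffer *buffer_new (const char *src, size_t len)
{
  buffer *b = malloc (sizeof *b + len);
  if (b != NULL)
    {
      b->len = len;
      b->retired_next = NULL;
      memcpy (b->chars, src, len);
    }
  return b;
}

/* Reader: no lock.  It loads the buffer pointer once and reads from that
   snapshot; if a writer publishes a new buffer meanwhile, the old one is
   still allocated, so the worst case is a slightly outdated character. */
static char kstring_ref (kstring *s, size_t idx)
{
  buffer *b = atomic_load_explicit (&s->current, memory_order_acquire);
  assert (idx < b->len);
  return b->chars[idx];
}

/* Writer (assumed serialized with other writers): copy, modify the copy,
   publish it, and retire the old buffer instead of freeing it. */
static int kstring_set (kstring *s, size_t idx, char c)
{
  buffer *old = atomic_load_explicit (&s->current, memory_order_relaxed);
  assert (idx < old->len);
  buffer *fresh = buffer_new (old->chars, old->len);
  if (fresh == NULL)
    return -1;
  fresh->chars[idx] = c;
  atomic_store_explicit (&s->current, fresh, memory_order_release);
  old->retired_next = s->retired;
  s->retired = old;
  return 0;
}

static int kstring_init (kstring *s, const char *src)
{
  buffer *b = buffer_new (src, strlen (src));
  if (b == NULL)
    return -1;
  atomic_init (&s->current, b);
  s->retired = NULL;
  return 0;
}

/* Release retired buffers only once no reader can still be using them,
   e.g. after all reader threads have been joined. */
static void kstring_destroy (kstring *s)
{
  buffer *b = s->retired;
  while (b != NULL)
    {
      buffer *next = b->retired_next;
      free (b);
      b = next;
    }
  free (atomic_load (&s->current));
}
```

A reader that loaded the old pointer just before `kstring_set` published a new buffer still reads valid memory and simply returns the old character, which is the 'outdated but harmless' behavior described above. The technique is close in spirit to read-copy-update (RCU) with deferred reclamation.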

Best regards
Dirk Herrmann

