guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: string-map arg order


From: Dirk Herrmann
Subject: Re: string-map arg order
Date: Thu, 6 Sep 2001 23:06:21 +0200 (MEST)

On 6 Sep 2001, Alex Shinn wrote:

> >>>>> "Dirk" == Dirk Herrmann <address@hidden> writes:
> 
>     Dirk> No, you have to re-calculate the character position in every
>     Dirk> iteration
> 
> Do you?  I was originally thinking in the context of multi-threading -
> it seems preferable that a procedure like string-for-each work on an
> static version of the string so you don't have to worry about
> synchronization issues with other threads.  But here without threads
> you're mutating the object that you're looping over inside a loop.  Is
> this something that has to be defined?

I don't think that changing the string semantics is an option we should
consider.  IMO, guile should follow RnRS.

Given that, do you see my point that iteration over strings (even with
functions like string-for-each) is O(n*n) for variable character length
encodings, independent of whether you use threading or not?  Or am I wrong
with that?  We should, before we make a decision about one thing or
another, first be aware of the consequences!  It remains still up to us
which decision we will finally take, but it makes no sense to look away
from inconvenient facts.  I am well aware that both representations have
their pros and contras.

> If so, then the string contents object won't work with fixed-width
> representations either.  An external C function will only be looking
> at the char*, not using our accessors macros, and therefore wouldn't
> pick up changes in the string from other threads.  Whether we have to
> re-allocate to make room for wider utf8 chars, or re-allocate to
> upgrade from 1 to 2 or 4 byte chars makes no difference to an external
> library.

I don't claim that anything doesn't work with either representation, or,
more explicitly:  You can do the same things with both representations.  
The difference is, that with variable character length encodings there is
no such thing as an O(n) iteration over a string, except you _know_ about
everything that happens during the iteration.  If there is something you
don't know, (like calling a function from within the iteration which may
modify your string, or like having another thread running which also might
modify your string) you have to re-calculate the character's position in
every iteration.  The point is that such a recalculation is O(1) for
fixed width character encodings and it is O(n) for variable character
length encodings.

Now, to your statement that 'string contents object won't work with
fixed-width representations either':  It will work with both
representations!  If your C function will only perform read-operations,
then, sure, you may miss some changes in the string from other threads.  
So what?  There is no guarantee about when which thread will be executed.  
The behaviour that the C function misses some changes is exactly the
behaviour you would encounter if the thread of the C function runs long
enough before the threads that perform the changes are executed.  If that
is a problem for your specific application, then you have to use mutexes.  
Which is also the way to go if the C function would change the string:  
You would have to use a mutex to inhibit conflicts with other threads that
are running.

Again:  Both string representations can be used to perform the same kind
of tasks.  Fixed width encodings can use up more memory and may need
additional effort converting from and to them (given that the rest of the
world uses a different format).  Iteration and a lot of other operations
can, however, be performed in a comparably efficient way.  Variable width
encodings can be more space efficient, but many (and many very common)
operations are worse (by a complexity factor of O(n)) with respect to
execution time.  You could, however, save some conversion overhead if the
rest of the world uses the same format.

These are our options.  Better we know about them before we make
decisions.

Best regards
Dirk Herrmann





reply via email to

[Prev in Thread] Current Thread [Next in Thread]