[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Current master non-deterministic

From: David Kastrup
Subject: Re: Current master non-deterministic
Date: Sun, 16 Sep 2012 10:06:41 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2.50 (gnu/linux)

Werner LEMBERG <address@hidden> writes:

>> It would appear that the main application is the removal of
>> duplicates, with the idiom
>> sort
>> uniq
>> in some form or other.  This can be replaced by
>> make map/hash of pointers
>> iterate through list
>>   if pointer in hash, delete list element, else put pointer in hash
>> in order to a "stable" O(n lg n) uniq for which the structure of the
>> final list does not depend on the memory order of the original
>> elements.
> This certainly sounds like a better solution.

However, rethinking this, it seems like a "solution" for not seeing the
symptoms of a problem: we stabilize results artificially.

Mike saw significant fluctuations of his results here depending on the
memory position of data, presumably because of a wrong order of
operations in the followup.  The "better solution" I propose above will
lead to the identical wrong order of operations from run to run.

Now the best that might have happened is that the memory address merely
served as a tie-breaker for two equally good solutions.  In that case,
it may make sense to apply a tie-breaker that is less arbitrary.

"It was first in the list before we started", however, is a hardly less
arbitrary tie-breaker.  It is merely deterministic, but in no other
respect natural.

So I'd prefer if we kept the more efficient variant.  It turns out that
weeding out duplicates _and_ post-sorting according to some arbitrary
criterion can be done in a single pass if you use the memory address
comparison as a tie-breaker for the post-sorting (assuming that the
post-sorting will always think x == x).  However, comparing memory
addresses is cheap, and the post-sorting may have a more expensive
sorting criterion, so weeding out duplicates separately (via the memory
address thingy) before starting with more expensive sorting criteria
still makes good sense, as then the expensive sorting criterion is
needed less often.

David Kastrup

reply via email to

[Prev in Thread] Current Thread [Next in Thread]