
storage leak and performance bug in schedule merging


From: Roger M. Burkhart
Subject: storage leak and performance bug in schedule merging
Date: Tue, 12 Aug 1997 08:22:51 -0500

As a result of tracing down Rob Kewley's report of speed problems using
dynamic schedules, I have uncovered a serious problem in the merging of
multiple schedules within a swarm.  The problem is both a storage leak
and a performance bug that results in slower and slower rescheduling of
the next pending schedule within a swarm.  A one-line source library fix
is included below.

The problem is that whenever multiple schedules are merged at the same
time value, an internal action is created to execute a concurrent group at
that time value, but is never cleaned up.  The problem occurs only when
multiple schedules are being merged in the same swarm, at any level
including the typical usage at an observer swarm level.  Not only is the
concurrent action not freed as long as the swarm still exists, but it
slows down every subsequent insert into the merge schedule of the swarm.
These new inserts occur every time the swarm switches from one pending
schedule to another.

The problem is not only the storage that is built up by these uncleaned
actions, but the slow-down effect they have on all subsequent inserts.
The merge schedule is part of the internal machinery of the swarm by which
it coordinates all running subschedules.  Ordinarily, it should hold just
one action per subschedule; these actions are continually reshuffled
according to their next pending times.  Actions are inserted into the
sorted merge
schedule using a linear search, which given the small number of actions
and extremely dynamic update, actually provides the best default sorting
algorithm for this specialized usage.  The problem is that any left-over
actions at an earlier time continue to delay the insert of actions at any
new pending time.  The insert time goes up linearly with the number of
left-over actions, which goes up by one whenever two subschedules are
scheduled to run at the same time.  The internal merge schedule thus keeps
growing, and the rising insert cost grinds execution slower and slower,
just as Rob Kewley observed.  (The slow-down does, however, have the side
effect of limiting the rate of storage leakage.)
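To make the cost concrete, here is a toy model of that linear-search insert in plain Python (not Swarm code; the `insert_sorted` function and the example lists are invented for illustration).  It counts how many comparisons each insert into a time-sorted schedule makes:

```python
# Toy model of the merge schedule's sorted insert (not actual Swarm code).
# The schedule is a list of pending times kept in ascending order, and each
# insert walks from the front -- the linear-search behavior described above.

def insert_sorted(schedule, time):
    """Insert a pending time, returning how many comparisons the search made."""
    comparisons = 0
    i = 0
    while i < len(schedule) and schedule[i] <= time:
        comparisons += 1
        i += 1
    schedule.insert(i, time)
    return comparisons

# With one action per subschedule (three subschedules here), inserts are cheap.
clean = [1, 2, 3]
print(insert_sorted(clean, 4))          # 3 comparisons

# With 100 left-over actions stuck at earlier times, the same logical insert
# must first walk past every one of them.
cluttered = list(range(100)) + [101, 102, 103]
print(insert_sorted(cluttered, 104))    # 103 comparisons
```

Each left-over action at an earlier time adds exactly one comparison to every later insert, which is the linear growth described above.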

The fix is very simple (patch line below), but the effect is very
unfortunate.  Functionally, swarms have been executing actions in the
proper order, but if you were using one of their major functions, to merge
subschedules, you were probably seeing the slow-down effect.  Unless you
merge your own schedules inside your own swarm, you may not have seen this
effect in your own model swarm, but I have confirmed that the effect does
occur even in a display swarm merging display actions with model actions.
In this case, it could result in the display speed getting slower as the
model runs, in proportion to the display frequency.  The fact that the
effect has not been more glaringly obvious up to now must indicate how
little the display merging actually consumes relative to other model and
display operations, even after many accumulated display cycles.

The fix is to insert the following source line after line 43 of the source
file activity.m in the activity library (either Swarm 1.0.1 or Swarm 1.0.2):

  [_activity_swarmSyncType setAutoDrop: 1];

Here's the fix in the form of a diff (usable for patching):

diff -r activity.old/activity.m activity/activity.m
43a44
>   [_activity_swarmSyncType setAutoDrop: 1];

The fix adds the AutoDrop option to the merge schedule, so that the
left-over concurrent actions are automatically dropped as soon as they
are completed.  This was a simple oversight, made possible by the fact that
the main actions of this internal schedule are *not* routinely dropped but
instead removed from the schedule and reinserted at another time every time
they are executed.  It's only when two are scheduled at the same time that
they leave behind garbage.  My storage leak tracing tests also failed to
uncover the problem because they were checking only that everything was
cleaned up after the swarm had completed.  Since the left-over actions
still belonged to the running swarm activity, they were automatically
cleaned up after the swarm had completed, leaving no trace. 
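As a sketch of why AutoDrop matters here, the following toy simulation (ordinary Python, not the Swarm API; the `run` function and its action tuples are invented for illustration) reinserts subschedule actions every step, creates a concurrent-group action whenever two actions land on the same time, and leaves that group action behind unless auto-drop is on:

```python
# Toy simulation (not the Swarm API) of the merge schedule over time.
# "sub" actions model subschedule entries: removed and reinserted at their
# next pending time on every step.  "group" actions model the concurrent
# groups created for time collisions: without auto-drop they are never
# cleaned up and accumulate; with auto-drop they vanish once completed.

def run(steps, auto_drop):
    schedule = [("sub", 0), ("sub", 0)]  # two subschedules, both pending at t=0
    for t in range(steps):
        executed = [a for a in schedule if a[1] == t]
        schedule = [a for a in schedule if a[1] != t]
        if len(executed) > 1 and not auto_drop:
            schedule.append(("group", t))        # left-over concurrent group
        for kind, _ in executed:
            if kind == "sub":
                schedule.append(("sub", t + 1))  # reinsert at next pending time
    return len(schedule)

print(run(50, auto_drop=False))  # 52: two subschedules plus 50 left-over groups
print(run(50, auto_drop=True))   # 2: exactly one action per subschedule
```

With auto-drop on, the schedule stays at one action per subschedule no matter how many time collisions occur, which is the intended steady state of the merge schedule.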

None of this seems like a very good excuse, but the one consolation is
that there may be better speed in current swarm scheduling than people
have actually been seeing.  A lot of intricate machinery was built in
to make sure that concurrent actions were dropped when the AutoDrop option
was set; it's just that the option hadn't been turned on for this special
internal case.  In all my new testing, I've continued to confirm that
the AutoDrop machinery for dynamic schedules is working correctly when
actually used.  I've also got a bug fix to pass on the AutoDrop option to
concurrent groups when it's set on an owning schedule (a previously
reported bug) that I'll pass along as soon as I give it a couple more
tests.  So overall, the library is doing what it needs to do; it's just
that with so little visibility into what's in those executing structures,
it's too easy for something as egregious as these left-over actions to be
in there cluttering things up.  This experience has
reinforced in me the need for much better tools to inspect and visualize
the scheduling structure contents at any time.

If you'd like to confirm whether you've been seeing the slow-down effect in your
own display swarm, the following function and scheduled action at some
interval in your display swarm should be able to track the number of
left-over actions:

  void printMergingScheduleCount( void )
  {
    printf( "count of actions in merge schedule: %d\n",
            [[getTopLevelActivity() getActionType] getCount] );
  }

In buildActions of the observer swarm, schedule the following action to
occur at some repeating interval:

  [repeatingDisplaySchedule at: 0 createActionCall: printMergingScheduleCount];

To apply the fix, you'll have to remake the activity library, which can
only be done in a full source release and not in a binary release.
Obviously, this fix is a priority to get into some sort of upcoming
incremental release.  For anyone who makes the interim fix, I'll be
interested in reports of any significant differences you see.

My apologies for such a bug being present in the released code, but we'll
get it fixed as soon as possible.

Roger Burkhart

                  ==================================
   Swarm-Support is for discussion of the technical details of the day
   to day usage of Swarm.  For list administration needs (esp.
   [un]subscribing), please send a message to <address@hidden>
   with "help" in the body of the message.
                  ==================================

