[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gnucap-devel] orderings
Re: [Gnucap-devel] orderings
Fri, 9 Nov 2018 18:33:49 -0500
On Fri, 9 Nov 2018 12:51:19 +0100
Felix Salfelder <address@hidden> wrote:
> > a good ordering could be computed (pretty quickly) from the incidence
> > graph and a loaded matrix. but if it is done only after load, then the
> > matrix rebuild will take some extra time.
> did something in that direction.
> some ordering is pluggable in the plug_order branch. with this, a
> static odering can be computed from the incidence graph.
> when the incidence graph is not constructed (forward, reverse), there is
> some overhead in the virtual iwant calls. i can't measure it. (but
> that's it).
The run time overhead of a virtual call is that of a real function
call, which means if it was going to be a real function call anyway the
virtual overhead is insignificant. It becomes significant when it
would have been inlined otherwise, and the function is so small that
the code inside takes less time than a function call. Also, if you
compile with "make debug", that's "-O0" (oh zero) which means no
optimization, which means always real function calls. To measure
speed, you need to compile with full optimization and debug off.
Also, with small circuits this doesn't matter. It's only when the
circuits get large that it matters. The tests in gnucap/tests are
intended as correctness tests, not benchmarks. They are all too small
to be meaningful as benchmarks, and must be that way so the test suite
runs in a reasonable time.
Back when I did the initial work, with the large circuits being
considered, Spice run time would be measured in days. Having the same
benchmark run in hours on what was to become gnucap was quite an
> i wrote plugins that intercept and store the incidence graph, then pass
> it to algorithms shipped with boost. i have uploaded test diffs for
> reverse cuthill mckee . perhaps interesting
> - the density gets better in most cases, eq*fg2.ckt is the notable
> exception, all others are not too bad.
These examples are too small to be significant, but what I see is maybe
50-50 in terms of density getting better.
On eq* ... Reordering to overlap blocks improves the density number,
but hurts incmode. Since this circuit is all linear, incmode is all or
nothing, so you probably won't see the difference in this example.
> - the extra running time is still negligible, even without much
> tweaking. solving time drops a bit, perhaps compensating for order
> - the numerical quality increases. it's not visible in many places, but
> some numerical zeroes are closer to zero, and sometimes the number of
> iterations goes down. it never goes up. my incmode example also seems
> to work with rcmk.
I am not sure what that means. What I see is arbitrary small
differences that seem to be related to rounding errors. Which is
really better could go either way. I can't tell without further study.
It looks to me that the incmode example still doesn't work.
> have fun.
I think you will eventually conclude as I did back in 1985, and again
back in 1992. ... that the fancy ordering algorithms really don't
work as well as the simple ones.
The existing ordering algorithm is not nothing at all. It's a "nearest
neighbor" algorithm, on read-in. When you need to create a new node,
just take the next one in line. Then reverse the whole thing.
Considering how netlists are built, this actually results in a decent
order, keeping nearby nodes nearby. It is not guaranteed to be optimal,
actually not guaranteed to be anything at all, but it usually
outperformed the common published algorithms at the time, especially
when considering the run time of the ordering algorithm.
To improve the algorithm, I think the best approach is to make one more
pass to sort nodes within a block by degree, and leave it at that. You
could do multiple passes, but I think even one fixed pass would do.
Even that should be done before expansion.
Suppose we have a degree array for a block something like this:
21432.33 ... 5 internal nodes, two connection nodes.
The two connection nodes cannot be moved, but the internal nodes can
be. The ideal order would have a degree array: 12234.33.
So, if the numbers were 12345.67, the updated order would be 21543.67.
I think this is as ideal as you can get.
Try a bubble sort ...
12324.33 .. one pass, not optimal but better.
12234.33 .. second pass, now optimal in this case.
I really don't think you can do better than this.
(Yes I know .. it's a challenge to try).
Now .. how does all this relate to incmode? The sort I just proposed
improves it, and that should be the highest priority.
To further improve the algorithm, note how nonlinear the various
connections are, and take the most linear ones first.
Incmode isn't just incremental matrix update. It enables the
possibility of not solving the whole matrix, and not processing some
Of the test circuits provided the eq* are the only ones that are
complex enough to be at all significant. Then you need some bigger
circuits that might take hours or days to run.