[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Help-glpk] Optimization and Multicore GLPK
[Help-glpk] Optimization and Multicore GLPK
Mon, 24 Dec 2012 17:37:22 -0800 (PST)
Here are a few comments on optimization and multicore GLPK. A search online
for information about optimizing GLPK only turned up xypron's recent post about
enabling SSE and fastmath. However, GLPK is very sophisticated, so I what I
have to say may fall in the "not documented yet" category. However, except for
Maros, I've seen little mention of performance profiling and coverage analysis
in the LP literature I've read. Most of the performance metrics are limited to
iteration counts and elapsed time. Serious code optimization requires much
finer granularity of analysis.
I have just completed an initial, very quick read of "Computational Techniques
of the Simplex Method" by Istvan Maros. For anyone interested in working on LP
codes, I think it is a necessary, but not sufficient reference. Maros gives a
good overview of many issues related to implementing a simplex code and to
large scale solvers in general.
It was typeset by the author so there are typos and non-native speaker errors
to contend with. However, all the important ideas are accompanied by numerous
citations, so with the addition of the appropriate references, it appears to
provide a good map of the terrain. I qualify my statement only because I am
far too ignorant of LP to make a stronger statement.
I agree entirely with Andrew's comments previously on the topic of
parallelization of GLPK. There is much work to be done on the serial algorithms
before it makes any sense to attempt to implement parallel execution. Aside
from the computational cost of pricing, the particular choice of pricing rule
can strongly influence solution times for particular problems. So implementing
Devex and other rules seems to me a good starting point. I'm sure that other
opportunities to improve single core performance exist, but this seems a
particularly obvious place to start.
That said, barrier synchronized shared memory parallel programming should be
suitable for use in GLPK running on multicore workstations. Actual performance
is highly dependent upon cache architecture and organization, so the benefit
cannot be predicted easily. Considerable experimentation and deep knowledge of
the hardware behavior is required to make this work well. The only reason it
is worth considering at all is that a single architecture dominates the
computing landscape. Intel is unlikely to do something that breaks a lot of
major codes and core counts are likely to continue to grow.
The concept is simple. In sections which are not execution order dependent, a
small number of coroutines are started by a signal. These coroutines then run
to completion at which time they send a signal to the main routine which
decrements a counter. When all the coroutines have completed, the counter
decrements to zero and the main routine continues. If one can avoid thrashing
due to cache collisions, those portions of the code can be speeded up by the
number of cores employed. Of course, total speedup will be much less as
dictated by Amdahl's law.
In looking at some of the source code for simplex, I see places where there is
a loop over vector operations. The natural thing to do for these is to use the
SSE instructions to speed up the vector operations and spread the vectors over
Though conceptually simple, none of this is easy to do. In addition, GLPK is
easily the most complex code I've ever looked at in 20+ years of working on
several million lines of scientific codes. Fortunately, it's also hands down
the best written large code I've ever seen which is a real delight.
Any optimization work on a code as sophisticated as GLPK is a major undertaking
which will take a long time to execute. The first step is profiling and
If there is sufficient interest in this subject, I propose to implement and
document performance profiling and coverage analysis of GLPK in the wikibook
using the various Netlib problem sets. This will be for convenience restricted
to the *nix environment. However, it should generally be applicable to Windows
if a *nix environment package such as Cygwin is used.
I am particularly interested in comments from Andrew, Marc, Robbie and xypron.
All have been very generous to me in different ways and this is an attempt to
repay them. I come from a seismic processing background where run times are
measured in tens of thousands of hours for a single dataset. Fortunately, most
seismic codes are trivially parallel. So a few hundred quad core nodes and a
couple of weeks will get the job done. Were that not the case, oil would be a
lot more expensive than it is. But we still do a lot of work to make the codes
- [Help-glpk] Optimization and Multicore GLPK,
Reginald Beardsley <=