[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Help-glpk] [Fwd: Re: Optimization and Multicore GLPK]
[Help-glpk] [Fwd: Re: Optimization and Multicore GLPK]
Tue, 25 Dec 2012 14:10:52 +0400
-------- Forwarded Message --------
From: Vladimir Dergachev <address@hidden>
To: glpk xypron <address@hidden>
Cc: Reginald Beardsley <address@hidden>, address@hidden
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Tue, 25 Dec 2012 01:03:13 -0800 (PST)
On Tue, 25 Dec 2012, glpk xypron wrote:
> Hello Reg,
> == Profiling ==
> Profiling GLPK could give very valuable ideas how to speed up the code.
> Oprofile is a useful tool for doing the work. See
There is also "perf" which newer kernels support.
> == Parallelization ==
> Before thinking about parellization it would be necessary to attack the very
> same design decisions that make GLPK not thread safe: memory management and
> error handling.
In my experience, for compute-intensive codes, it is not necessary to make
memory management to be too nice to multithreaded code - a simple global
lock will suffice.
This is because a well optimized compute intensive code rarely (if ever)
calls memory management functions, as they are rather slow. For quick
snippets of memory, one can use alloca().
> When it comes to question whether parallization or optimiation of the single
> thread code is more valuable I must admit that I do not care as long as I can
> get back to the entry prompt faster. So if parallelization can give me factor
> 4 and a better algorithm factor 25, I would be happy to get both and solve
> my problem in 1 % of the time.
There is also a possibility of using SSE or newer AVX to achieve much
higher throughput. Also, upcoming Xeon PHI looks very attractive for
> There are different levels where parallelization might be considered for GLPK.
> One place is the simplex algorithm itself. An overview can be found in
> One direction of research not covered here are attempts to run the
> Simplex algorithm on GPUs.
> Another place for parallelization is the branch and bound algorithm as
> described in
> My impression is that parallelization of branch and bound in GLPK would
> significantly less programming resources.
> Best regards
> Heinrich Schuchardt
> -------- Original-Nachricht --------
>> Datum: Mon, 24 Dec 2012 17:37:22 -0800 (PST)
>> Betreff: [Help-glpk] Optimization and Multicore GLPK
>> Here are a few comments on optimization and multicore GLPK. A search
>> online for information about optimizing GLPK only turned up xypron's recent
>> post about enabling SSE and fastmath. However, GLPK is very sophisticated,
>> so I what I have to say may fall in the "not documented yet" category.
>> However, except for Maros, I've seen little mention of performance profiling
>> and coverage analysis in the LP literature I've read. Most of the
>> performance metrics are limited to iteration counts and elapsed time.
>> code optimization requires much finer granularity of analysis.
>> I have just completed an initial, very quick read of "Computational
>> Techniques of the Simplex Method" by Istvan Maros. For anyone interested in
>> working on LP codes, I think it is a necessary, but not sufficient reference.
>> Maros gives a good overview of many issues related to implementing a
>> simplex code and to large scale solvers in general.
>> It was typeset by the author so there are typos and non-native speaker
>> errors to contend with. However, all the important ideas are accompanied by
>> numerous citations, so with the addition of the appropriate references, it
>> appears to provide a good map of the terrain. I qualify my statement only
>> because I am far too ignorant of LP to make a stronger statement.
>> I agree entirely with Andrew's comments previously on the topic of
>> parallelization of GLPK. There is much work to be done on the serial
>> before it makes any sense to attempt to implement parallel execution. Aside
>> from the computational cost of pricing, the particular choice of pricing
>> rule can strongly influence solution times for particular problems. So
>> implementing Devex and other rules seems to me a good starting point. I'm
>> sure that other opportunities to improve single core performance exist, but
>> this seems a particularly obvious place to start.
>> That said, barrier synchronized shared memory parallel programming should
>> be suitable for use in GLPK running on multicore workstations. Actual
>> performance is highly dependent upon cache architecture and organization, so
>> benefit cannot be predicted easily. Considerable experimentation and
>> deep knowledge of the hardware behavior is required to make this work well.
>> The only reason it is worth considering at all is that a single
>> architecture dominates the computing landscape. Intel is unlikely to do
>> that breaks a lot of major codes and core counts are likely to continue to
>> The concept is simple. In sections which are not execution order
>> dependent, a small number of coroutines are started by a signal. These
>> coroutines then run to completion at which time they send a signal to the
>> routine which decrements a counter. When all the coroutines have completed,
>> the counter decrements to zero and the main routine continues. If one can
>> avoid thrashing due to cache collisions, those portions of the code can be
>> speeded up by the number of cores employed. Of course, total speedup will
>> be much less as dictated by Amdahl's law.
>> In looking at some of the source code for simplex, I see places where
>> there is a loop over vector operations. The natural thing to do for these is
>> use the SSE instructions to speed up the vector operations and spread the
>> vectors over multiple cores.
>> Though conceptually simple, none of this is easy to do. In addition,
>> GLPK is easily the most complex code I've ever looked at in 20+ years of
>> working on several million lines of scientific codes. Fortunately, it's also
>> hands down the best written large code I've ever seen which is a real
>> Any optimization work on a code as sophisticated as GLPK is a major
>> undertaking which will take a long time to execute. The first step is
>> and coverage analysis.
>> If there is sufficient interest in this subject, I propose to implement
>> and document performance profiling and coverage analysis of GLPK in the
>> wikibook using the various Netlib problem sets. This will be for convenience
>> restricted to the *nix environment. However, it should generally be
>> applicable to Windows if a *nix environment package such as Cygwin is used.
>> I am particularly interested in comments from Andrew, Marc, Robbie and
>> xypron. All have been very generous to me in different ways and this is an
>> attempt to repay them. I come from a seismic processing background where run
>> times are measured in tens of thousands of hours for a single dataset.
>> Fortunately, most seismic codes are trivially parallel. So a few hundred quad
>> core nodes and a couple of weeks will get the job done. Were that not the
>> case, oil would be a lot more expensive than it is. But we still do a lot
>> of work to make the codes run faster.
>> Have Fun!
> Help-glpk mailing list
|[Prev in Thread]
||[Next in Thread]|
- [Help-glpk] [Fwd: Re: Optimization and Multicore GLPK],
Andrew Makhorin <=