Re: [Gnucap-devel] [devel-gnucap] Parralelism

From: al davis
Subject: Re: [Gnucap-devel] [devel-gnucap] Parralelism
Date: Wed, 26 Feb 2014 19:34:24 -0500
User-agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )

On Wednesday 26 February 2014, beranger six wrote:
> I am starting work to parallelize (with OpenMP, and then
> maybe CUDA) the LU decomposition in gnucap.

OpenMP looks like a good way to do it.  It looks simple, and 
comes with most compilers including gcc.

Do not use CUDA.  Unless I misunderstand, the licensing of CUDA 
makes it unsuitable for use in a GNU project.

> I saw your topic about parallelism, and I have some time to
> work on it.
> Furthermore, we definitely need faster simulation results for
> our application.
> What kind of solution did you have in mind?

Very simple ..  Identify certain loops that can run in parallel.  
That is really all.
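
As a minimal sketch of that idea (not gnucap code): the loop below 
is safe to run in parallel because each iteration writes only its 
own output slot.  The names eval_device and eval_all, and the 
diode-like formula, are hypothetical stand-ins for illustration.  
Built without -fopenmp, the pragma is ignored and the loop simply 
runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for one independent model evaluation.
// It reads only its argument and touches no shared state.
inline double eval_device(double v) {
  return 1e-14 * (std::exp(v / 0.02585) - 1.0);
}

// Evaluate every element.  Iteration i reads voltages[i] and writes
// currents[i] only, so iterations do not depend on each other.
void eval_all(const std::vector<double>& voltages,
              std::vector<double>& currents) {
  assert(voltages.size() == currents.size());
  const long n = static_cast<long>(voltages.size());
  #pragma omp parallel for
  for (long i = 0; i < n; ++i) {
    currents[i] = eval_device(voltages[i]);
  }
}
```

A loop whose iterations read results written by earlier iterations 
would not qualify without further restructuring.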

You should look at the output of the "status" command to see 
where the time is spent, which will show where parallelism could 
be of benefit and how much benefit to expect.

In the LU decomposition, running the outermost loop in parallel 
should be all that is needed there.  But to get enough benefit 
model evaluation also should be parallel, and likely more 

> -Is it the "section" you designed with row, diagonal, and
> column? In that case, did you want to use the fact that once
> all the sections between _lownode[mm] and mm are calculated,
> we can compute the element?
> In that case we could have a dependence graph (or tree)
> applied to your matrix storage sections, as is mostly used to
> parallelize the Gilbert-Peierls algorithm.

I don't think that makes sense here, but you might want to try 
it.  Remember ..  gnucap's matrix solver usually does low rank 
updates and partial solutions.  If you lose this feature it 
could make it so much slower that no parallel operation can 
come close to recovering the loss.

The simpler solver used for AC analysis is not parallel ready.  
To parallelize the AC matrix solution it may be necessary to 
switch to the other lu_decomp, which requires double matrix 

> -Is it an iterative method, with the problem that
> convergence could theoretically take an infinite number of
> operations? (So maybe not a good way.)

no -- not iterative -- except for the standard DC_tran Newton 
iteration which would not change.

> -Is it to parallelize only the map (multiplication between
> elements) of the dot product, and then maybe parallelize the
> reduction (addition between elements)?

I think the overhead of parallelizing the dot product would be 
too high, thinking of the multi-thread model.  The dot product 
might be a candidate for GPU type processing, but look at 
"status" to judge whether there is enough potential benefit 
before doing this.
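
For reference, a sketch of what the map-plus-reduction form of a 
dot product could look like with OpenMP.  This is illustrative 
only, not gnucap's dot product; the function name dot and the use 
of std::vector are assumptions.  On short rows the cost of 
starting and joining threads may well exceed the arithmetic, which 
is the overhead concern above.

```cpp
#include <cassert>
#include <vector>

// Dot product with the multiply ("map") done per iteration and the
// additions combined through an OpenMP reduction.  Each thread keeps
// a private partial sum, and the partial sums are added at the end.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  const long n = static_cast<long>(a.size());
  #pragma omp parallel for reduction(+:sum)
  for (long i = 0; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

Because the reduction changes the order of the additions, results 
can differ from the serial loop in the last bits for general 
floating-point data.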

> -Did you have in mind to apply a permutation matrix to ease
> implementation of parallelism, or to build the best matrix
> order directly when evaluating the netlist?

No .. that would probably make it slower.  The speed gain of a 
better order would be offset by the overhead of ordering and the 
more complex access.

Also ..  remember that gnucap does incremental updates and 
partial solutions.  The order that is optimal for this is 
different from the ordering optimal for solving the entire 

I am aware of a problem with read-in where the recursive 
"find_looking_out" can waste a lot of time.  Again, "status" 
will tell you.

> Regards,
> Beranger Six