From: Marcus G. Daniels
Subject: Re: [Swarm-Modelling] ABMs on Graphical Processor Units
Date: Fri, 28 Dec 2007 17:47:08 -0700
User-agent: Thunderbird 2.0.0.9 (X11/20071115)
Russell Standish wrote:
> All you need to do is link statically, rather than dynamically. This happens by default when you use MPICH, for instance. Then you are just loading up the parts that you use. But seriously, how much local memory do you get on a Cell local store? If it is not enough to store a few megabytes of dynamic libraries, it will not be enough to do any serious ABM simulation, which tends to need 100s of MB.

There's a lot of machinery in OpenMPI that gets pulled in no matter what, owing in part to multiple abstraction layers, including a component model. Perhaps MPICH would be easier to strip down, but even with static linkage it was clear to me it wasn't going to fit in < 64k (plus, say, another 64k for heap), which is basically what you'd want in order to keep it resident on the local store. (Keep in mind you want some local store left over to do real work, and there is only 256kb per SPU.) It is possible, using the latest GCC, to build a library for the Cell SPU into overlays and have callers automatically tickle the overlay they need. When a different overlay is needed, it gets pulled over via DMA. (Not much different in cost from evicting something from L2 cache, but nonetheless a cost only experienced programmers even recognize.)
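For readers unfamiliar with overlays: the mechanism is the standard GNU ld OVERLAY command, where several code segments share one address range in the local store and only one is resident at a time. A minimal sketch of a manual overlay linker script (the section names, load address, and object files here are illustrative, not from the post; newer SPU toolchains can also generate overlays automatically, which is what gives the "callers automatically tickle the overlay they need" behavior):

```
/* Hypothetical GNU ld script fragment: two overlay segments that
   occupy the same local-store address range, swapped in via DMA
   when a function in the non-resident segment is called. */
SECTIONS
{
  OVERLAY 0x3000 : AT (0x3000)
  {
    .ovl.comm  { mpi_comm.o(.text)  }  /* segment 1 */
    .ovl.coll  { mpi_coll.o(.text)  }  /* segment 2, same LS range */
  }
}
```

The trade-off Marcus describes falls out directly: each cross-overlay call that misses the resident segment costs a DMA transfer, much like an L2 eviction.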
The problem of keeping any serial processor busy is one of keeping calculations close to their memory (or other blocking operations, like I/O). The reality is that if we don't do that, or fail to tolerate latency with built-in parallelism, then we're wasting compute cycles anyway. DDR will never be as fast as a register. And we can't just wave our hands and make all problems inherently parallel. I suppose one could wish that SPUs each had a 24MB local store, like a high-end Itanium's cache. By my calculations that would be about 12 billion transistors.
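A quick sanity check of that figure (my arithmetic, not from the post; it assumes 8 SPUs per Cell and a classic 6-transistor SRAM cell, ignoring decoders, sense amps, and redundancy):

```python
# Back-of-envelope: transistor cost of giving each of 8 SPUs a 24 MB local store.
SPUS = 8
LOCAL_STORE_BYTES = 24 * 1024 * 1024   # hypothetical 24 MB per SPU
BITS_PER_BYTE = 8
TRANSISTORS_PER_BIT = 6                # 6T SRAM cell

total = SPUS * LOCAL_STORE_BYTES * BITS_PER_BYTE * TRANSISTORS_PER_BIT
print(f"{total:,} transistors")        # 9,663,676,416 -- roughly 10 billion
```

Adding the peripheral circuitry a real SRAM array needs pushes that toward the "about 12 billion" estimate above, versus roughly a quarter of a billion for the actual 256kb stores.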
As a data point, Sony's distributed [protein] Folding@home PS/3 network hit a petaflop a few months ago. They started from the standard Gromacs codebase and kept reworking and optimizing it. It soon overshadowed the PC Folding@home network: http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
Anyway, my point is not to push the Cell, but to say that GPUs, Cell processors, vector units, and microprocessors all have trade-offs. None of them gives you parallelism where it can't be proven from the code or doesn't exist as an obvious part of the algorithm.
Marcus