Re: [Discuss-gnuradio] oprofile inband code results

From: Eric Blossom
Subject: Re: [Discuss-gnuradio] oprofile inband code results
Date: Tue, 9 Oct 2007 16:43:43 -0700
User-agent: Mutt/1.5.9i

On Tue, Oct 09, 2007 at 05:13:08PM -0400, Brian Padalino wrote:
> I really don't know much about oprofile and haven't done much
> profiling, but I do have a question or two.
> Q. Since the profiler looks at the lowest function that is taking so
> much time, I find it strange that pmt_nthcdr is the second method
> listed there.  Intuitively, pmt_nthcdr should just run a tight loop of
> pmt_cdr in which case I would assume pmt_cdr would be higher on the
> list but it is not.  Same with pmt_nth.  What might be taking so long
> within those functions that is NOT taking as long within
> pmt_cdr/pmt_car?  Is something turning into an inline function which
> really yields a false profile?
> Q. I am surprised to see a destructor (pmt_pair::~pmt_pair())
> utilizing so much time.  Are there that many pmt_pairs that have to
> get destroyed?  To answer my own question, I suppose so since every
> call to pmt_cons actually creates a new pmt_pair - which might be a
> good reason why the malloc and frees are high on the list.  Any idea
> why so many pmt_cons are used?

Because we use a lot of them to construct argument lists.
It would be possible to move to a pmt_vector-based approach, which
would cut the number of allocations down dramatically.  I think we're
still a bit early in the game to start that kind of modification.

I think the first thing I would try is moving to the intrusive
implementation of the Boost shared pointers (boost::intrusive_ptr)
for the pmt types.  Then I'd look at a data-type-specific alloc/free,
and also check how the default allocator behaves across multiple
threads.  That is, does it already use a separate allocation pool per
thread?  If it doesn't, we could speed up allocation/free and reduce
the amount of locking required in the typical case.

Before hacking away, I think we need to run the same test cases on
other machines besides the P4 Xeon and gather the oprofile data, as
well as the basic [ $ time <mytest> ] numbers.  We may find wildly
different answers as f(microarchitecture).  There's a reason Intel
isn't featuring the P4 Xeon anymore ;)
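For anyone repeating this on another box, one way to gather both sets of numbers is sketched below, assuming oprofile's opcontrol/opreport frontend; `./my_test` is a placeholder for whichever inband test case you are running:

```shell
# Profile one run of the test under oprofile (userspace only,
# no kernel image needed for this measurement).
opcontrol --no-vmlinux
opcontrol --reset
opcontrol --start
./my_test                  # placeholder: substitute your test binary
opcontrol --stop

# Per-symbol breakdown, hottest functions first.
opreport --symbols | head -20

# And the basic wall-clock numbers for comparison across machines.
time ./my_test
```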

