Re: [Discuss-gnuradio] benchmark_* not working correctly

From: Eric Blossom
Subject: Re: [Discuss-gnuradio] benchmark_* not working correctly
Date: Mon, 1 Oct 2007 21:48:44 -0700


On Mon, Oct 01, 2007 at 06:07:51PM -0700, Tim Meehan wrote:
> Eric,
> The QA code (qa_gr_fir_ccf.cc) forces a 16-byte alignment.  When the
> malloc16Align is replaced with a regular malloc in the QA code, make check
> fails.
> I believe that there is an additional requirement that the data passed to
> the low-level SSE code have the real sample start on the 0th or 2nd 4 byte
> float.  For example the R / C represents 4 byte floats (Real, Complex) , 0
> represents "forced alignment" from gr_fir_ccf_simd.cc
> RCRC...  OK
> 00RC...  OK
> 0RCR...  Not OK

Hmmm.  Does it ever use the 0RCR case?  I would expect only the first
two.  It may be reusing the fff simd code which generates all 4
alignments for the taps, but I wouldn't expect to see the 0RCR or 000R
input cases.
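
To make the four cases concrete, here is a rough sketch (not the GNU Radio
code; the helper names are made up for illustration) that reports which
4-byte float slot within a 16-byte block a pointer lands on.  Slot 0 is
RCRC, slot 2 is 00RC, slot 1 is 0RCR and slot 3 is 000R.  It assumes a
compiler with C++11 alignas.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Number of 4-byte float slots between the previous 16-byte boundary and p.
// 0 -> RCRC, 2 -> 00RC, 1 -> 0RCR, 3 -> 000R (notation from the quoted mail).
inline std::size_t leading_float_slots(const void* p) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>(addr & 15u) / sizeof(float);
}

// Walk a 16-byte aligned buffer of complex samples and confirm that every
// sample starts on slot 0 or slot 2, never on slot 1 or slot 3.
inline bool complex_inputs_use_even_slots() {
    alignas(16) static std::complex<float> buf[8];
    for (const std::complex<float>& c : buf) {
        std::size_t s = leading_float_slots(&c);
        if (s != 0 && s != 2) return false;
    }
    return true;
}
```

If the buffer really is 16-byte aligned and each sample is 8 bytes, only
slots 0 and 2 can show up for the input, which is why I would not expect
the 0RCR or 000R input cases.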

> Q: Is my assumption of the additional requirement correct?
> Q: I don't think it will be easy to force the additional requirement with
> the same trick used in gr_fir_ccf_simd.cc; do you agree?

I don't see this as an additional constraint.
gr_complex == std::complex<float> is always laid out (<real>,<imag>).
sizeof(gr_complex) == 8, so with 16-byte alignment, we still always
have good alignment.  Are you seeing a case where the input has the
real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?

If so, (1) where's the input data coming from, (2) what version of the
compiler are you using?
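
Something along these lines could be dropped in near the call to check
the layout and the boundary.  It is only a sketch: gr_complex_t is a
stand-in typedef for gr_complex, the helper names are hypothetical, and
it assumes a C++11 compiler for static_assert.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>

typedef std::complex<float> gr_complex_t;  // hypothetical stand-in for gr_complex

// Two packed 4-byte floats per sample.
static_assert(sizeof(gr_complex_t) == 8, "expected sizeof(complex<float>) == 8");

// Address of p modulo m, e.g. addr_mod(p, 8) or addr_mod(p, 16).
inline unsigned addr_mod(const void* p, unsigned m) {
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) % m);
}

// True if the sample is stored as (<real>, <imag>) when viewed as two floats.
inline bool real_is_first(const gr_complex_t& c) {
    const float* f = reinterpret_cast<const float*>(&c);
    return f[0] == c.real() && f[1] == c.imag();
}

// True if the real part sits on a mod 8 == 0 boundary.
inline bool real_on_mod8_boundary(const gr_complex_t* p) {
    return addr_mod(p, 8) == 0;
}
```

For example, assert(real_on_mod8_boundary(input)) just before the SSE
call would catch the mod 8 == 4 case, and assert(addr_mod(buf, 16) == 0)
would confirm the 16-byte alignment of an allocated buffer.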

However, back to your first point, if we are using the 0RCR case, then
the code is completely wrong, and I don't see how it could ever pass
the QA tests (which it seems to).  On the other hand, there could be
some problem with how the float taps are mapped across the complex
input.  (It's been a long time since I looked at the code...)
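
For reference, this is what the mapping has to amount to
mathematically, written as plain scalar code.  It is a sketch for
comparison only, not the SIMD implementation, and the function names
are made up.  With the input viewed as interleaved floats, float slot k
must be multiplied by tap k/2, so each float tap is applied to both the
real and the imaginary part of one sample.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Reference result: sum over i of in[i] * taps[i], complex input, float taps.
inline std::complex<float> dotprod_ccf_ref(const std::complex<float>* in,
                                           const float* taps, std::size_t n) {
    std::complex<float> acc(0.0f, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        acc += in[i] * taps[i];
    return acc;
}

// Same sum over the interleaved float view (R, C, R, C, ...).
// Float slot k belongs to sample k/2 and therefore uses tap k/2.
inline std::complex<float> dotprod_ccf_interleaved(const float* in_floats,
                                                   const float* taps, std::size_t n) {
    float re = 0.0f, im = 0.0f;
    for (std::size_t k = 0; k < 2 * n; ++k) {
        float p = in_floats[k] * taps[k / 2];
        if (k % 2 == 0) re += p;   // even slots hold real parts
        else            im += p;   // odd slots hold imaginary parts
    }
    return std::complex<float>(re, im);
}
```

Comparing the SIMD output against a scalar version like this, for
inputs starting at each alignment, would show quickly whether the taps
are being shifted by one float relative to the samples.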

Thanks for looking at this!


> Tim
> >
> >
> > Yes, it does get called at "make check" time.
> >
> > FWIW, it's run by way of gnuradio-core/src/tests/test_all
> >
> > It's possible that there's an alignment requirement that's not being
> > honored at runtime.  The low-level SSE code (fcomplex_dotprod_sse64.S)
> > requires that its input and taps be 16-byte aligned.  gr_fir_ccf_simd
> > allocates 16-byte aligned buffers for the relevant buffers, so it
> > should be working OK.   Perhaps one of you seeing the problem could
> > add an assert or two to confirm that the alignment is correct.
> >
> > Eric

