

Fwd: Re: [fluid-dev] Floats and doubles, simd and interpolation

From: David Henningsson
Subject: Fwd: Re: [fluid-dev] Floats and doubles, simd and interpolation
Date: Tue, 14 Dec 2010 09:19:02 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20101208 Thunderbird/3.1.7

Did this message reach the mailing list? I can't find it in the archives, so I'm resending it.

Since then, I have tried making a hand-optimized SSE version of fluid_rvoice_buffers_mix, but it performed worse than ORC's version...

// David

-------- Original Message --------
Subject: Re: [fluid-dev] Floats and doubles, simd and interpolation
Date: Sun, 28 Nov 2010 07:11:04 +0100
From: David Henningsson <address@hidden>
To: fluid-dev <address@hidden>

On 2010-11-22 09:17, David Henningsson wrote:
> So the reason I like floats is that with SSE, you can process 4 floats
> simultaneously, but only 2 doubles. From running perf I know that 2/3
> of the time (for my test case) was spent in the interpolation routine.
> If we can SIMD-ize that, we might get a 3-4x speed improvement; at
> least, that's what I hope for.
>
> There is a library called "ORC"; has anybody heard of it? You write
> some pseudo-assembly code, and on first run ORC translates it into SSE,
> MMX, AltiVec, etc., or plain old C, depending on your hardware. I think
> it sounds interesting, and I was hoping to try it out soon, but then I
> got busy trying to find that bug instead.

So, a follow-up on this. I used the same test case as before
(FluidR3 sf2 and Dont_you_worry_about_a_thin.mid).

Rendering with doubles takes ~12.3 s and rendering with floats takes
~11.9 s, on 64-bit Ubuntu Maverick (one core, -z 4096). According to
perf, this is where we spend the most time:

    41.47%  fluid_rvoice_dsp_interpolate_4th_order
    21.17%  fluid_iir_filter_apply
    10.05%  fluid_rvoice_buffers_mix
     8.18%  fluid_revmodel_processmix
     5.40%  fluid_chorus_processmix
     2.75%  fluid_rvoice_write

Since fluid_rvoice_buffers_mix was the simplest one to optimize, I
tried to make an ORC version. After downloading the latest version of
ORC from Debian Experimental (the one shipped with Ubuntu Maverick was
buggy), I ended up with ~11.1 seconds, with fluid_rvoice_buffers_mix (or
rather, a strange ORC function) taking 5% of the total instead of 10%. I
also spent some time looking at iir_filter_apply and the interpolation
function.

So, my experiences from this experiment:

 - ORC is still immature, and does not yet seem able to handle more
complex things like iir_filter_apply and 4th-order interpolation.

 - I was expecting more improvement from ORC: SSE should be able to
process 4 floats at once, so the time should have decreased by a
factor of 3-4 rather than a factor of 2. (I haven't tried writing a
hand-optimized SSE function to compare with.)

 - In addition, the iir_filter_apply function is difficult to
SIMD-optimize, since every sample depends on the previous sample via
the dsp_centernode variable.

 - The interpolate_4th_order function (4th order being the default
interpolation) is difficult to SIMD-optimize due to loop conditions:
sometimes you have to interpolate over sample points at both the loop
start and the loop end for the same destination sample.

 - Do we really need more performance? Today's computers can handle
thousands of voices in real time, and if you have an old computer, you
might not have SSE anyway...

 - Even though SIMD doesn't seem worth the effort at this point, I'd
still like to revisit the floats vs. doubles question. On my amd64,
floats seem slightly faster than doubles. So my question is: what do we
gain from the increased precision, and when? So far, the only thing
I've heard is this:
http://lists.nongnu.org/archive/html/fluid-dev/2010-09/msg00053.html
Victor, can you follow up, perhaps by redoing the listening test with
the latest trunk (with the float bug fixed) to see if there are still
quality differences?
// David

