Re: [fluid-dev] Multi core support not so great
From: josh
Subject: Re: [fluid-dev] Multi core support not so great
Date: Mon, 28 Sep 2009 13:46:02 -0700
User-agent: Internet Messaging Program (IMP) H3 (4.1.6)
Quoting David Henningsson <address@hidden>:
address@hidden wrote:
I finished implementing a first pass at multi-core support.
Oh, now I really must order a multi-core computer ;-)
Yeah, I want one now too! :)
While it
was a fun task, it didn't really yield the kind of performance I was
hoping for. For those interested here is a description of the current
logic:
Added a synth.cpu-cores setting.
Perhaps "synth.workerthreads" would be clearer.
I still think cpu-cores sounds fine. It's the naming convention that
Renoise uses, for example.
Additional core threads are created in new_fluid_synth()
(synth.cpu-cores - 1).
Primary synthesis thread signals secondary core threads when there is work.
Primary and secondary synthesis threads process individual voices in
parallel.
Primary thread mixes all voices to left/right, reverb and chorus buffers.
Having multiple cores really just gives you the ability to have more
voices in the case of live performance (before maxing the CPU) or
*should* make your -F (fast MIDI render to file) operations go faster.
The reason I say *should* is because it really depends on how complex
the MIDI file is. If there aren't a lot of voices, it may in fact be
slightly worse performance. Best case I have seen so far was about a
20% increase in speed (for the -F render case), which is something.
Interestingly the 2 cores were still not quite maxed.
Two things come to mind:
1) does the audio buffer size matter for the performance in this case?
Do you mean the FLUID_BUFSIZE (64) compile-time size? That matters
for both multi-core and single-core. I don't think it's necessarily
more of a concern in the multi-core case.
2) if you run single-threaded, is one core maxed?
Well, close to it. I should double check, good point.
One issue I have stumbled upon concerns thread priorities. We want
the secondary core threads to run at the same priority as the primary
synthesis thread, for round-robin style scheduling (though it may not
matter much if they are on separate CPUs). In the case of -F fast
rendering you definitely don't want your processes running at high
priority (especially on Linux). In the live case, though, the audio
driver will be running at high priority, so you want the secondary
core threads running at high priority as well. The issue is that
currently the secondary core threads are created in new_fluid_synth(),
while the synthesis "thread" is created by audio drivers or via other
means. There needs to be some way to ensure that the secondary threads
end up having the same priority. Any ideas? Perhaps a one-time
creation of the secondary threads within the fluid_synth_one_block
routine, with an attempt to make them identical in priority, would
make sense.
I can imagine starting threads has a higher upper bound in time
than, e.g., malloc, so starting them from fluid_synth_one_block will
probably lead to an underrun (only once, but still). I would
prefer to have the audio drivers create the additional threads. After
all, they are the ones who know how to create threads with the right
priority, right?
You're right about the issues of starting up threads (even if one
time). Your suggestion sounds like the right thing to do. It's not
yet apparent to me, though, how to correlate the cpu-cores setting,
the resulting memory allocations related to that setting, and the
secondary core thread creation. What happens if FluidSynth is embedded
in another application, for example? If that application is handling
the audio itself, no worker threads would be created. Perhaps that's
OK, and it should be left up to the application to create the
additional core threads, which would require additional API.
In summary:
I realized through all this, that optimization is probably more
important than multi-core support. Enabling multi-core support
introduces additional overhead, so unless you are trying to get more
voices in the realtime case or render MIDI files slightly faster, you're
better off not enabling it.
So now that I've learned my lesson, should I commit the code? ;) Does it
seem worth it? At the moment there may be some very minimal additional
overhead in the single core case (compared to before), but that is
probably so minimal as to be lost in the noise.
If all overhead is passing through some "if (cpucores > 1)" lines, I
would say it's nothing to worry about at all.
I think you should commit it, but call it an experimental feature at
this point. It could be a good ground for better multi-threading in the
future.
I was previously using a mutex to lock the core parameters. This
required a pair of lock/unlock instructions per voice, which I think
was a significant amount of CPU consumption. I'm switching to a
lockless model, where the lock is only held when worker threads are
checking and then waiting for work. So only one pair of lock/unlocks
per fluid_synth_one_block() per thread. That should help minimize the
overhead and hopefully give better results.
I forgot that SCHED_FIFO processes continually run until they sleep,
yield or a higher priority process becomes runnable. This oversight
locked up my machine, since the primary thread wasn't able to process
the audio buffers of the continually running secondary thread.
Probably shouldn't be running SCHED_FIFO to begin with, when debugging
stuff ;)
Which reminds me of another feature that should be implemented. I
think we should add FluidSynth settings to enable/disable RT
capabilities and set priority levels. I'm thinking something like this:
audio.realtime = yes/no
audio.realtime-prio = 1-99
midi.realtime = yes/no
midi.realtime-prio = 1-99
What do you think? Might not apply on certain architectures.
I'm thinking about pre-rendering a few buffers for every voice,
with some rollback in case the voice should change (i.e. note-off events,
modulators etc). That would bring more stability, but it is a very
long-time goal and is nothing I plan to implement in the near future.
Btw, if you're up to implementing features, I would suggest giving
libaudiofile / libsndfile a go. "How do I make a wave file" is an issue
coming up every now and then.
I was planning on working on that next, especially since you
forwarded me that wav patch recently. Using libsndfile or
libaudiofile makes the most sense, for the added flexibility. It
could render to floating point too, for that matter. I would probably
just add the feature to the -F fast render path, though I suppose the
aufile driver might also benefit, even if it seems less useful now.
// David
Josh