speechd-discuss

Patch (rather for discussion): sonic and samplerate implementation


From: Bohdan R . Rau
Subject: Patch (rather for discussion): sonic and samplerate implementation
Date: Tue, 07 Dec 2010 08:42:34 +0100

On Mon, 06 Dec 2010 22:56:43 +0100, Tomas Cerha <cerha at brailcom.org>
wrote:
[...]
> Yes, we (I think it was me) decided to align rate setting with volume and
> pitch into the -100..100 range, but it was not a good decision.  Most
> synthesizers understand absolute rates in words per minute so it would make
> much more sense to use WPM values directly.  Now we need to keep the
> -100..100 range for compatibility, but we may try to define an official
> conversion rules and make sure that all output modules use the same
> conversion.

I don't think it's a very big problem.

Let's assume each module understands WPM (even if the synthesizer does not,
it is possible to compute its internal magic parameter from WPM in that
particular module). Then, once we assign WPM values to -100, 0 and +100,
we can simply compute WPM for any value in the -100..100 range. The scale
should not be linear!
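
One possible non-linear mapping is geometric interpolation between the three anchor points, so that equal steps in the -100..100 range multiply the speed by a constant factor. The sketch below is only an illustration: the function name `rate_to_wpm` and the anchor values (80, 170 and 400 WPM) are assumptions made for the example, not values agreed on in this thread.

```python
def rate_to_wpm(rate, wpm_min=80.0, wpm_default=170.0, wpm_max=400.0):
    """Map a rate in -100..100 to words per minute (hypothetical helper).

    The three anchor values are placeholders for illustration only.
    """
    if not -100 <= rate <= 100:
        raise ValueError("rate must be in the -100..100 range")
    # Use the upper anchor for positive rates and the lower one for negative.
    anchor = wpm_max if rate >= 0 else wpm_min
    # Geometric interpolation: each step scales WPM by a constant factor.
    return wpm_default * (anchor / wpm_default) ** (abs(rate) / 100.0)


if __name__ == "__main__":
    for r in (-100, -50, 0, 50, 100):
        print(r, round(rate_to_wpm(r), 1))
```

With this shape, rate 0 always yields the default WPM and the end points yield the chosen minimum and maximum, whatever values a module picks for them.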

Of course we need to make changes in each module, but it should not be a
very big problem now.

For pitch there is a different problem.
I don't think the absolute value of the base frequency is really needed (as
we are not inventing a "chorus of robots"). We can use the base frequency of
the particular voice instead and treat the pitch value as relative to this
base. For me, 1 step in the -100..100 pitch range should equal 12 cents (so
100 steps give 1200 cents and we can shift the voice by an octave).
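
The arithmetic behind the 12-cent step can be sketched as follows. An octave is 1200 cents, and a shift of c cents multiplies the frequency by 2^(c/1200). The function name `pitch_to_ratio` is hypothetical and used only for this example.

```python
def pitch_to_ratio(pitch, cents_per_step=12.0):
    """Convert a relative pitch in -100..100 to a frequency multiplier.

    Hypothetical helper: 12 cents per step, as suggested above.
    """
    if not -100 <= pitch <= 100:
        raise ValueError("pitch must be in the -100..100 range")
    cents = pitch * cents_per_step
    # 1200 cents is one octave, i.e. a factor of 2 in frequency.
    return 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    base_hz = 120.0  # assumed base frequency of some voice, for illustration
    for p in (-100, 0, 100):
        print(p, base_hz * pitch_to_ratio(p))
```

A module would multiply the base frequency of the selected voice by this ratio, so +100 doubles it and -100 halves it.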

Also, volume should be scaled in dB, but maybe that's my "industrial
disease" (I worked for over 15 years as a sound engineer) :)

And last but not least, I think we can change the speech-dispatcher API now.
For the SSIP protocol only minor additions should be needed, something like:

SET (...) ABSRATE <rate> - should set the rate directly in WPM

SET (...) RELPITCH <shift> - should shift the base pitch by <shift> cents (I
don't think it will be very useful, but it allows shifting the pitch by more
than an octave).

SET (...) VOLUME <volume> [<unit>] - should set the volume:

a) if the unit is not given, <volume> must be treated as a parameter from the
-100..100 range (for now it is not important whether that scale is
logarithmic or linear)
b) if the unit is % (percent sign), it means a linear scale (values greater
than 100% should probably also be permitted)
c) if the unit is dB, it simply means decibels (0 is maximum, we can assume
that a 6 dB change doubles or halves the level; positive values should also
be permitted)
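
The three cases above could be turned into a linear amplitude gain roughly like this. It is a sketch under stated assumptions, not an implementation from speech-dispatcher: the function name `volume_to_gain` is hypothetical, and the unit-less case is mapped linearly onto 0..1 purely as a placeholder, since the proposal leaves that scale open.

```python
def volume_to_gain(volume, unit=None):
    """Return a linear amplitude gain for the proposed VOLUME forms.

    Hypothetical helper covering cases a), b) and c) of the proposal.
    """
    if unit is None:
        # Case a): ASSUMPTION - linear map of -100..100 onto 0.0..1.0.
        if not -100 <= volume <= 100:
            raise ValueError("volume must be in the -100..100 range")
        return (volume + 100) / 200.0
    if unit == "%":
        # Case b): linear scale, values above 100% are allowed.
        if volume < 0:
            raise ValueError("percentage must not be negative")
        return volume / 100.0
    if unit.lower() == "db":
        # Case c): 0 dB is full level; 20*log10(gain) = dB for amplitude.
        return 10.0 ** (volume / 20.0)
    raise ValueError("unknown unit: %r" % (unit,))


if __name__ == "__main__":
    print(volume_to_gain(-6, "dB"))   # close to one half
    print(volume_to_gain(150, "%"))
    print(volume_to_gain(0))
```

With the amplitude formula, -6 dB comes out at about 0.501, which matches the "6 dB changes the level by a factor of two" rule of thumb closely enough for this purpose.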

The corresponding changes in the C and Python APIs are trivial.

Is this acceptable?

ethanak
-- 
http://milena.polip.com/ - Pa pa, Ivonko!


