speechd-discuss
From: Samuel Thibault
Subject: Re: Post-synth speedup of speech
Date: Sun, 8 Sep 2024 16:43:44 +0200

Hello,

Bill Cox wrote:
> TL;DR: Can I send speechd maintainers a patch to integrate my libsonic
> speech-speedup algorithm as an optional post-process to synthesized samples? 

Sure!

The question is more how to expose this to users. The rate range is
-100 to 100; should we keep the 0 value as it is and make the 0 to 100
interval ramp up more steeply? Or should we also speed up the 0 value?

The concern is the stability of configuration for users. Ideally,
rate 0 would sound similar across all synthesizers, but "similar"
remains to be defined. Using the same rate in words per minute probably
doesn't make sense, since depending on the voice it can produce a very
different experience. AIUI, what was done over the years was to keep 0
meaning "the default speed of the synth", which is supposed to be
reasonable for that synth.
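
To make the trade-off concrete, here is a minimal sketch in C (not
actual speechd code; the function name and constants are made up for
illustration) of a mapping that keeps rate 0 at the synth's default
speed and only stretches the two halves of the range:

        /* Hypothetical sketch: map the SSIP rate range -100..100 to a
         * libsonic speed multiplier, keeping rate 0 at the synth's
         * default speed (multiplier 1.0).  MIN_MULT and MAX_MULT are
         * made-up tuning constants, not values from speechd. */
        #define MIN_MULT 0.5f   /* rate -100: half the default speed */
        #define MAX_MULT 6.0f   /* rate  100: six times the default */

        static float rate_to_multiplier(int rate)
        {
                if (rate >= 0)
                        /* 0..100 ramps up from 1.0 to MAX_MULT */
                        return 1.0f + (MAX_MULT - 1.0f) * rate / 100.0f;
                /* -100..0 ramps down from MIN_MULT to 1.0 */
                return 1.0f + (1.0f - MIN_MULT) * rate / 100.0f;
        }

With such a mapping, existing configurations keep their rate-0
behaviour, and only users who push the rate above 0 get the extra
speed-up.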

Bill wrote:
> The Voxin voices I tested sound like they already incorporate the basic
> libsonic algorithm into their vocoder, enabling reasonably high-speed
> speech.  When I set the rate to 100, it is only about 3.5 times faster
> than the default speed.  This is too slow for me.  I suspect that the
> TTS engine for these voices can take higher speeds than this,

Reading the ibmtts module:

        /* Possible ECI range is 0 to 250. */
        /* Map rate -100 to 100 onto speed 0 to 140. */

I don't know why Gilles Casse set the maximum to 140 instead of the
engine's possible 250. An option could be added to make that
configurable, at least. I wonder if we should just map 100 to 250, so
that users can set it right from Orca rather than having to configure
speechd.
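
For illustration, keeping rate 0 where it is while stretching only the
upper half of the range could look like this (a sketch, not the actual
ibmtts module code; 70 is inferred as the rate-0 midpoint of the
current 0..140 mapping):

        /* Sketch: keep rate 0 at the current default ECI speed (70,
         * the midpoint of the existing 0..140 mapping) and stretch
         * only the upper half so that rate 100 reaches the engine's
         * 250 maximum. */
        #define ECI_DEFAULT_SPEED 70
        #define ECI_MAX_SPEED     250

        static int rate_to_eci_speed(int rate)
        {
                if (rate >= 0)
                        return ECI_DEFAULT_SPEED
                               + (ECI_MAX_SPEED - ECI_DEFAULT_SPEED) * rate / 100;
                /* -100..0 maps linearly onto 0..70, as today */
                return ECI_DEFAULT_SPEED + ECI_DEFAULT_SPEED * rate / 100;
        }

That way, users who never touched the rate keep the speed they are
used to, while rate 100 gets the full engine range.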

Bill wrote:
> far more stable system where TTS synthesizers
> return audio samples, rather than play them directly!
[...]
> Thank goodness we only have to change the audio interaction in one
> place.

It was already just one place, loaded by the modules.

Bill wrote:
> A further benefit of the new architecture is it now gives us a single location
> to do speech post-processing if needed.

That would have been true of the new architecture too. But actually I
don't think we want to put libsonic processing in the server audio
process, because that raises a lot of questions: we probably do not
want to enable it on synths which are already able to produce fast
rates. Again, the question is how to expose things to the user: we want
something that is easy for them to configure and coherent. That's why
I'm thinking the modules themselves should probably be made to call
libsonic to post-process their output before sending it to the server,
with parameters depending on their own capabilities.
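
For reference, the module-side call could look roughly like this (a
sketch; only the sonic* functions are the real libsonic API, the
wrapper and its buffer handling are made up):

        #include <sonic.h>

        /* Sketch: post-process a buffer of 16-bit mono samples with
         * libsonic before handing it to the server.  Returns the
         * number of samples written to out. */
        static int speedup_samples(short *in, int num_in,
                                   short *out, int max_out,
                                   int sample_rate, float speed)
        {
                sonicStream stream = sonicCreateStream(sample_rate, 1);
                int num_out;

                sonicSetSpeed(stream, speed);
                sonicWriteShortToStream(stream, in, num_in);
                sonicFlushStream(stream);       /* drain pending samples */
                num_out = sonicReadShortFromStream(stream, out, max_out);
                sonicDestroyStream(stream);
                return num_out;
        }

Each module could then pick the speed parameter (or skip the call
entirely) according to what its synth can already do natively.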

Bill wrote:
> However, Voxin appears to no longer provide the venerable Eloquence
> voice, as it is known in JAWS.  The new voices provided by Voxin are
> nice, and I have them.

Is it not possible to use the ibmtts module to keep the old ibmtts voice
working?

Samuel


