TTS algorithms (Re: Comments on the Text to Speech "algorithm")

Bill Cox
Sun, 28 Feb 2010 10:05:16 -0500

IMO, the espeak voice for English isn't bad at default speech rates.
I only have three issues with espeak.  First, if you push it past
about 300 words per minute, it gets garbled.  Certain sounds don't
seem to shrink at the same rate as others, and soon your speech is
dominated by stuff which is simply played too long.  If this could be
fixed, espeak would be a real competitor to ibmtts.  Second, it has a
British accent, even if you select American English.  Finally, I
couldn't find enough documentation to begin hacking the code and
database easily.
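
To make the first issue concrete, here is a toy Python sketch of the
arithmetic.  It is not espeak's code, and I don't know that espeak
works this way: the segment labels, the durations, and the idea of a
per-segment minimum ("floor") are all invented assumptions, just to
show how sounds that stop shrinking end up dominating at high rates.

```python
# Toy model only: every number and name here is a made-up assumption,
# not something taken from espeak.
BASE_WPM = 150  # rate at which the nominal durations apply

SEGMENTS = [
    # (label, nominal_ms at BASE_WPM, hypothetical floor_ms)
    ("vowel", 120.0, 30.0),
    ("stop", 60.0, 20.0),
    ("fricative", 100.0, 80.0),  # barely compressible in this toy model
    ("pause", 80.0, 70.0),
]


def scaled_durations(wpm):
    """Scale each segment by BASE_WPM / wpm, but never below its floor."""
    factor = BASE_WPM / wpm
    return {
        label: max(nominal * factor, floor)
        for label, nominal, floor in SEGMENTS
    }


def share_of_floored(wpm):
    """Fraction of total time spent in segments stuck at their floor."""
    factor = BASE_WPM / wpm
    durations = scaled_durations(wpm)
    stuck = sum(
        durations[label]
        for label, nominal, floor in SEGMENTS
        if nominal * factor <= floor
    )
    return stuck / sum(durations.values())


if __name__ == "__main__":
    for wpm in (150, 300):
        print(wpm, scaled_durations(wpm), share_of_floored(wpm))
```

In this toy model the floored segments take up none of the time at
150 wpm and 62.5% of it at 300 wpm, which is the kind of imbalance I
think I am hearing.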

What would be really cool is if we could sample a person's voice, and
automatically extract formant information, and if we could
automatically build dictionaries for specific accents, including
prosody.  I find that the blind are quite happy to volunteer time when we
give them accessible tools to do cool stuff.  I suspect we could find
volunteers willing to build awesome voice databases.

BTW, I checked the "Mary" voice mentioned above.  It simply glues the
FreeTTS front end to the MBROLA back end, so I'm not sure it's any
improvement over espeak feeding MBROLA.  MBROLA sounds great at its
default of around 150 words per minute, but it falls apart quickly as
you increase speech rate.


On Sun, Feb 28, 2010 at 8:14 AM, marc <sintsixtus at gmail.com> wrote:
> Klaus Knopper wrote:
>> The example you gave is kind of bad because you would never glue
>> together "letters", since they simply don't match the "sound" you
>> associate to them when reading, it depends on their surrounding context
>> how they are pronounced.
> I only gave the example to store _words_ in some tree.
> Marc
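
For reference, one common way to store words in "some tree" is a trie
(prefix tree).  That reading is my assumption; the thread doesn't say
which tree Marc had in mind.  A minimal Python sketch, where the class
names and the pronunciation strings are made up for illustration and
not taken from any TTS engine:

```python
class TrieNode:
    """One node per character; a word ends where pronunciation is set."""

    def __init__(self):
        self.children = {}          # maps a character to the next TrieNode
        self.pronunciation = None   # set only on nodes that end a word


class WordTrie:
    """Hypothetical word store keyed by spelling."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, pronunciation):
        """Walk or create one node per character, then store the value."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.pronunciation = pronunciation

    def lookup(self, word):
        """Return the stored pronunciation, or None if the word is absent."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.pronunciation


if __name__ == "__main__":
    trie = WordTrie()
    trie.insert("speech", "S P IY CH")  # invented phoneme string
    print(trie.lookup("speech"))
    print(trie.lookup("spee"))  # a prefix only, so nothing is stored here
```

Because whole words are the keys, this sidesteps the problem Klaus
raises about gluing individual letters together.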
