gnuspeech-contact


[gnuspeech-contact] Why Monet? and the Tube Resonance Model?


From: fmiller
Subject: [gnuspeech-contact] Why Monet? and the Tube Resonance Model?
Date: Fri, 22 Jun 2012 16:08:07 -0600

Why Monet?  Why not plug in Festival or MBROLA or some other speech
synthesis engine in order to get Gnuspeech up and running?
The quick answer is that Monet represents a better approach.  This is the
answer I've seen in the mailing-list archives. A good theoretical
explanation can be found in David Hill's paper "Real-time articulatory
speech-synthesis-by-rules". It's a good read, but it may remain an
abstraction for many readers.
Quickly and impressionistically, I aim to criticize approaches to speech
synthesis that rely on cutting and pasting from samples of real speech.
Suppose I'm interested in studying how people distinguish /p/ from /b/.
One way to study this (something I actually tried decades ago) goes
something like this:
make a recording from a real speaker:
"a pea <pause> a bee <pause> a bee <pause> a pea"
pick a region to work with that includes [p] and [b] and throw away the excess
then cut from the middle such that you include the beginning of the vowel
after the first consonant and connect that to the beginning of the vowel
after the next consonant...
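To make the cutting and connecting concrete, here is a minimal sketch in Python. It is not how I did it decades ago and it is not anything Monet does. It assumes two synthetic sine tones standing in for the two vowel onsets, and the names (tone, splice, largest_jump) and the cut points are hypothetical, chosen only so that the join lands on a peak in one piece and a trough in the other.

```python
import math

SAMPLE_RATE = 16000  # Hz; an arbitrary choice for this sketch


def tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a sine tone as a list of floats (a stand-in for a vowel onset)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def splice(first, second, cut_first, cut_second):
    """Keep first[:cut_first], then jump straight to second[cut_second:]."""
    return first[:cut_first] + second[cut_second:]


def largest_jump(samples):
    """Largest sample-to-sample step: a crude measure of an audible click."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


# Two "recordings" (assumed, synthetic): 200 Hz has a period of 80 samples here.
vowel_after_p = tone(200.0, 0.05)
vowel_after_b = tone(200.0, 0.05)

# Cut the first piece just after a peak (sample 20) and start the second
# piece at a trough (sample 60), so the waveform is discontinuous at the join.
spliced = splice(vowel_after_p, vowel_after_b, 21, 60)
clean = vowel_after_p

print("largest step, untouched tone:", largest_jump(clean))
print("largest step, spliced tone:  ", largest_jump(spliced))
```

The untouched tone moves in small steps from sample to sample; the spliced one has a single large step at the join. Real speech is far messier than a sine tone, so a careless join can do worse than this, and careful cut points only hide the step rather than restore what was thrown away.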
For a quick impression of the kinds of segmental artifacts that result from
such an approach:
open up Monet and give it this text to parse "upped shiff he e"
synthesize
a tree?
a boy?
ship flee?
It's nonsense, of course; but I WANT to give it some kind of interpretation
and I can come up with lots of possibilities. I can even have fun with it.
If I were an end user, however, with no interest in HOW speech is
synthesized, then such artifacts would just annoy me.
I suppose one could get a cleaner result by cutting real speech into smaller
and smaller pieces, and the result might eventually seem natural to the
casual listener. The casual listener might not notice the glitches, but the
glitches would still be there: like a 60 Hz refresh rate that gives many
people a headache even when they're not consciously aware of the flicker.

Monet: it's a better approach.

fred


