
[gnuspeech-contact] Re: Festival + gnuspeech


From: David Hill
Subject: [gnuspeech-contact] Re: Festival + gnuspeech
Date: Fri, 21 Dec 2007 21:10:54 -0800

Hi Marcelo,

On Dec 11, 2007, at 5:39 AM, Marcelo Yassunori Matuda wrote:

Hi David,

Thanks for your response!

one point.  I have finally decided against LGPL for various reasons, so it
really can only be GPL v.3 now and I have been dilatory.

Ok.

Steve tends to lurk on the mailing list and is not actively involved in
gnuspeech for now, though he did 99% of the Mac/GNU port.  I am sure he'd be
happy to give firm answers to your two questions & I'll copy this to him.

I made some modifications to the program and am now using mxml.

OK -- good!


What you have done sounds interesting, but I am not sure why you have gone
that route.  How have you integrated the dynamic parameter creation stuff
from Monet?  Did you reimplement the relevant parts, or are you using the
basic scheme from Monet?  What does "using the Mbrola voice adapter"
entail?  I was never that impressed by Mbrola.  Do you have any speech
samples from your system yet?  I shall be really interested to find out more
about what you have achieved.

I am attaching a sample.
Festival can call the mbrola executable; to do this it creates a
temporary text file with phoneme data. I take this file and feed it to
the synthesizer (see the attached mbrola_dump.txt). The format is:

phoneme duration [ freq_pos1 freq1 [freq_pos2 freq2 ...]]
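[For readers unfamiliar with that format: each frequency pair gives a pitch target as (position within the phoneme, in percent of its duration; frequency, in Hz), so a line such as "@ 80 20 120 80 115" means an 80 ms phoneme with 120 Hz at 20% and 115 Hz at 80% of its duration. A minimal C++ sketch of a parser for such lines follows; the struct and names are illustrative, not taken from Marcelo's actual program.]

    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // One parsed line of MBROLA-style phoneme data.
    struct PhonemeEvent {
        std::string phoneme;                           // e.g. "@"
        double durationMs = 0.0;                       // duration in milliseconds
        std::vector<std::pair<double, double>> pitch;  // (position %, frequency Hz)
    };

    // Parse "phoneme duration [pos1 freq1 [pos2 freq2 ...]]".
    // Returns false for a blank or malformed line.
    bool parsePhoLine(const std::string& line, PhonemeEvent& ev) {
        std::istringstream in(line);
        if (!(in >> ev.phoneme >> ev.durationMs)) return false;
        double pos, freq;
        while (in >> pos >> freq) ev.pitch.emplace_back(pos, freq);
        return true;
    }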

A .snd, .wav, .au or whatever file of the speech it produces would be nice, so I could easily listen to your final output.


I am not using mbrola. In the future, if I learn the Scheme language
of Festival, I may remove this "indirection".

The program uses only the information contained in diphones.mxml
(rules, postures, equations, transitions, etc.) and the posture
rewriter from Gnuspeech.
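
[For illustration, loading such a file with the mxml (Mini-XML) library takes only a few calls. This is a sketch only: the element name "posture" and attribute "symbol" are assumptions about the diphones.mxml layout, not confirmed details of Marcelo's code.]

    #include <cstdio>
    #include <mxml.h>  // Mini-XML (mxml) library

    int main() {
        std::FILE* fp = std::fopen("diphones.mxml", "r");
        if (!fp) return 1;

        // Load the whole document, keeping text content as opaque strings.
        mxml_node_t* tree = mxmlLoadFile(nullptr, fp, MXML_OPAQUE_CALLBACK);
        std::fclose(fp);
        if (!tree) return 1;

        // Visit each <posture> element (element/attribute names are guesses).
        for (mxml_node_t* n = mxmlFindElement(tree, tree, "posture",
                                              nullptr, nullptr, MXML_DESCEND);
             n != nullptr;
             n = mxmlFindElement(n, tree, "posture",
                                 nullptr, nullptr, MXML_DESCEND)) {
            const char* symbol = mxmlElementGetAttr(n, "symbol");
            if (symbol) std::printf("posture: %s\n", symbol);
        }

        mxmlDelete(tree);
        return 0;
    }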

What's the "dynamic parameter creation stuff"?

I am referring to the rules and data for composing the transitions.  I attach a diagram of four of the parameters as generated by Monet/gnuspeech (the glottal volume, the fricative volume, the r3 radius and the r6 radius) from the utterance "Is that cheese to eat?".  The vertical gray lines represent the timing framework for realising the posture-to-posture dynamics, based on our rhythm research for the timing, and on real speech data for the form of the transitions and the parameter data for the radii etc.

A big chunk of the non-real-time interactive part of Monet is concerned with helping create the data needed to specify these dynamic aspects, which capture both rhythm and co-articulation.  The real-time speech synthesis component embodied in Monet uses this data to compose the parameter variations based on the targets (postures), the timing, and the transitional specifications.  It is this latter part that I was referring to as "the dynamic parameter creation stuff".
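[As a rough sketch of that composition step (illustrative only; Monet's actual machinery is richer, with rule-selected multi-posture transitions and special events): given two posture targets for one parameter and a normalised transition profile, the track between them might be composed like this.]

    #include <cstddef>
    #include <utility>
    #include <vector>

    // A normalised transition profile: (time fraction 0..1, progress 0..1)
    // points, sorted by time and non-empty. In Monet the shape comes from
    // rule-selected transition specifications; here it is just data.
    using Profile = std::vector<std::pair<double, double>>;

    // Progress through the profile at time fraction t (piecewise-linear).
    double progressAt(const Profile& p, double t) {
        if (t <= p.front().first) return p.front().second;
        for (std::size_t i = 1; i < p.size(); ++i) {
            if (t <= p[i].first) {
                double span = p[i].first - p[i - 1].first;
                double f = span > 0.0 ? (t - p[i - 1].first) / span : 1.0;
                return p[i - 1].second + f * (p[i].second - p[i - 1].second);
            }
        }
        return p.back().second;
    }

    // Compose one parameter track from posture target A to target B over
    // `samples` control-rate frames, shaped by the transition profile.
    std::vector<double> composeTrack(double targetA, double targetB,
                                     const Profile& profile, int samples) {
        std::vector<double> track(samples);
        for (int i = 0; i < samples; ++i) {
            double t = samples > 1 ? double(i) / (samples - 1) : 0.0;
            track[i] = targetA + progressAt(profile, t) * (targetB - targetA);
        }
        return track;
    }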

The GUI? The program is
command line only. I am parsing and evaluating the equations and
boolean expressions. I am not using the intonation from Gnuspeech; I
take the pitch information from Festival. The durations are defined by
diphones.mxml.
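[For context: Monet's rules select transitions by matching boolean expressions over posture categories, e.g. something like "vocoid and not nasal". A toy representation of the evaluation half, purely illustrative and not Monet's or Marcelo's actual types:]

    #include <memory>
    #include <set>
    #include <string>

    // A boolean expression over posture category names.
    struct Expr {
        enum Kind { Category, And, Or, Not } kind;
        std::string category;            // used when kind == Category
        std::unique_ptr<Expr> lhs, rhs;  // rhs is unused for Not

        // True if the posture's category set satisfies this expression.
        bool eval(const std::set<std::string>& cats) const {
            switch (kind) {
                case Category: return cats.count(category) > 0;
                case And:      return lhs->eval(cats) && rhs->eval(cats);
                case Or:       return lhs->eval(cats) || rhs->eval(cats);
                case Not:      return !lhs->eval(cats);
            }
            return false;
        }
    };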

The precise points at which pitch changes occur are very significant for naturalness, and even for hearing the intended intonation pattern in relation to the utterance at all.  If you "take the pitch information from Festival" it is not clear how you can get accurately registered intonation patterns.  One of the key advantages of gnuspeech -- apart from the articulatory basis -- is the better than usual intonation and rhythm.  Degradation of these aspects significantly affects the perception of the speech and makes it much harder to listen to for long periods.


What's the "basic scheme from Monet"?

See above.


This is my first non-trivial C++ program; I am using it to learn the language.
In the beginning I was planning to use only the tube model, but I
realised that I would have to reimplement what was already done by
Gnuspeech, so I decided to use the diphones.mxml data. I tried not to
read too much of the ObjC code, so as not to confuse my brain :) And I
couldn't compile Gnuspeech on my Linux machine...

Gnuspeech/Monet is an impressive system. I think it may achieve much
better quality.

Thank you.  Are you aware of the documentation for the gnuspeech system:

HILL, D.R., MANZARA, L. & TAUBE-SCHOCK, C-R. (1995) Real-time articulatory speech-synthesis-by-rules. Proc. AVIOS '95, 14th Annual International Voice Technologies Conference, San Jose, 12-14 September 1995, 27-44.

HILL, D.R. (2006) Manual for the Synthesizer application -- part of the GnuSpeech text-to-speech toolkit. On-line manual relevant to the real-time articulatory-synthesis-based text-to-speech system described in the AVIOS '95 paper "Real-time articulatory speech-synthesis-by-rules".

HILL, D.R. (1993, 2004) MONET speech synthesis editing system manual (TextToSpeech Kit tool). On-line manual relevant to the same real-time articulatory-synthesis-based text-to-speech system.

HILL, D.R., MANZARA, L. & SCHOCK, C-R. (1993, 2003) Pronunciation guide for the TextToSpeech kit. Pronunciation guide for Webster's, Trillium and IPA phonetic transcriptions.

HILL, D.R. (2001) A conceptionary for speech and hearing in the context of machines and experimentation. A considerably enlarged and revised version of Hill (1976b), designed as an educational and reference tool (check the term "conceptionary" in Wikipedia).


Sorry for the delay in replying.

All good wishes.

david
-----------
David Hill
--------
 The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------

[Attached image: illustration of some gnuspeech parameter tracks, for the utterance "Is that cheese to eat?", as displayed by Monet on a Mac under OS X]
