From: David Hill
Subject: [gnuspeech-contact] Re: Festival + gnuspeech
Date: Fri, 21 Dec 2007 21:10:54 -0800
Hi Marcelo,

On Dec 11, 2007, at 5:39 AM, Marcelo Yassunori Matuda wrote:
OK -- good!
A .snd, .wav, .au or whatever file of the speech it produces would be nice, so I could easily listen to your final output.
I am referring to the rules and data for composing the transitions. I attach a diagram of four of the parameters as generated by Monet/gnuspeech (the glottal volume, the fricative volume, the r3 radius and the r6 radius) for the utterance "Is that cheese to eat?". The vertical gray lines represent the timing framework for realising the posture-to-posture dynamics. The timing is based on our rhythm research; the form of the transitions and the parameter data for the radii, etc., are based on real speech data. A big chunk of the non-real-time, interactive part of Monet is concerned with helping create the data needed to specify these dynamic aspects, which capture both rhythm and co-articulation. The real-time speech synthesis component embodied in Monet uses this data to compose the parameter variations from the targets (postures), the timing, and the transition specifications. It is this latter part that I was referring to as "the dynamic parameter creation stuff".
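To make the idea of composing parameter variations from targets, timing and transitions a little more concrete, here is a minimal sketch. It is not Monet's algorithm or rule format: the `Posture` class, the `compose_track` function, the fixed 40 ms transition window and the raised-cosine transition shape are all hypothetical stand-ins, chosen only to show how a sampled parameter track (such as r3) can be built from posture targets laid out on a timing framework.

```python
import math
from dataclasses import dataclass


@dataclass
class Posture:
    # Hypothetical stand-in for a Monet posture: a name, per-parameter
    # target values, and a duration in milliseconds.
    name: str
    targets: dict
    duration_ms: float


def s_curve(x: float) -> float:
    # Raised-cosine shape rising from 0 at x=0 to 1 at x=1. This is an
    # assumed profile, not the transition shapes Monet derives from speech data.
    return 0.5 - 0.5 * math.cos(math.pi * x)


def compose_track(postures, param, frame_ms=5.0, transition_ms=40.0):
    """Sample one parameter track at a fixed frame interval.

    Each posture holds its target value, then moves toward the next
    posture's target during its final `transition_ms` (an assumed rule).
    Returns a list of (time_ms, value) pairs.
    """
    # Cumulative posture boundaries form the timing framework.
    starts = [0.0]
    for p in postures:
        starts.append(starts[-1] + p.duration_ms)
    total = starts[-1]

    track = []
    i = 0
    for n in range(int(total // frame_ms)):
        t = n * frame_ms
        # Advance to the posture whose interval contains t.
        while t >= starts[i + 1]:
            i += 1
        cur = postures[i].targets[param]
        # The last posture has no successor, so it simply holds its target.
        nxt = postures[i + 1].targets[param] if i + 1 < len(postures) else cur
        window = min(transition_ms, postures[i].duration_ms)
        trans_start = starts[i + 1] - window
        if window > 0 and t >= trans_start:
            frac = (t - trans_start) / window
            value = cur + (nxt - cur) * s_curve(frac)
        else:
            value = cur
        track.append((t, value))
    return track
```

For example, `compose_track([Posture("i", {"r3": 0.8}, 100.0), Posture("z", {"r3": 0.4}, 80.0)], "r3")` holds r3 at 0.8, glides smoothly down over the last 40 ms of the first posture, and then holds 0.4. A single fixed window is the simplest thing that gives smooth posture-to-posture movement; in Monet, the transition shapes and timing come from the rule data created with the interactive part of the system.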
The precise points at which pitch changes occur are very significant for naturalness, and even for whether the intended intonation pattern is heard at all in relation to the composed utterance. If you "take the pitch information from Festival", it is not clear how you can get accurately registered intonation patterns. One of the key advantages of gnuspeech, apart from its articulatory basis, is its better-than-usual intonation and rhythm. Degrading these aspects significantly affects the perception of the speech and makes it much harder to listen to for long periods.
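A small illustration of what "registration" means here. The sketch assumes a hypothetical representation in which each pitch event is tied to a position inside a particular segment, and it uses made-up segment durations for two systems. The function names (`segment_starts`, `anchor_times`, `misregistration_ms`) and all the numbers are illustrative only; they are not taken from gnuspeech or Festival.

```python
def segment_starts(durations_ms):
    # Cumulative start times for a list of segment durations.
    starts = [0.0]
    for d in durations_ms:
        starts.append(starts[-1] + d)
    return starts


def anchor_times(anchors, durations_ms):
    """Convert hypothetical pitch anchors to absolute times.

    Each anchor is (segment_index, fraction_into_segment, f0_hz): a pitch
    event tied to a position inside a particular segment.
    """
    starts = segment_starts(durations_ms)
    return [(starts[seg] + frac * durations_ms[seg], f0)
            for seg, frac, f0 in anchors]


def misregistration_ms(anchors, synth_durations, other_durations):
    # Time offset of each pitch event when its timing comes from another
    # system's segment durations instead of the synthesizer's own.
    ours = anchor_times(anchors, synth_durations)
    theirs = anchor_times(anchors, other_durations)
    return [round(t_other - t_ours, 3)
            for (t_ours, _), (t_other, _) in zip(ours, theirs)]


if __name__ == "__main__":
    # Illustrative, made-up durations for the same four segments.
    synth = [80.0, 120.0, 60.0, 150.0]
    other = [110.0, 95.0, 85.0, 120.0]
    anchors = [(1, 0.5, 130.0), (3, 0.2, 95.0)]
    print(misregistration_ms(anchors, synth, other))
```

With these made-up durations it prints `[17.5, 24.0]`: the same pitch events, timed against the other system's segments, land 17.5 ms and 24 ms away from where they belong in the synthesizer's own timing, and the error can grow as differences accumulate along the utterance.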
Thank you. Are you aware of the documentation for the gnuspeech system?

HILL, D.R., MANZARA, L. & TAUBE-SCHOCK, C-R. (1995) Real-time articulatory speech-synthesis-by-rules. Proc. AVIOS '95, 14th Annual International Voice Technologies Conf., San Jose, 12-14 September 1995, 27-44.

HILL, D.R. (2006) Manual for the Synthesizer application, part of the GnuSpeech text-to-speech toolkit. On-line manual relevant to the real-time articulatory-synthesis-based text-to-speech system described in the AVIOS '95 paper "Real-time articulatory speech-synthesis-by-rules".

HILL, D.R. (1993, 2004) MONET speech synthesis editing system manual (TextToSpeech Kit tool). On-line manual relevant to the same real-time articulatory-synthesis-based text-to-speech system.

HILL, D.R., MANZARA, L. & SCHOCK, C-R. (1993, 2003) Pronunciation guide for TextToSpeech kit. Pronunciation guide for Webster's, Trillium and IPA phonetic transcriptions.

HILL, D.R. (2001) A conceptionary for speech and hearing in the context of machines and experimentation. A considerably enlarged and revised version of Hill (1976b), designed as an educational and reference tool (check the term "conceptionary" in Wikipedia).

Sorry for the delay in replying.

All good wishes.

david

----------- David Hill --------
The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------

[Attached illustration: some gnuspeech parameter tracks as displayed by Monet on a Mac under OS X, for the utterance "Is that cheese to eat?"]