gnuspeech-contact

[gnuspeech-contact] Re: Questions about TRM (GnuSpeech)


From: David Hill
Subject: [gnuspeech-contact] Re: Questions about TRM (GnuSpeech)
Date: Sun, 18 Jan 2009 16:35:13 -0800

Hi Sasivimon,

I apologise for the delay in responding to your email.  I've been *very* busy with far too many incoming emails and other activities.

On Jan 13, 2009, at 10:32 AM, sasivimon wrote:

Hello,

I'm trying to experiment with human voice production using the TRM and a spreadsheet application,
but I can't figure out the relation between the interpolation function for the vocal tract shape
and the resulting formants (F1 and F2).  (I think Monet uses a Bezier curve to interpolate the vocal tract shape.)
For example, if I changed the interpolation function from a Bezier curve to a sine curve, how would the F1 and F2 values change?
My questions are:
1. How does the interpolation function of the vocal tract shape affect the formants?

The TRM is simply a waveguide model of the vocal tube and contains no information about how to produce speech in any language.  In fact, the TRM could equally well simulate a trumpet.
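To make the point concrete, here is a minimal sketch (my own illustration, not the actual TRM code) of the two facts a waveguide tube model embodies: the area function determines the scattering coefficients at the junctions between tube sections, and the tube's geometry alone fixes its resonances.  The uniform-tube formant formula and the 17.5 cm neutral tract length are standard acoustic-phonetics approximations, not values taken from the TRM.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees C


def reflection_coeffs(areas):
    """Scattering coefficients at the junctions between adjacent tube
    sections -- the quantity a Kelly-Lochbaum style waveguide model
    derives from the area function."""
    a = np.asarray(areas, dtype=float)
    return (a[:-1] - a[1:]) / (a[:-1] + a[1:])


def uniform_tube_formants(length_m, n=3):
    """Resonances of a uniform tube closed at the glottis and open at
    the lips: F_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * SPEED_OF_SOUND / (4.0 * length_m)
            for k in range(1, n + 1)]


# A neutral (schwa-like) tract is roughly a uniform 17.5 cm tube:
print(uniform_tube_formants(0.175))   # ~[490, 1470, 2450] Hz

# Constricting one section changes the junction reflections, and that
# is the only path by which posture parameters influence the formants:
print(reflection_coeffs([2.0, 2.0, 0.5, 2.0]))   # [0.0, 0.6, -0.6]
```

The interpolation function between postures never appears here: it only decides *which* area function the tube has at each instant, and the formants at that instant follow from the tube shape alone.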


2. Why does Monet use a Bezier curve to interpolate between two vocal tract shapes?

It doesn't really use "Bezier" curves.

Monet is a multipurpose editor.  It allows the TRM values for speech postures to be stored, and the shape of the parameter tracks between postures to be defined according to a collection of rules, with yet more rules selected according to the particular combinations of speech postures defined by the input.  These further rules decide which parameter-track rules to use, and what timing to apply to the quasi-steady-state and transitional regions of each dynamic change from posture to posture.  When Monet produces its output speech, it also applies rhythm and intonation models to vary the prosody of the speech.
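The idea of a quasi-steady-state region around each posture plus a shaped transition between them can be sketched as follows.  The cosine transition shape and the onset/offset fractions here are illustrative stand-ins for Monet's rule-selected profiles and timings, not the actual rules.

```python
import numpy as np


def transition(p0, p1, n, onset=0.25, offset=0.75):
    """Interpolate one TRM parameter between two posture values over
    n frames: hold quasi-steady near each posture, and move through a
    smooth (cosine-shaped) transition in between.  onset/offset give
    the transition region as fractions of the interval."""
    t = np.linspace(0.0, 1.0, n)
    # Map the transition region [onset, offset] onto [0, 1], clipped
    # so the track is flat outside it.
    u = np.clip((t - onset) / (offset - onset), 0.0, 1.0)
    s = 0.5 - 0.5 * np.cos(np.pi * u)   # smooth ramp from 0 to 1
    return p0 + (p1 - p0) * s


# A hypothetical region parameter moving from 1.0 to 3.0 over 9 frames:
track = transition(1.0, 3.0, 9)
print(track)   # starts at 1.0, flat, rises smoothly, flat, ends at 3.0
```

Swapping the cosine for a different shape (linear, sine, Bezier) changes only `s`; the posture endpoints and the timing rules are independent of it, which is why the transition shape is just one of several things the rules control.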


3. And how can we improve the interpolation function compared with a real human?

By using Monet to define/refine suitable trajectories between successive postures.


4. According to your website (http://pages.cpsc.ucalgary.ca/~hill/papers/synthesizer/body.html), I would like to know the principle
   by which you defined the values of the parameters in the parameter table in Appendix A.

Ideally, we would have X-ray data that would allow us to define the area functions for the TRM that would be stored as part of the Monet database.  In practice, the interactive program "Synthesizer" is used to determine which steady-state values of the TRM region parameters will produce the sounds that are needed.  In the case of postures that do not produce much sound during closure (stops, fricatives), the concept of the "locus", as determined at the Haskins Laboratories in the 1950s and 1960s, is used.  The locus is the "origin" of the spectral transitions: the posture that would produce the "virtual sound" from which the transitions of the stop appear to originate.
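As a rough numerical illustration of the locus idea (the numbers are hypothetical, not values from the Monet database): in the Haskins account, the F2 transition of a consonant-vowel syllable points back towards a consonant-specific locus frequency, with the observed onset lying part-way between that locus and the vowel's F2 target.

```python
def f2_onset(locus_hz, f2_vowel_hz, coarticulation=0.7):
    """Locus-style estimate of the F2 value at voicing onset: the
    onset sits a fixed fraction of the way from the consonant's locus
    towards the vowel's F2 target.  The 0.7 coarticulation fraction
    is an illustrative assumption, not a measured constant."""
    return locus_hz + coarticulation * (f2_vowel_hz - locus_hz)


# Hypothetical alveolar locus around 1800 Hz, two different vowels:
print(f2_onset(1800.0, 2300.0))   # 2150.0 -- transition rises
print(f2_onset(1800.0, 900.0))    # 1170.0 -- transition falls
```

The point is that one stored "virtual" posture (the locus) yields sensible transitions into many different vowels, which is why a posture that is silent during closure can still be given a definite parameter-table entry.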

I read the pronunciation guide (Manzara & Hill 2002),
   but I have no clue how to interpret that notation into numerical parameters for the TRM, especially for a non-English language.

You need to get Monet up and running on a Macintosh or under GNUstep on a Linux machine -- the sources are in the Savannah repository, accessible using SVN (ignore the CVS repository):


and you will have a much better idea of what is involved.  There is also a web page accessible from that page, under the heading "Project Home Page" that provides a short descriptive overview of the whole Gnuspeech system.  It can be accessed directly at:


Hope this helps.


  
Sorry for my bad English.


Your English is fine, thank you.

All good wishes.

david
--------
David Hill
--------
Simplicity, patience, compassion. These three are your greatest treasures  (Tao Te Ching #67)
---------


