gnuspeech-contact

[gnuspeech-contact] Re: Adjustment to the Carré DRM model


From: David Hill
Subject: [gnuspeech-contact] Re: Adjustment to the Carré DRM model
Date: Wed, 12 Nov 2008 18:35:24 -0800

Hi Dalmazio,

On Nov 10, 2008, at 9:13 PM, Dalmazio Brisinda wrote:

> Very cool! Much of this sounds quite similar to what we were talking about over a month ago re: computing the resonant function from a smoothly changing radial interpolation function that depends on where in the tube we are, especially at the boundaries. In this case, they use MRI data for this function.


As I mentioned before, the snag is that more sections would be needed, the sample rate would increase, and computation speed would likely be an issue again.
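To make the sample-rate snag concrete, here is a minimal Python sketch. It assumes a waveguide tube model in which each section is exactly as long as the distance sound travels in one sample, so the sample rate is fs = c·N/L for N equal sections in a tract of length L. The speed of sound (350 m/s), the 17.5 cm tract length, and the function names are illustrative assumptions, not values taken from the gnuspeech source. Because every section is updated on every sample, the work per second grows roughly as N².

```python
# Assumed speed of sound in the vocal tract (m/s); illustrative only.
SPEED_OF_SOUND_M_S = 350.0


def sample_rate_hz(num_sections, tract_length_cm, c=SPEED_OF_SOUND_M_S):
    """Sample rate at which each of num_sections equal sections is exactly
    one sample of acoustic travel long: fs = c * N / L, with L in metres
    (the factor 100 converts the length from centimetres)."""
    return c * num_sections * 100.0 / tract_length_cm


def relative_cost(num_sections, tract_length_cm, baseline_sections):
    """Work per second relative to a baseline section count, assuming every
    section is updated once per sample (cost proportional to fs * N)."""
    cost = sample_rate_hz(num_sections, tract_length_cm) * num_sections
    base = sample_rate_hz(baseline_sections, tract_length_cm) * baseline_sections
    return cost / base


if __name__ == "__main__":
    for n in (10, 20, 40):
        fs = sample_rate_hz(n, 17.5)  # 17.5 cm: a typical adult male tract
        rel = relative_cost(n, 17.5, baseline_sections=10)
        print(f"{n:2d} sections: fs = {fs:.0f} Hz, cost x{rel:.0f}")
```

Under these assumptions, doubling the section count doubles the sample rate and roughly quadruples the work: the loop prints 20000, 40000 and 80000 Hz with relative costs of 1, 4 and 16.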


> Just had a slightly playful thought: I wonder if there is MRI data for samples limited to aesthetically pleasing male and female voices (separately). I'm sure there would be some physiological differences between an average of MRI data over a large 'random' sample and one limited to just 'attractive' samples.


Voice quality has more to do with the glottal excitation function (including intonation) than with vocal tract shape, though some vocal tract effects are pleasing, such as *clarity* of articulation, which we still don't have a good handle on. Some speakers seem to adjust their articulation to maximise clarity by adjusting the formants for best effect, though not voluntarily; I got that from Walter Lawrence himself.


> So I'm curious: what were the subjective results like? I would expect much smoother-sounding synthesis and therefore greater intelligibility.


Good topic for a PhD thesis :-)  Intelligibility is not synonymous with better-quality synthesis. DECtalk (MITalk) is pretty intelligible but very tiring to listen to for long periods, probably due in large part to its unnatural rhythm and intonation.


> That shifting of the zero-crossings/DRM boundaries towards the lips is also interesting: a 60/40 weighting for the length of the back half vs. the front half. Is anyone looking into incorporating these two changes into gnuspeech? The 60/40 weighting change would probably not be too difficult. The change that uses MRI data to create a non-uniform radial function sounds more involved, but very interesting!


The real point is that the "rest" state of the tube is non-uniform, yet it produces formants similar to those of a uniform tube. As a result, the DRM region boundaries are shifted from where the original theory puts them, and the rest-state radii have to differ along the tube. This almost certainly means, again, that more sections are needed. It would not be easy, but it needs to be looked at.
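As a rough illustration of why shifted boundaries push toward more sections, here is a hedged Python sketch. It starts from the standard uniform-tube DRM boundaries, applies a purely hypothetical piecewise-linear 60/40 remap (the summary gives only the approximate 60/40 split, not the MRI-derived boundary positions), and then asks how many equal-length sections would be needed for every boundary to fall exactly on a section edge. That exact-alignment requirement is an assumption made for the illustration, not something the TRM actually demands; the function names are hypothetical.

```python
from fractions import Fraction
from math import lcm  # multi-argument form needs Python 3.9+

# Standard uniform-tube DRM boundaries (zero crossings of the F1-F3
# sensitivity functions) as fractions of tract length, glottis = 0, lips = 1.
UNIFORM_BOUNDARIES = [Fraction(n, d) for n, d in
                      [(1, 10), (1, 6), (3, 10), (1, 2), (7, 10), (5, 6), (9, 10)]]


def shift_midpoint(boundaries, back_fraction=Fraction(3, 5)):
    """Hypothetical 60/40 weighting: linearly stretch the back half [0, 1/2]
    onto [0, back_fraction] and squeeze the front half onto [back_fraction, 1].
    Illustrative only; these are not Story's MRI-derived boundaries."""
    half = Fraction(1, 2)
    shifted = []
    for x in boundaries:
        if x <= half:
            shifted.append(x * back_fraction / half)
        else:
            shifted.append(back_fraction + (x - half) * (1 - back_fraction) / half)
    return shifted


def equal_sections_needed(boundaries):
    """Smallest number of equal-length sections whose edges include every
    boundary, i.e. the LCM of the boundaries' denominators."""
    return lcm(*(b.denominator for b in boundaries))


if __name__ == "__main__":
    shifted = shift_midpoint(UNIFORM_BOUNDARIES)
    print([str(b) for b in shifted])
    print(equal_sections_needed(UNIFORM_BOUNDARIES), equal_sections_needed(shifted))
```

Under these assumptions the uniform boundaries line up with 30 equal sections, while the shifted set needs 75, which is one way of seeing that moving the boundaries is not just a matter of relabelling the existing sections.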

Warm regards.

david
--------
David Hill
--------
 The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------

On 10-Nov-08, at 9:11 PM, David Hill wrote:


Hi Dalmazio,

I thought you might be interested in the paper of which the attachment is a summary. Basically, the author is saying that the real neutral vocal tract is not a uniform tube: it produces formants very similar to those of a uniform tube, but the non-uniformity moves the DRM boundaries towards the lips.

Warm regards.

david

-------

The Journal of the Acoustical Society of America -- November 2001 -- Volume 110, Issue 5, pp. 2761-2762

A distinctive region model based on empirical vocal tract area functions (A)

   Brad H. Story
   Univ. of Arizona, Speech and Hearing Sci., P.O. Box 210071, Tucson, AZ 85721-0071

The development of the Distinctive Region Model (DRM) [Mrayati et al., Speech Commun. (1988)] is based on theoretically derived acoustic characteristics for a tube of uniform cross-sectional area assumed to approximate a neutral vocal tract configuration. Formant sensitivity functions calculated for the uniform tube are used to divide the vocal tract into distinctive regions that, when constricted or expanded, will cause the formant frequencies to change in a predictable pattern. This study compares the original DRM (based on a uniform tube) with a new version created from a neutral vocal tract area function derived from published MRI data. Because it is subject to physiologic constraints, this neutral area function is nonuniform in cross-sectional area variation but exhibits formant frequencies similar to a uniform tube. Sensitivity function calculations for F1, F2, and F3 also show similarities to those of a uniform tube, but the zero-crossing points that divide the vocal tract into distinctive regions are shifted toward the lips. The result is distinctive regions that are not symmetric about the vocal tract mid-point but rather the back and front regions occupy about 60% and 40% of the total tract length, respectively. [Work supported by NIH R01-DC04789.]
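The uniform-tube DRM that the abstract starts from can be reproduced in a few lines. The sketch below assumes the textbook case of a uniform tube closed at the glottis and open at the lips, where the sensitivity function of formant n is proportional (up to sign and scale) to cos((2n − 1)πx), with x the normalised distance from the glottis; the function names are hypothetical. Collecting the zero crossings for F1, F2 and F3 gives seven boundaries, and hence eight regions, symmetric about the mid-point.

```python
from fractions import Fraction


def uniform_tube_zero_crossings(num_formants=3):
    """Normalised positions (0 = glottis, 1 = lips) where the sensitivity
    function of at least one of F1..F<num_formants> crosses zero.

    Assumes a uniform tube closed at the glottis and open at the lips, for
    which the sensitivity of formant n is proportional to cos((2n - 1)*pi*x).
    That is zero where (2n - 1) * x = (2m + 1) / 2 for m = 0 .. 2n - 2.
    """
    points = set()
    for n in range(1, num_formants + 1):
        k = 2 * n - 1
        for m in range(k):
            points.add(Fraction(2 * m + 1, 2 * k))
    return sorted(points)


def region_lengths(boundaries):
    """Lengths of the regions between consecutive boundaries, including the
    glottis (0) and lip (1) ends."""
    edges = [Fraction(0)] + list(boundaries) + [Fraction(1)]
    return [b - a for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    zc = uniform_tube_zero_crossings()
    print([str(x) for x in zc])
    print([str(x) for x in region_lengths(zc)])
```

It prints the boundaries 1/10, 1/6, 3/10, 1/2, 7/10, 5/6 and 9/10 and the region lengths 1/10, 1/15, 2/15, 1/5, 1/5, 2/15, 1/15 and 1/10 of the tract. Story's MRI-based version keeps a similar pattern but moves the crossings toward the lips.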

--------



