
Re: [gnuspeech-contact] Status of GnuSpeech

From: David Hill
Subject: Re: [gnuspeech-contact] Status of GnuSpeech
Date: Wed, 27 Oct 2010 17:52:21 -0700

Hi Ken,

On Oct 24, 2010, at 5:52 PM, Kenneth Reid Beesley wrote:

On 22Oct2010, at 15:46, David Hill wrote:

Dear Ken,

Apologies for the delay in replying to your email request.

Hello David,

Many thanks for your message.  No apologies are necessary.  As far as I know, you don't owe anything to anyone---and certainly not to me.
About once a year I look around for a practical articulatory text-to-speech toolkit, complete, supported, documented and ready to use.   I know that's asking a lot.
From what I've been able to read, gnuspeech looks very promising, so I launch an inquiry from time to time.  I hope they come across as friendly inquiries.

An enquiry about gnuspeech cannot be anything other than friendly. Your interest is appreciated!

The project at present is not being actively developed. The last repository update was made approximately 11 months ago by Dalmazio Brisinda. It included major upgrades to Monet and a newly completed component, "Synthesizer", that is required (or at least very, very useful) for new language development: it allows researchers to determine the acoustic consequences of arbitrary vocal tract configurations in the tube model (though the current "Synthesizer" is really a beta release and needs some code clean-up and a few additional features). There are a number of additional important "Monet" modules still to be ported, to manage posture, rule and transition editing. Unfortunately for you, the sub-modules that deal with developing these necessary new language data components within Monet are only stubs at present. Some help in continuing the port would be most welcome, though I realise that you represent an end user, and not a developer, so this is not really a call on you. [Anyone out there interested?]

Thanks for the update.  What kind of expertise do you need to continue development?

I think Dalmazio gave you a good short overview of that. If you are a Mac guy, have done any work with Cocoa (Interface Builder and Xcode), and know C, you should be in pretty good shape, apart from getting up to speed on the model basis for the parameter generation (it's all there in the NeXT code in the repository, of course, but reading code is not a lot of fun, especially if you don't already know what it is doing).

However, there are some papers available for download from my university web site. I've attached an HTML copy of the relevant section (F) of the "Published papers" page at the site; the links should work to grab a copy of any paper you'd like to see, for those papers that are downloadable.

To understand the basis for the synthesis, it is probably worth reading my 1978 paper first. It formed the basis for implementing the major component in Monet and the TextToSpeech Server, providing a desynchronized framework for manipulating the needed parameter tracks based on databases representing each language. It refers to hardware synthesizers, but these days synthesizers are run as software, thanks to phenomenal processor speed. We are using a synthesizer known as "the tube model" which is simply a full emulation of the branched acoustic tube that forms the vocal apparatus. The AVIOS paper (1995) explains how the tube model works. There are several papers related to the intonation and rhythm research for English on which the rhythm and intonation modelling of the synthesizer are based. The basic research is completely described in the 1977 ASA Meeting paper, but that is not yet on line. My 1979 and 1992 papers on the topic give you an idea of what we found, but I suspect the intonation and rhythm for a native American Indian language would be quite a bit different, though you may already have collected some data that could be used for the modelling. We actually carried out some experiments to test the "goodness" of our intonation and rhythm modelling, as derived from the data we collected from earlier analyses.

The detailed empirical process of using Monet, Synthesizer, and a good quality spectrograph to create the databases was surprisingly easy. I really need to write a paper or manual to explain it in detail, but if you understand the target transition model of speech (a good bit different to the traditional segmental approach, especially given the desynchronisation involved in the synthesis model), the process is pretty obvious and straightforward, though you need to have access to a corpus of data on the sounds of the language you wish to create. English has been so well studied for years that the real targets for vowels and some consonants, the virtual targets for stop sounds, the noise characteristics of sibilants, and so on, were readily available. This is a separate issue from the rhythm and intonation. We created our own model for British English rhythm based on the analysis of a significant body of both formal and conversational British English speech. The intonation model is essentially that of M.A.K. Halliday and David Abercrombie -- well described in Halliday's book: "A Course in Spoken English Intonation" (OUP 1970, SBN 19 453066 3), which came with audio tapes, but is no longer in print. I should get onto OUP and see if they'll let me put the audio on my web site. We used some of the speech from those tapes as the basis for our rhythm study. But we did some experiments to test the model and added some refinements of our own.

These are the kinds of resources and approaches you'll need to create a synthesis system for a native American Indian language.

Hope this helps.

Warm regards.



The existing code-base and software components are quite stable for both Mac OS X and GNUstep. Have you tried any of them out?

Many thanks for your interest. I hope to have better news for you "real soon now".

I'm mostly a Mac guy, so I really should see how far I can get with the existing system.  I'll try to do that as soon as I can.

