From: D.R. Hill
Subject: Re: [gnuspeech-contact] Re: what's the status?
Date: Tue, 14 Feb 2006 16:53:06 -0700 (MST)

First some background, then I'll answer your question.

There has been a hiatus since the software (which was originally commercial, for the NeXT) first became available -- mainly because NeXT Computer became NeXT, a software company based on Intel processors, and the DSP needed for real-time support became very problematic. (We used the Turtle Beach Multisound and had it all working, but then that went away, there was no good substitute for a while, and the team broke up, each going his separate way.)

The decision was made to put all the software and development tools out under the GPL on the FSF Savannah web site, where it now exists as a GNU project (gnuspeech). At first GNUstep -- the GNU/Linux version of NeXTSTEP/OpenStep -- was incomplete. More recently Mac OS X Cocoa became available, and since GNUstep was (by mandate) intended to implement Cocoa as GPL-licensed software, we (specifically Steven Nygard, ex-OmniGroup) ported the original Monet system to Mac OS X.

There are some other parts that need to be ported, "Synthesizer" and "real-time Monet" being the most important, but there are also PrEditor (which allows users to create their own additional user and application dictionaries) and BigMouth (which provides a service to speak arbitrary files without requiring them to have been designed to use the system).

Also, the intonation is built into Monet and inaccessible at present.

This lack of further development is a result of the hiatus plus the need to port (the latter is more difficult than it seems, and Steve Nygard did a truly impressive job getting Monet across).

People's voices are characterised by their physical characteristics (e.g. length of vocal tract, degree of huskiness in the voice -- especially for females -- and so on), and by the way they speak (the precise vowel qualities involved, the mean pitch and pitch range, the style of intonation and so on). The pronunciation of words, which includes vowel quality, elision of consonants, stressing and so on, is obviously also part of how people speak. The stressing affects the rhythm and intonation modelling too.
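
Just to make "voice parameters" concrete, here is a sketch of the kind of parameter set involved -- the names and values below are invented purely for illustration, not the actual Monet/gnuspeech data:

    #include <iostream>

    // Illustrative only: a grouping of the voice characteristics mentioned
    // above, with invented names and rough, made-up values.
    struct VoiceQuality {
        double vocalTractLengthCm;    // physical scale of the speaker
        double breathiness;           // degree of "huskiness" in the voice source
        double meanPitchHz;           // average speaking pitch
        double pitchRangeSemitones;   // size of intonation excursions about the mean
    };

    int main() {
        VoiceQuality child{12.0, 0.2, 300.0, 12.0};   // rough guesses, not measured data
        std::cout << "mean pitch: " << child.meanPitchHz << " Hz\n";
        return 0;
    }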

The tools we had allowed all of these issues to be addressed -- though as I say the intonation was built into Monet, rather than subject to rules. The rhythm can be varied in Monet, but ought to be more accessible, and the intonation should be treated as everything else is, by rules that are explicit and can be changed. This is one of the development priorities when the various ports are complete.
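
To show the sort of thing I mean by rules that are explicit and changeable -- this is purely a sketch of the idea, not the form Monet's rule data actually takes:

    #include <string>
    #include <vector>

    // Purely illustrative: one conceivable shape for an editable intonation
    // rule, once intonation is driven by explicit rules like everything else.
    struct IntonationRule {
        std::string context;          // e.g. "statement-final" (invented label)
        double pitchChangeSemitones;  // size of the rise (+) or fall (-) to apply
        double spanFraction;          // how much of the tone group the change covers
    };

    // The rule set would then be data the user can edit, not code to recompile.
    std::vector<IntonationRule> defaultRules = {
        {"statement-final", -4.0, 0.3},
        {"yes/no-question", +5.0, 0.4},
    };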

Real-time Monet (RTM) is a subset of the present interactive Monet (IM). IM is designed to allow the rules and posture data to be changed, and the effect of the changes listened to. RTM would simply accept punctuated text as input and produce speech output on the system sound output. "Synthesizer" (which I am currently working on) allows the results of different postures applied to the Tube Resonance Model to be explored, and is almost essential to creating new language databases. However, IM does allow the posture data to be varied, and new data files of posture data and compounding rules to be created and saved for use by IM or RTM. The results are not likely to be satisfactory if you haven't checked the data using "Synthesizer", and preferably by reference to the linguistics/phonetics literature, to get an idea of the results you are aiming at.
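
Put another way, RTM reduces to a simple non-interactive pipeline. A rough sketch of that flow (the function names and bodies here are invented placeholders, not the actual gnuspeech code):

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for the stages RTM would chain together.
    static std::vector<float> textToPostureParams(const std::string& text) {
        // Placeholder: the real step applies the dictionary, the pronunciation
        // rules and the posture compounding rules to get parameter tracks.
        return std::vector<float>(text.size(), 0.0f);
    }

    static std::vector<float> synthesizeWithTubeModel(const std::vector<float>& params) {
        // Placeholder: the real step drives the Tube Resonance Model to get samples.
        return params;
    }

    static void playOnSystemSoundOutput(const std::vector<float>& samples) {
        // Placeholder: the real step writes to the platform's sound output.
        std::cout << samples.size() << " samples would be played\n";
    }

    int main() {
        // RTM, as described above: punctuated text in, speech out, no interaction.
        auto params  = textToPostureParams("Hello, world.");
        auto samples = synthesizeWithTubeModel(params);
        playOnSystemSoundOutput(samples);
        return 0;
    }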

That seems like a long preamble to answering your question, but it gives you the background. The short answer is that the system, when complete -- and preferably, but not essentially, with the intonation made more accessible -- would allow arbitrary voices and accents to be created from almost nothing. As it is, child, female and male voices, each with some reasonable variation, can be created. But producing, say, a "cockney" accent or a Bronx accent would require further work on the posture data, the compounding rules, the dictionary and likely the rhythm/intonation.

The changes that can be made now are significant, and can be made quickly and easily, but are not as dramatic/comprehensive as they could be in the future.

Does this give you a reasonable answer to your question? We are heading in the direction you want/need! If you have specific questions I'll try to answer them fully.

Anyone else got comments?  Len?  Craig? Steve?

All good wishes.

david
------
David Hill, Prof. Emeritus, Computer Science  |  Imagination is more       |
U. Calgary, Calgary, AB, Canada T2N 1N4       |  important than knowledge  |
address@hidden OR address@hidden   |         (Albert Einstein)  |
http://www.cpsc.ucalgary.ca/~hill             |  Kill your television      |

On Tue, 14 Feb 2006, Lachlan Stuart wrote:

After watching the list and working my way through reading the source with
relative enthusiasm, I found it interesting that most people who work on
GNUSpeech have noble motives.
My own interest in the project stems from my intent to become a game
programmer. As far as I know, GNUSpeech is the only TTS application that
(intends to) give the ability to quickly change voices and accents without
having to possess the desired voice and spend half a day in a studio
recording diphones. My long-term goal is to get through the source and
remake as much of the project as possible in platform-independent,
uniform-language (C++) code, but that is, of course, a very long-term goal.

I was wondering if any attempts have already been made towards any of my
efforts?

Regards,
Lachlan Stuart




