Subject: Re: Real-time articulatory speech-synthesis-by-rules

Hi Jonathan,

Yes, indeed there has been work done since 1995. If you visit:

http://savannah.gnu.org/projects/gnuspeech

you should find various kinds of information, including access to the CVS repository where the Mac/GnuSpeech software is available under "current". Greg Casamento has been working on the additions to the Mac G5/Xcode source to allow it to compile under GNUstep, but I think there are still a few things to complete. The same source should compile and run on a Mac G5 running Panther or Tiger. The project home page has a diagram showing an overview of the system. Using Google, you should have been able to find the links to this work fairly easily ("tube model articulatory speech synthesis distinctive region formant sensitivity analysis" -- some of these keywords usually bring up the necessary links within the first two or three Google references).

The tube model itself exists as a C program and, given the correct parameters, will run and produce a .snd, .wav or .au file on any machine. Of course, the trick is producing the correct parameters, which is what the Monet system is all about. The current Monet system (which is what the available source effectively compiles to) is designed as an interactive tool for producing the databases needed to establish a language, and was ported from the original implementation on a NeXT.

Both Mac Xcode and GNUstep have a lot in common with the original NeXTSTEP/OpenStep that ran on the NeXT -- particularly the very powerful development environment, including all kinds of libraries for graphics and interaction, and the use of Objective-C. All the graphical work has been recreated for the Mac, and is almost complete within GNUstep, so the port has been reasonably possible even with the limited programming resources available to us. Porting the whole graphical/interactive Monet to plain vanilla Linux would be more problematic (consider how long it has taken to get GNUstep up).

However, if you simply want speech output, you don't actually need all the graphical/interactive machinery. On the NeXT there was a stripped-down version of Monet called "real-time Monet" that used the databases and rules to process text into the parameters needed by the tube model (on the NeXT the tube ran on the built-in DSP [Digital Signal Processor]; these days host processors are fast enough, and have additional instructions, so a DSP is not necessary). Real-time Monet was the main part of what was effectively a daemon on the NeXT called the "Speech Server", which provided speech services to any program that needed them and simply showed up as an item in the "Services" menu for any application. An applet called BigMouth (not that other speaking program of the same name!) used the Speech Server to let users play around with text-to-speech synthesis.

There were also facilities to create User and Application Dictionaries that sat in a hierarchy with the Main Dictionary, so that the Main Dictionary could be overridden for application or personal reasons. A tool called "PrEditor" allowed users to adjust the pronunciations of individual words and put them in these dictionaries.
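As a rough illustration of that dictionary hierarchy, a lookup might be sketched in C along the following lines. The names, stubs and assumed search order here are only illustrative -- they are not the actual gnuspeech or PrEditor interfaces:

    /* Hypothetical sketch of a User -> Application -> Main dictionary lookup.
     * All names and the assumed search order are illustrative only. */

    #include <stdio.h>
    #include <stddef.h>

    /* Each dictionary maps a word to a phonetic (posture) string, or NULL. */
    typedef const char *(*DictLookup)(const char *word);

    static const char *user_dict(const char *w) { (void)w; return NULL; }                  /* stub */
    static const char *app_dict(const char *w)  { (void)w; return NULL; }                  /* stub */
    static const char *main_dict(const char *w) { (void)w; return "(phonetic string)"; }   /* stub */

    /* Consult the most specific dictionary first; fall back to the
     * letter-to-sound rules (not shown) if no dictionary knows the word. */
    static const char *pronounce(const char *word)
    {
        DictLookup order[] = { user_dict, app_dict, main_dict };
        for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
            const char *phones = order[i](word);
            if (phones)
                return phones;      /* first (most specific) hit wins */
        }
        return "(letter-to-sound rules would apply here)";
    }

    int main(void)
    {
        printf("took -> %s\n", pronounce("took"));
        return 0;
    }

The point is just that the first, most specific match wins, which is what lets the Main Dictionary be overridden per application or per user without being edited.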
There were also tools available for developer and/or experimenter use. The most important of these were the Main Dictionary development tools; a Server Test Plus program that gave full access to the Speech Server facilities, including the ability to obtain the phonetic translation of arbitrary input text; and an applet called "Synthesizer" that gave full interactive graphical access to the tube model, allowing experimenters (along with the full interactive Monet -- also originally restricted as part of the "Experimenters Kit") to investigate the tube model and the speech postures needed for various static sounds and "loci" as part of developing databases for different languages.

Monet allowed the development of posture data and dynamic composition data, plus some adjustment of intonation and rhythm. However, intonation and rhythm were basically determined by models based on M.A.K. Halliday's model of British intonation, plus rhythm data from the original authors' research at the U of Calgary. The intonation and rhythm were considered to be among the great strengths of the system.

With the exception of Monet (which does allow text to be converted to speech, and all the posture databases and rules to be manipulated), none of the other tools and facilities have yet been ported to the new platforms. One of the most urgent tasks is to re-create "real-time Monet" (RTM) in a form that can run under plain vanilla Linux, as well as on the Mac, by stripping the ported version of Monet -- maybe converting it to plain C. The databases already exist (though they could be improved), so if RTM were created, speech services and general speech output could be provided for both Mac and Linux.
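To give a rough idea of the shape such a stripped-down RTM might take in plain C, here is a minimal sketch. Every name and stub below is hypothetical -- the real data structures, rules and tube-model calls would have to come from the ported Monet and tube-model sources, not from this outline:

    /* Hypothetical outline of a plain-C "real-time Monet": text in, tube-model
     * parameter frames out, audio written by the existing tube model. */

    #include <stdio.h>

    /* Stage 1 (stub): text -> phonetic string, via the dictionary hierarchy
     * and letter-to-sound rules. */
    static const char *text_to_phonetics(const char *text)
    {
        (void)text;
        return "(phonetic transcription)";   /* placeholder */
    }

    /* Stage 2 (stub): phonetics -> time-varying tube-model parameter frames,
     * using the posture databases, composition rules, rhythm and intonation. */
    static int phonetics_to_parameters(const char *phones, FILE *param_out)
    {
        fprintf(param_out, "; parameter frames for: %s\n", phones);
        return 0;
    }

    /* Stage 3 would hand the parameter frames to the existing C tube model,
     * which already produces .snd, .wav or .au output. */

    int main(void)
    {
        const char *phones = text_to_phonetics("hello");
        phonetics_to_parameters(phones, stdout);
        /* ...then invoke the tube model on the parameter frames... */
        return 0;
    }

The point is simply that the pipeline is linear -- text to phonetics to parameter frames to the tube model -- so none of the graphical or interactive machinery is needed for speech output.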
It would also be valuable to port the other tools and facilities to the Mac (Xcode) and GNUstep, so that research and development on all aspects of the speech output, including the creation of databases for languages other than English, could proceed on modern platforms. It is for speech experimentation (for language development, phonetic experimentation, and psychophysical work) that the articulatory speech synthesis software we created is most important, however important and useful high-quality speech output for computers may be.

Future work will include not only the kinds of developments I've hinted at above, but also better models of sibilant sounds and larynx excitation, by modelling the airflow characteristics of these sounds instead of arbitrarily injecting waveforms approximating what is observed in speech, plus getting a much tighter connection between the underlying speech gestures that create speech and the control of the tube model. We'd also like to generalise the frameworks used for the Halliday-based intonation and rhythm so that a basis for trying other approaches to intonation might be made easily available to the world.

All the existing software is (as noted) available under the GPL (see http://www.gnu.org for details).

I hope this may be helpful. I would value your comments and would be very happy to answer any further questions you may have. We are *very* interested in obtaining help in further development of the gnuspeech project, and wonder if you have the interest and skills to become involved?

All good wishes.

david
-------
David Hill, Prof. Emeritus, Computer Science | Imagination is more      |
U. Calgary, Calgary, AB, Canada T2N 1N4      | important than knowledge |
address@hidden OR address@hidden             | (Albert Einstein)        |
http://www.cpsc.ucalgary.ca/~hill            | Kill your television     |

On Fri, 20 May 2005, Jonathan Schreiter wrote:

> Hi,
> I read your paper from 1995 titled "Real-time
> articulatory speech-synthesis-by-rules". I am
> interested in this area of research. I noticed that
> the document stated the software would be available
> via the GNU software website, but I was unable to find
> it. Is it the software / ruleset database publicly
> available? Has any updated work been done since 1995?
>
> Any help would be greatly appreciated.
>
> Many thanks,
> Jonathan
>