Subject: Re: Real-time articulatory speech-synthesis-by-rules

Hi Jonathan,

Yes, indeed there has been work done since 1995. If you visit:

http://savannah.gnu.org/projects/gnuspeech

you should find various kinds of information, including access to the CVS repository where the Mac/GnuSpeech software is available under "current". Greg Casamento has been working on the additions to the Mac G5/Xcode source to allow it to compile under GNUstep, but I think there are still a few things to complete. The same source should compile and run on a Mac G5 running Panther or Tiger. The project home page has a diagram showing an overview of the system. Using Google, you should have been able to find the links to this work fairly easily ("tube model articulatory speech synthesis distinctive region formant sensitivity analysis" -- some of these keywords usually bring up the necessary links within the first two or three Google references).

The tube model itself exists as a C program and, given the correct parameters, will run and produce a .snd, .wav or .au file on any machine. Of course, the trick is producing the correct parameters, which is what the Monet system is all about. The current Monet system (which is what the available source effectively compiles to) is designed as an interactive tool for producing the databases needed to establish a language, and was ported from the original implementation on a NeXT.

Both Mac Xcode and GNUstep have a lot in common with the original NeXTSTEP/OpenStep that ran on the NeXT -- particularly the very powerful development environment, including all kinds of libraries for graphics and interaction, and the use of Objective-C. All the graphical work has been recreated for the Mac, and is almost complete within GNUstep, so the port has been reasonably possible even with the limited programming resources available to us. Porting the whole graphical/interactive Monet to plain vanilla Linux would be more problematic (consider how long it has taken to get GNUstep up).

However, if you simply want speech output, you don't actually need all the graphical/interactive machinery. On the NeXT there was a stripped-down version of Monet called "real-time Monet" that used the databases and rules to process text into the parameters needed by the tube model (on the NeXT the tube ran on the built-in DSP [Digital Signal Processor]; these days host processors are fast enough, and have additional instructions, so a DSP is not necessary). Real-time Monet was the main part of what was effectively a daemon on the NeXT called the "Speech Server", which provided speech services to any program that needed them and simply showed up as an item in the "Services" menu for any application. An applet called BigMouth (not that other speaking program of the same name!) used the Speech Server to let users play around with text-to-speech synthesis.

There were also facilities to create User and Application Dictionaries that sat in a hierarchy with the Main Dictionary, so that the Main Dictionary could be overridden for application or personal reasons. A tool called "PrEditor" allowed users to adjust the pronunciations of individual words and put them in these dictionaries.
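As a rough illustration of that dictionary hierarchy, a lookup might be sketched in C along the following lines. The names, stubs and assumed search order here are only illustrative -- they are not the actual gnuspeech or PrEditor interfaces:

    /* Hypothetical sketch of a User -> Application -> Main dictionary lookup.
     * All names and the assumed search order are illustrative only. */

    #include <stdio.h>
    #include <stddef.h>

    /* Each dictionary maps a word to a phonetic (posture) string, or NULL. */
    typedef const char *(*DictLookup)(const char *word);

    static const char *user_dict(const char *w) { (void)w; return NULL; }                  /* stub */
    static const char *app_dict(const char *w)  { (void)w; return NULL; }                  /* stub */
    static const char *main_dict(const char *w) { (void)w; return "(phonetic string)"; }   /* stub */

    /* Consult the most specific dictionary first; fall back to the
     * letter-to-sound rules (not shown) if no dictionary knows the word. */
    static const char *pronounce(const char *word)
    {
        DictLookup order[] = { user_dict, app_dict, main_dict };
        for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
            const char *phones = order[i](word);
            if (phones)
                return phones;      /* first (most specific) hit wins */
        }
        return "(letter-to-sound rules would apply here)";
    }

    int main(void)
    {
        printf("took -> %s\n", pronounce("took"));
        return 0;
    }

The point is just that the first, most specific match wins, which is what lets the Main Dictionary be overridden per application or per user without being edited.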
There were also tools available for developer and/or experimenter use. The most important of these were the Main Dictionary development tools; a Server Test Plus program that gave full access to the Speech Server facilities, including the ability to obtain the phonetic translation of arbitrary input text; and an applet called "Synthesizer" that gave full interactive graphical access to the tube model, allowing experimenters (along with the full interactive Monet -- also originally restricted as part of the "Experimenters Kit") to investigate the tube model and the speech postures needed for various static sounds and "loci" as part of developing databases for different languages.

Monet allowed the development of posture data and dynamic composition data, plus some adjustment of intonation and rhythm. However, intonation and rhythm were basically determined by models based on M.A.K. Halliday's model of British intonation, plus rhythm data from the original authors' research at the U of Calgary. The intonation and rhythm were considered to be among the great strengths of the system.

With the exception of Monet (which does allow text to be converted to speech, and all the posture databases and rules to be manipulated), none of the other tools and facilities have yet been ported to the new platforms. One of the most urgent tasks is to re-create "real-time Monet" (RTM) in a form that can run under plain vanilla Linux, as well as on the Mac, by stripping the ported version of Monet -- maybe converting it to plain C. The databases already exist (though they could be improved), so if RTM were created, speech services and general speech output could be provided for both Mac and Linux.
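To give a rough idea of the shape such a stripped-down RTM might take in plain C, here is a minimal sketch. Every name and stub below is hypothetical -- the real data structures, rules and tube-model calls would have to come from the ported Monet and tube-model sources, not from this outline:

    /* Hypothetical outline of a plain-C "real-time Monet": text in, tube-model
     * parameter frames out, audio written by the existing tube model. */

    #include <stdio.h>

    /* Stage 1 (stub): text -> phonetic string, via the dictionary hierarchy
     * and letter-to-sound rules. */
    static const char *text_to_phonetics(const char *text)
    {
        (void)text;
        return "(phonetic transcription)";   /* placeholder */
    }

    /* Stage 2 (stub): phonetics -> time-varying tube-model parameter frames,
     * using the posture databases, composition rules, rhythm and intonation. */
    static int phonetics_to_parameters(const char *phones, FILE *param_out)
    {
        fprintf(param_out, "; parameter frames for: %s\n", phones);
        return 0;
    }

    /* Stage 3 would hand the parameter frames to the existing C tube model,
     * which already produces .snd, .wav or .au output. */

    int main(void)
    {
        const char *phones = text_to_phonetics("hello");
        phonetics_to_parameters(phones, stdout);
        /* ...then invoke the tube model on the parameter frames... */
        return 0;
    }

The point is simply that the pipeline is linear -- text to phonetics to parameter frames to the tube model -- so none of the graphical or interactive machinery is needed for speech output.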
It would also be valuable to port the other tools and facilities to the Mac (Xcode) and GNUstep, so that research and development on all aspects of the speech output, including the creation of databases for languages other than English, could proceed on modern platforms. It is for speech experimentation (for language development, phonetic experimentation, and psychophysical work) that the articulatory speech synthesis software we created is most important, however important and useful high-quality speech output for computers may be.

Future work will include not only the kinds of developments I've hinted at above, but also better models of sibilant sounds and larynx excitation, by modelling the airflow characteristics of these sounds instead of arbitrarily injecting waveforms approximating what is observed in speech, plus getting a much tighter connection between the underlying speech gestures that create speech and the control of the tube model. We'd also like to generalise the frameworks used for the Halliday-based intonation and rhythm so that a basis for trying other approaches to intonation might be made easily available to the world.

All the existing software is (as noted) available under the GPL (see http://www.gnu.org for details).

I hope this may be helpful. I would value your comments and would be very happy to answer any further questions you may have. We are *very* interested in obtaining help in further development of the gnuspeech project, and wonder if you have the interest and skills to become involved?

All good wishes.

david
-------
David Hill, Prof. Emeritus, Computer Science | Imagination is more      |
U. Calgary, Calgary, AB, Canada T2N 1N4      | important than knowledge |
address@hidden OR address@hidden             | (Albert Einstein)        |
http://www.cpsc.ucalgary.ca/~hill            | Kill your television     |

On Fri, 20 May 2005, Jonathan Schreiter wrote:

> Hi,
> I read your paper from 1995 titled "Real-time
> articulatory speech-synthesis-by-rules". I am
> interested in this area of research. I noticed that
> the document stated the software would be available
> via the GNU software website, but I was unable to find
> it. Is it the software / ruleset database publicly
> available? Has any updated work been done since 1995?
>
> Any help would be greatly appreciated.
>
> Many thanks,
> Jonathan
>