Hi Ken,

On Oct 24, 2010, at 5:52 PM, Kenneth Reid Beesley wrote:

On 22 Oct 2010, at 15:46, David Hill wrote:
Dear Ken,
Apologies for the delay in replying to your email request.
Hello David,
Many thanks for your message. No apologies are necessary. As far as I know, you don't owe anything to anyone---and certainly not to me. About once a year I look around for a practical articulatory text-to-speech toolkit: complete, supported, documented, and ready to use. I know that's asking a lot. From what I've been able to read, gnuspeech looks very promising, so I launch an inquiry from time to time. I hope they come across as friendly inquiries.
An enquiry about gnuspeech cannot be anything other than friendly. Your interest is appreciated!
The project at present is not being actively developed. The last repository update was made approximately 11 months ago by Dalmazio Brisinda. It included major upgrades to Monet and a newly completed component that is required (or at least very, very useful) for new language development -- "Synthesizer", which allows researchers to determine the acoustic consequences of arbitrary vocal tract configurations in the tube model (though the current "Synthesizer" is really a beta release and needs some code clean-up and a few additional features). There are a number of additional important "Monet" modules still to be ported to manage posture, rule, and transition editing. Unfortunately for you, the sub-modules that deal with developing these necessary new-language data components within Monet are only stubs at present. Some help in continuing the port would be most welcome, though I realise that you represent an end user, not a developer, so this is not really a call on you. [Anyone out there interested?]
Thanks for the update. What kind of expertise do you need to continue development?
I think Dalmazio gave you a good short overview of that. If you are a Mac guy and have done any work with Cocoa (Interface Builder and Xcode) and know C, you should be in pretty good shape, apart from getting up to speed on the model basis for the parameter generation (it's all there in the NeXT code in the repository, of course, but reading code is not a lot of fun, especially if you don't already know what it is doing).
However, there are some papers available for download from my university web site (Section F of "Published papers" at http://pages.cpsc.ucalgary.ca/~hill). I've attached an HTML copy of the relevant section (F) from the "Published papers" page at the site; the links should let you grab a copy of any paper you'd like to see, for those papers that are downloadable.
To understand the basis for the synthesis, it is probably worth reading my 1978 paper first. It formed the basis for implementing the major component in Monet and the TextToSpeech Server, providing a desynchronized framework for manipulating the needed parameter tracks based on databases representing each language. It refers to hardware synthesizers, but these days synthesizers are run as software, thanks to phenomenal processor speed. We are using a synthesizer known as "the tube model", which is simply a full emulation of the branched acoustic tube that forms the vocal apparatus. The AVIOS paper (1995) explains how the tube model works. There are several papers related to the intonation and rhythm research for English on which the rhythm and intonation modelling of the synthesizer are based. The basic research is completely described in the 1977 ASA Meeting paper, but that is not yet on line. My 1979 and 1992 papers on the topic give you an idea of what we found, but I suspect the intonation and rhythm for a native American Indian language would be quite a bit different, though you may already have collected some data that could be used for the modelling. We actually carried out some experiments to test the "goodness" of our intonation and rhythm modelling, as derived from the data we collected in earlier analyses.
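[Editorially, for readers unfamiliar with waveguide synthesis: the core idea of such a tube model can be sketched as a one-dimensional Kelly-Lochbaum-style waveguide, in which the vocal tract is approximated by a chain of cylindrical sections and sound waves scatter at each junction between sections. The sketch below is only an illustration of that idea under heavy simplification -- it omits the nasal branch, losses, and proper radiation modelling of the actual gnuspeech tube model, and all function names and reflection values are illustrative assumptions, not gnuspeech code.]

```python
# Minimal one-dimensional Kelly-Lochbaum waveguide sketch of an
# acoustic tube. Illustrative only: NOT the gnuspeech tube model
# (which adds a nasal branch, losses, and radiation modelling).

def reflection_coeffs(areas):
    """Reflection coefficient at each junction between adjacent
    cylindrical sections with the given cross-sectional areas."""
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]

def run_tube(areas, excitation, r_glottis=0.75, r_lips=-0.85):
    """Propagate forward/backward pressure waves through the tube,
    one sample per excitation value; returns the pressure radiated
    at the lips for each sample. r_glottis and r_lips are assumed
    illustrative boundary reflection values."""
    n = len(areas)
    k = reflection_coeffs(areas)
    f = [0.0] * n   # right-going waves (toward the lips)
    b = [0.0] * n   # left-going waves (toward the glottis)
    out = []
    for x in excitation:
        nf = [0.0] * n
        nb = [0.0] * n
        nf[0] = x + r_glottis * b[0]            # glottal end: inject + reflect
        for i in range(n - 1):                  # scattering at each junction
            nf[i + 1] = (1 + k[i]) * f[i] - k[i] * b[i + 1]
            nb[i] = k[i] * f[i] + (1 - k[i]) * b[i + 1]
        nb[n - 1] = r_lips * f[n - 1]           # lip end: partial reflection
        out.append((1 + r_lips) * f[n - 1])     # the rest radiates as output
        f, b = nf, nb
    return out
```

Driving such a tube with a repeating glottal pulse while the section areas track the articulation is, in outline, what "a full emulation of the branched acoustic tube" means in practice.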
The detailed empirical process of using Monet, Synthesizer, and a good quality spectrograph to create the databases was surprisingly easy. I really need to write a paper or manual to explain it in detail, but if you understand the target-transition model of speech (a good bit different to the traditional segmental approach, especially given the desynchronisation involved in the synthesis model), the process is pretty obvious and straightforward, though you need access to a corpus of data on the sounds of the language you wish to create. English has been so well studied for years that the real targets for vowels and some consonants, the virtual targets for stop sounds, the noise characteristics of sibilants, and so on, were readily available. This is a separate issue from the rhythm and intonation. We created our own model for British English rhythm based on the analysis of a significant body of both formal and conversational British English speech. The intonation model is essentially that of M.A.K. Halliday and David Abercrombie -- well described in Halliday's book "A Course in Spoken English Intonation" (OUP 1970, SBN 19 453066 3), which came with audio tapes but is no longer in print. I should get onto OUP and see if they'll let me put the audio on my web site. We used some of the speech from those tapes as the basis for our rhythm study. But we did some experiments to test the model and added some refinements of our own.
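[Editorially: the desynchronised target-transition idea mentioned above can be illustrated with a minimal sketch. Each synthesis parameter carries its own ordered list of timed targets, and a frame is produced by sampling every track at the current time; crucially, the events of different tracks need not line up. The function and track names below are illustrative assumptions, not the Monet/TextToSpeech Server API.]

```python
# Sketch of desynchronised parameter tracks: each parameter has its
# own (time_ms, target) events; tracks are sampled independently.
# Illustrative only -- not the gnuspeech/Monet implementation.
from bisect import bisect_right

def track_value(events, t):
    """Piecewise-linear value of one parameter track at time t (ms).
    events: time-sorted list of (time_ms, target) pairs."""
    times = [e[0] for e in events]
    i = bisect_right(times, t)
    if i == 0:
        return events[0][1]        # before first event: hold first target
    if i == len(events):
        return events[-1][1]       # after last event: hold last target
    (t0, v0), (t1, v1) = events[i - 1], events[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Two tracks whose event times deliberately do NOT coincide
# (hypothetical values, for illustration only):
pitch = [(0, 120.0), (180, 140.0), (400, 100.0)]   # pitch targets, Hz
r3    = [(0, 0.8), (150, 1.6), (350, 1.1)]         # one tube-radius track, cm

# One synthesis frame samples every track at the same instant:
frame = {"pitch": track_value(pitch, 200),
         "r3": track_value(r3, 200)}
```

Because each track interpolates between its own events, a transition in one parameter can begin or end at a different moment from a transition in another -- which is the freedom a purely segmental framework does not give you.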
These are the kinds of resources and approaches you'll need to create a synthesis system for a native American Indian language.
Hope this helps.
Warm regards.
david
----------
F. gnuspeech-related publications

This section collects together those publications of particular relevance to the GNU Project "gnuspeech" -- a system designed to make it easy to create the databases for real-time articulatory speech synthesis for arbitrary languages, and to provide the real-time synthesis tools to use the resulting synthesis in applications. This is for convenience; they duplicate entries in the listings above. Those who would like to help with the port of the original completely functional system that ran on the NeXT computer are invited to contact Professor Hill. Much of the work is done and the real-time synthesis is available for testing. Some of the database-creation parts are not yet completely ported.

- HILL, D.R. (2006) Manual for the Synthesizer application -- part of the GnuSpeech text-to-speech toolkit. On-line manual relevant to the real-time articulatory-synthesis-based text-to-speech system described in the AVIOS 95 paper: "Real-time articulatory speech-synthesis-by-rules"
- HILL, D.R. (1993, 2004) MONET speech synthesis editing system manual (TextToSpeech Kit tool). On-line manual relevant to the real-time articulatory-synthesis-based text-to-speech system described in the AVIOS 95 paper: "Real-time articulatory speech-synthesis-by-rules"
- HILL, D.R., MANZARA, L. & SCHOCK, C-R (1993, 2003) Pronunciation guide for TextToSpeech kit. Pronunciation guide for Webster's, Trillium and IPA phonetic transcriptions.
- HILL, D.R. (2001) A conceptionary for speech and hearing in the context of machines and experimentation. This document is a considerably enlarged and revised version of Hill (1976b) below. It is designed as an educational and reference tool (see the term "conceptionary" in Wikipedia).
- HILL, D.R., MANZARA, L. & SCHOCK, C-R (1995) Manual for the original NeXT Developer TextToSpeech kit
- HILL, D.R., MANZARA, L. & TAUBE-SCHOCK, C-R. (1995) Real-time articulatory speech-synthesis-by-rules. Proc. AVIOS '95 14th Annual International Voice Technologies Conf, San Jose, 12-14 September 1995, 27-44 (C)
- HILL, D.R., SCHOCK, C-R & MANZARA, L. (1992) Unrestricted text-to-speech revisited: rhythm and intonation. Proc. 2nd. Int. Conf. on Spoken Language Processing, Banff, Alberta, Canada, October 12th.-16th., 1219-1222 (C)
- JASSEM, W., HILL, D.R. & WITTEN, I.H. (1984) Isochrony in English speech: its statistical validity and linguistic relevance. Pattern, Process and Function in Discourse Phonology (collection ed. Davydd Gibbon), Berlin: de Gruyter, 203-225 (J)
- HILL, D.R., JASSEM, W. & WITTEN, I.H. (1979) A statistical approach to the problem of isochrony in spoken British English. Current Issues in Linguistic Theory 9 (eds. H. & P. Hollien), 285-294, Amsterdam: John Benjamins B.V. (J)[This paper first appeared as University of Calgary Computer Science Department "Yellow Series" report # 78/27/6]
- HILL, D.R. (1978) A program structure for event-based speech synthesis by rules within a flexible segmental framework. Int. J. Man-Machine Studies 10 (3), 285-294, May (J)
- HILL, D.R. & REID, N.A. (1977a) An experiment on the perception of intonational features. Int. J. Man-Machine Studies 9 (2), 337-347 (J)
- HILL, D.R. (1977) Some results from a preliminary study of British English speech rhythm. 94th. Meeting of the Acoustical Society of America, Miami, Dec 12-16 (Full text available as U of Calgary Computer Science Dept. Report 78/26/5, contact the author; soon to be available on-line) (R)
- HILL, D.R. (1975a) Avoiding segmentation in speech analysis: problems and benefits. Proc. 8th. Int. Cong. of Phonetic Sciences, Leeds, UK, Aug 17-23, paper 128 (C)
- HILL, D.R. (1975b) Computer models for synthesising British English rhythm and intonation. Proc. 8th. Int. Cong. of Phonetic Sciences, Leeds, UK, Aug 17-23, paper 129 (C)
The existing code-base and software components are quite stable for both Mac OS X and GNUstep. Have you tried any of them out?
Many thanks for your interest. I hope to have better news for you "real soon now".
I'm mostly a Mac guy, so I really should see how far I can get with the existing system. I'll try to do that as soon as I can.
[snip]