From: David Hill
Subject: Re: [gnuspeech-contact] GNUSpeech Console Utility
Date: Thu, 5 Nov 2009 12:29:22 -0800
Hi John,

The original text-to-speech system on the NeXT, on which the port is based, did address the "question" intonation pattern. The intonation patterns are affected by the punctuation and the intonation control parameters. Properly, though, only questions expecting the answer "Yes" or "No", or statements expressing uncertainty, really have rising intonation at the end. The rampant "up-talk" of the younger generation in Canada is an exception -- everything in "up-talk" gets a rising intonation at the end, perhaps a sign of insecurity in the speaker! :-). Wh- questions do not show the rising intonation. The system did not make allowance for this distinction -- it would have required some grammatical analysis which we had not tackled, but it should be done. It isn't just a matter of detecting the presence of words like "why", "when", "who", "what", and "how", because it is fairly easy to frame a "Yes/No" question that also contains one or more of these words (for example: "Did you tell her when we were supposed to meet?").

The system also had regular statements and emphatic statements. There should have been a lot more, and the plan was to implement the whole of Michael Halliday's description of the intonation of British English (he wrote an excellent tutorial book, with accompanying taped examples: "A Course in Spoken English: Intonation" -- Oxford U. Press, 1970, SBN [sic] 19 453066 3). The intonation system was tied to the metrical aspects of English described by a number of British linguists -- most notably Professor David Abercrombie, who was at Edinburgh University. We carried out significant research at the U of Calgary on the rhythm and intonation of British English, and this was used when we spun off Trillium Sound Research and built the original NeXT system. The rhythm and intonation were regarded as significantly effective features of the text-to-speech system, even though the research results and Halliday were only partially implemented.
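David's point -- that scanning for Wh-words anywhere in a sentence misclassifies Yes/No questions -- can be illustrated with a small sketch. This is purely hypothetical (Gnuspeech did not implement it; the real fix would need grammatical analysis, as he says): a first-word heuristic, rather than a bag-of-words test, already handles his counterexample.

```python
# Hypothetical sketch: why "contains a Wh-word" is the wrong test for
# choosing falling (Wh-question) vs rising (Yes/No question) intonation.
# Wh-questions typically *start* with a Wh-word; Yes/No questions typically
# start with an auxiliary verb, even when a Wh-word appears later.

WH_WORDS = {"why", "when", "who", "what", "where", "how", "which"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were", "am",
               "can", "could", "will", "would", "shall", "should",
               "have", "has", "had"}

def likely_rising_intonation(question: str) -> bool:
    """Crude first-word heuristic: rising intonation for Yes/No questions only."""
    words = question.lower().rstrip("?").split()
    if not words:
        return False
    if words[0] in WH_WORDS:
        return False                     # Wh-question: falling intonation
    return words[0] in AUXILIARIES       # auxiliary-first: Yes/No, rising

# The naive "contains a Wh-word" test would wrongly mark this as a Wh-question:
print(likely_rising_intonation("Did you tell her when we were supposed to meet?"))  # True
print(likely_rising_intonation("When are we supposed to meet?"))                    # False
```

Even this heuristic fails on fronted clauses and declarative questions, which is why real grammatical analysis would be needed.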
The speech was found to be much less tiring to listen to for long periods than, for example, DECTalk (which was based on MITalk, developed at MIT: "From Text to Speech: The MITalk System," Allen, Hunnicutt & Klatt, Cambridge University Press, 1987, ISBN 0-521-30641-8).

Abercrombie's claim was that spoken British English had "a tendency towards isochrony". Specifically, spoken phrases and sentences can be split into "feet", rather like the bars in music, and the rhythmic "beat" falls on the first syllable of each foot (the stressed syllables dictate where the foot boundaries fall). A tendency towards isochrony then asserts that the beats fall at more regular intervals than would be expected from the differing number of syllables in each foot, because the syllables become shorter as their number increases. American linguists are skeptical about this idea, but our analyses of a corpus of English spoken for purposes of illustrating intonation revealed that such a tendency definitely exists. You'd think it was an easy enough question to resolve one way or the other, but if you think that, you don't know linguists! :-)

There are several descriptions of the rhythm work we did. The most complete one is very academic, but there is a shorter version that summarises the actual research data: HILL, D.R., WITTEN, I.H. and JASSEM, W. (1977) "Some results from a preliminary study of British English speech rhythm", presented at the 94th Meeting of the Acoustical Society of America, Miami, Dec 12-16, though it only appears as a summary in the proceedings. The full text is available as U of Calgary Computer Science Dept. Report 78/26/5. I could send you a draft electronic copy, as I am currently working on putting a copy on the web, but there is also a hard-copy version published as a departmental report.
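The "tendency towards isochrony" can be made concrete with a toy duration model (this is an illustration of the idea, not Trillium's actual rhythm algorithm): per-syllable duration shrinks as the syllable count of a foot grows, so foot durations vary much less than syllable counts would predict.

```python
# Toy model of a "tendency towards isochrony" (illustrative only; not the
# actual Trillium/Gnuspeech rhythm model). Foot duration grows sublinearly
# with syllable count: compression = 0.0 gives perfect isochrony (every foot
# the same length), 1.0 gives no compression (pure syllable timing).
# "A tendency towards isochrony" corresponds to a value in between.

def foot_duration(n_syllables: int, base_ms: float = 200.0,
                  compression: float = 0.5) -> float:
    """Duration in ms of a foot containing n_syllables syllables."""
    return base_ms * n_syllables ** compression

for n in (1, 2, 3, 4):
    print(n, "syllables:", round(foot_duration(n), 1), "ms")
```

With compression = 0.5, a four-syllable foot lasts 400 ms rather than the 800 ms pure syllable timing would give: the beats fall more regularly than the syllable counts alone would suggest, which is the effect David describes.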
The intonation work is best accessed through Halliday's book, though Craig Taube-Schock's thesis (for which he received the Governor General of Canada's Gold Medal) reports the initial experimental work we did to validate and extend Halliday's descriptions for purposes of computer speech intonation: "Synthesizing Intonation for Computer Speech Output", Craig-Richard Taube-Schock, M.Sc. Thesis, Department of Computer Science, The University of Calgary, 1993, 109 pages. It is available from ProQuest (who archive all university theses in North America), though they have the date as 1994. In implementing the intonation for the TextToSpeech kit, a number of improvements were made that are not written up in the thesis, especially the smoothing of contours. From the original Developer TextToSpeech kit manual: ...
A question mark at the end of a sentence caused the rising intonation of a question to be selected. Another special mode allowed punctuation to be spoken, rather than used to control how the text was spoken.

I have put the whole manual on my university web site, where it is easier to find than by digging through the Savannah repository. It doesn't really address these issues completely, but it is useful for many purposes, and you will find it useful background. Go to my university web site, select "Published papers" from the left-hand menu, and scroll down to section "E. Other publications"; you'll find a whole lot of Gnuspeech-related documents there. The sixth item is "Manual for the original NeXT Developer TextToSpeech kit". Clicking the link will allow you to download a .pdf file of the whole manual. The five previous links in that section are also useful references for Gnuspeech and will help you in your work on porting the server.

Many thanks for your willingness to get involved. Very much appreciated. Feel free to bug me with any questions/problems that come up. HTH. All good wishes.

david

---------
David Hill
---------
The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
---------

On Nov 4, 2009, at 6:21 PM, John Delaney wrote:

Here I was trying to implement a speech synthesis API for a graduate musical synthesis class, and now I'm getting roped into actually working on the project. I'll implement some sort of Parameter class to hold the current intonation parameters; that should be pretty simple. [snip]
---------