speechd-discuss

Comments on the Text to Speech "algorithm"


From: Bill Cox
Subject: Comments on the Text to Speech "algorithm"
Date: Sun, 28 Feb 2010 04:58:33 -0500

I much prefer the formant synthesis voices at high speed.  The voxin
ibmtts voice sounds clear well beyond the speed any other voice can
achieve, so far as I know.

However, talking about algorithms I know nothing about is a lot of fun!

I was also thinking about how easy whole-word encoding would be.  It's
totally trivial.  Even recording all the words shouldn't be all that
painful.  However, variable speed is essential.  You can do some
frequency shifting to increase the speed without increasing the pitch,
but I've not seen a very usable range from such algorithms, though I
don't understand why it's hard.  There's also the issue of guessing
how a word should sound from its spelling.  Authors often make up new
words for names, or if they're trying to mimic an accent.  We have
good open-source algorithms in espeak for guessing sounds, but the
voice synthesis needs to be able to convert the phonemes to speech.
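
To make the speed-without-pitch idea concrete, here is a minimal sketch of
the simplest method I know of, plain overlap-add (OLA) time stretching.  It
is not what espeak or any particular engine does; the names ola_stretch,
hann and tone are made up for illustration.  Frames are read from the input
faster than they are written to the output, so the result is shorter but
each frame keeps its own pitch.  The naive version splices neighbouring
frames without aligning their waveforms, which is one reason the usable
speed range is narrow; WSOLA and phase-vocoder methods try to fix that.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def tone(freq, rate, n):
    """n samples of a sine wave, used here only as test input."""
    return [math.sin(2.0 * math.pi * freq * i / rate) for i in range(n)]

def ola_stretch(samples, speed, frame=512):
    """Naive overlap-add time stretch (illustrative, not production code).

    Frames are read every hop_out * speed samples and written every
    hop_out samples, so the output is roughly len(samples) / speed long.
    """
    hop_out = frame // 2
    hop_in = hop_out * speed
    window = hann(frame)
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out = [0.0] * ((n_frames - 1) * hop_out + frame)
    norm = [0.0] * len(out)
    for k in range(n_frames):
        src = int(k * hop_in)   # where this frame starts in the input
        dst = k * hop_out       # where it lands in the output
        for i in range(frame):
            if src + i >= len(samples):
                break
            out[dst + i] += samples[src + i] * window[i]
            norm[dst + i] += window[i]
    # Divide by the summed window so overlapping frames do not change gain.
    return [o / w if w > 1e-6 else 0.0 for o, w in zip(out, norm)]

if __name__ == "__main__":
    one_second = tone(200.0, 8000, 8000)
    fast = ola_stretch(one_second, 2.0)
    print(len(one_second), "->", len(fast), "samples")
```

At speed 1.0 the frames line up exactly and the input comes back unchanged;
at higher speeds the frames that get mixed together no longer agree in
phase, and that is where the audible roughness comes from.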

I'm not very familiar with the algorithms involved in formant
synthesis.  Does anyone know any good papers?  I was thinking of
converting speech samples of all the phoneme transitions to the
frequency domain, and doing all the speech generation there.  That
simplifies smoothing transitions and playing at different speeds.  Are
these the kinds of tricks used in formant synthesis?  Also, I don't
understand why it's hard to get voices to play well at high speed.
Even espeak sucks at this.  For formant synthesis, shouldn't playing
fast without distortion be a trivial problem?
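
For what it's worth, here is a minimal sketch of the source-filter idea
usually described for formant synthesis: a pulse train at the voice pitch is
passed through a cascade of two-pole resonators, one per formant.  This is a
toy, not the algorithm of espeak or ibmtts; the function names and the
formant values for an "ah"-like vowel are assumptions chosen for
illustration.  It does show why speed looks easy on paper: duration is just
the number of samples you generate, and pitch is set independently by the
pulse spacing.

```python
import math

def resonator(signal, freq, bandwidth, rate):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / rate)
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / rate)
    a = 1.0 - b - c            # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def glottal_pulses(n, pitch, rate):
    """Impulse train at `pitch` Hz, a crude stand-in for the voice source."""
    period = int(rate / pitch)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synth_vowel(formants, seconds, pitch=120.0, rate=16000):
    """Steady vowel: pulse train filtered by one resonator per formant."""
    n = int(round(seconds * rate))
    signal = glottal_pulses(n, pitch, rate)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, rate)
    return signal

# Assumed (frequency Hz, bandwidth Hz) pairs for an "ah"-like vowel.
AH = [(700.0, 90.0), (1200.0, 110.0), (2600.0, 170.0)]

if __name__ == "__main__":
    normal = synth_vowel(AH, 0.30)
    fast = synth_vowel(AH, 0.10)   # same pitch and formants, a third as long
    print(len(normal), len(fast))
```

A steady vowel is the easy case.  Real speech moves the formants from frame
to frame, and this sketch says nothing about how well those transitions
survive being compressed, which may be where the difficulty at high speed
actually lies.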

It kind of sucks, but most of the good work still seems to be
unpublished.  If you find anything fun or good, let me know.

Bill

On Sat, Feb 27, 2010 at 11:21 PM, A <avalon at friendofpooh.com> wrote:
> Setting aside that mp3 is a bad choice, why should file access be so bad? If
> the Windows file system can't keep up, then bundling the files into a
> single (or a few) data structures should do the trick, especially if the
> speech engine starts playing a file as soon as enough bits have arrived
> instead of reading the whole thing first.
> I think it's more a problem of latency optimization than anything else
> on current dual- and multi-core CPUs.
>
> On Sun, Feb 28, 2010 at 1:23 AM, Kenny Hitt <kenny at hittsjunk.net> wrote:
>> Hi.  That would probably be ok for reading books, but it
>> would suck for a screen reader.  One reason I haven't
>> used Cepstral Swift much, even though I own several voices, is that it's not
>> responsive enough for daily screen reading.
>> The file access alone for so many mp3s would be awful.
>>
>>           Kenny
>>
>> On Sat, Feb 27, 2010 at 10:47:13PM +0100, marc wrote:
>>> Hello,
>>>
>>> I made this remark at the http://rmll.info last summer in Nantes.
>>>
>>> If you have Text to Speech (TTS), the "old" way is to invent some
>>> mathematical function and to generate a "sound" which is "close" (in
>>> Hausdorff distance?) to the spoken words.
>>>
>>> But these mathematical formulas date from times when computers
>>> couldn't hold some 60,000 MP3s recorded from a
>>> human speaker. If we could organise it that way, the concatenation
>>> of the words would be better than the mathematical construction.  And
>>> if you learned how to make a higher sound at the end of a question,
>>> you should be able to adapt the mp3 too.
>>>
>>> Problem is: we will have to throw away a lot of work by
>>> mathematicians...  Mathematicians never had patents (the Greeks would
>>> be rich ;-).  But we throw away a lot of stuff in computer science
>>> ...
>>>
>>>
>>> Marc
>>>
>>>
>>>
>>>
>>> --
>>> What's on Shortwave guide: choose an hour, go!
>>> http://shortwave.tk
>>> 700+ Radio Stations on SW http://swstations.tk
>>> 300+ languages on SW http://radiolanguages.tk
>>>
>>> _______________________________________________
>>> Speechd mailing list
>>> Speechd at lists.freebsoft.org
>>> http://lists.freebsoft.org/mailman/listinfo/speechd


