[Speechd] KTTS and SpeechD integration
From: Hynek Hanke
Subject: [Speechd] KTTS and SpeechD integration
Date: Sun Aug 13 12:52:48 2006
Hi Milan,
> HH> 3) The TTS is able to start synthesizing the text from an
> HH> arbitrary index mark in the given (complete) text.
>
> I'm not sure what's the point here. I assume you don't think about just
> cutting out the before-mark prefix, you probably think about starting
> the synthesis considering the left context, e.g. correct pronunciation
> ("I will <mark> read" vs. "I have <mark> read"), intonation dependent on
> the previous contents, or considering surrounding SSML markup.
you understood it correctly. I don't think considering the pronunciation and
intonation context is that important, especially because (except for special
cases) these index marks will mostly lie between sentences, not inside them.
But considering the surrounding SSML markup is very important, because that can
make a big difference in the resulting speech.
Currently, what I do in Speech Dispatcher to provide pause/resume and rewinding
is to cut the message at the appropriate index mark and send only the rest to
Festival. This is very wrong, because it can even break the SSML entirely (make
it invalid) if the index mark happens to fall in an unfortunate place, e.g.
inside an element whose opening tag is then cut away. Fixing this is of quite
high importance, I think. I didn't think about this problem when we made the
switch to SSML.
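To illustrate with a made-up fragment (just a sketch of the problem, not actual
Speech Dispatcher code; the mark name and markup are arbitrary):

    ssml = ('<speak><prosody rate="slow">First sentence. '
            '<mark name="m1"/> Second sentence.</prosody></speak>')

    # Naive resume: throw away everything before the mark, send the rest.
    rest = ssml[ssml.index('<mark name="m1"/>'):]
    # rest == '<mark name="m1"/> Second sentence.</prosody></speak>'
    # The fragment closes <prosody> and <speak> without ever opening them,
    # so it is no longer valid SSML, and the prosody setting that should
    # still apply to the second sentence is lost.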
> I think this is a legitimate requirement on a TTS system.
I think it should also be included somehow in the TTS API specs discussed on
freedesktop.org. Point 4.7 (NICE TO HAVE) is somewhat similar to this
requirement, although a bit more general.
The problem with this is that if the synthesizer supports SSML but doesn't
support the feature described above, then the applications are forced to parse
the SSML themselves if they want to do pause/resume without blocking the whole
session with the TTS (which is what the pause/resume functions in 4.12 do, if I
understand correctly).
I propose to prepend a SHOULD HAVE point requiring the TTS to at least be able
to start reading from a specified index mark, as described above.
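To make the requirement a bit more concrete, here is a rough Python sketch of
what starting from a mark has to preserve on the markup level: the remainder
must stay wrapped in all elements that were open at the mark. This is only an
illustration of the requirement, not a proposal for the actual API, and a real
synthesizer would of course also keep the textual left context:

    import xml.etree.ElementTree as ET

    # Return the document with everything before the named mark removed,
    # but with all elements open at the mark kept, so the result is
    # still well-formed SSML.
    def rest_from_mark(ssml, mark_name):
        root = ET.fromstring(ssml)
        parents = {c: p for p in root.iter() for c in p}
        mark = next(e for e in root.iter('mark')
                    if e.get('name') == mark_name)
        node = mark
        while node is not root:
            parent = parents[node]
            index = list(parent).index(node)
            for sibling in list(parent)[:index]:  # drop preceding siblings
                parent.remove(sibling)
            parent.text = None                    # and the text before them
            node = parent
        return ET.tostring(root, encoding='unicode')

With the fragment above, rest_from_mark(ssml, 'm1') gives
'<speak><prosody rate="slow"><mark name="m1" /> Second sentence.</prosody></speak>',
which is still valid and keeps the prosody setting for the remaining text.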
> But retaining the complete left context means processing it, which may mean
> the synthesizer doesn't start returning the output "immediately".
If this turns out to be a problem, we could look for alternative ways to ensure
pause/resume/rewinding, such as point 4.7 of the TTS API requirements (although
that point is not sufficient for this purpose on its own). But I expect that
would be more difficult.
> What you say here is probably somewhat confused.
Of course your explanations of the screen/application reader are more
accurate. My point was just that, in general, a reader like Gnopernicus won't
be the only application whose speech needs to get through so that accessibility
is ensured. There might even be messages more important than what comes out of
Gnopernicus or speechd-el.
A user probably doesn't want the message ``You have an incoming call from
Milan Zamazal'' to be interrupted or thrown away by messages generated while he
is writing a letter in a text editor.
So I don't think screen readers or application readers should, by default, have
priority over everything else.
With Regards,
Hynek Hanke