Subject: Speech Dispatcher roadmap discussion.
From: Bohdan R. Rau
Date: Wed, 15 Oct 2014 12:33:39 +0200
On 2014-10-15 03:40, Trevor Saunders wrote:
> On Mon, Oct 13, 2014 at 10:45:05AM +0200, Bohdan R. Rau wrote:
>>
>> COMPAT_MODE On|Off
>
> I don't really like on and off since it assumes we'll only change the
> protocol once.
It was only a suggestion - for example there could be a command like:
PROTOCOL <number>
But I think there will be only one protocol change; further changes and
protocol features would be obtained with the CAPABILITY command.
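For illustration only - with this hypothetical PROTOCOL command (the
reply code and its wording are invented here, they are not part of SSIP
today), a session could start like:

PROTOCOL 2
203 OK PROTOCOL SET

and from that point on the client library would know the new message
format is in effect.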
>
> we can add functions spd_char_msgid etc which seems simpler to
> explain.
If we assume the new protocol will be used only by new applications, I
see no reason to add new functions when the application (and library)
always knows which version of the protocol is in use.
>
> btw why is spd_wchar a thing at all :( it seems like spd_char should
> handle UTF-8 fine.
Of course - spd_char works fine (with some exceptions). But spd_wchar
has nothing to do with UTF-8; it takes a Unicode code point directly,
not an encoded string. In my opinion, spd_char should be implemented as
a wrapper around spd_wchar, something like:
int spd_char(SPDConnection *conn, SPDPriority priority, const char *str)
{
    /* decode the first UTF-8 character of str into a code point */
    int chr = get_unicode_character(str);
    if (chr < 0)
        return -1;
    /* pass the code point on; priority as in the real libspeechd API */
    return spd_wchar(conn, priority, chr);
}
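(get_unicode_character above is pseudocode for "decode the first UTF-8
code point", not an existing libspeechd call. A minimal sketch of such
a decoder - it does not reject overlong encodings or surrogates - could
be:)

/* Decode the first UTF-8 code point of str, or return -1 on error.
   Sketch only: overlong encodings and surrogates are not rejected. */
static int get_unicode_character(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    if (s[0] == 0)
        return -1;
    if (s[0] < 0x80)
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
        return ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80)
        return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80)
        return ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12)
                | ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return -1;
}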
Why? Because some modules may be inconsistent with the documentation.
In theory we could pass a string of any length to spd_char and only the
first character would be spoken. In fact, the espeak module says "null"
if the string is longer than one UTF-8 character.
But as spd_wchar seems to be completely broken today, that's a topic
for future discussion.
>> Also, there must be functions like:
>>
>> SPD_Callback *spd_register_callback(SPDConnection *conn, int event,
>>                                     SPD_Callback *callback, void *user_data);
>> SPD_Callback *spd_unregister_callback(SPDConnection *conn, int event);
>>
>> Of course these functions are valid only in no-compatibility mode!
>
> Well, you can only call it if you assume newer libspeechd than we have
> today so I'm not sure what the point of caring about a compatibility
> on vs off is.
Have you ever seen an application without bugs? :)
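Just to make the intent of the proposed API concrete (the
SPD_EVENT_INDEX_MARK constant and the callback signature below are
invented; the whole interface is still to be designed):

/* hypothetical callback - the real signature is part of the open design */
void on_index_mark(SPDConnection *conn, int event, void *user_data)
{
    /* e.g. move a text highlight to the index mark just reached */
}

/* in application setup code: */
spd_register_callback(conn, SPD_EVENT_INDEX_MARK, on_index_mark, NULL);
/* ... and when the events are no longer wanted: */
spd_unregister_callback(conn, SPD_EVENT_INDEX_MARK);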
>
>> 3. Module output capabilities
>>
>> SPEAK - module can speak
>> FETCH - module can return synthesized wave to server
>> FILE - module can save synthesized wave to file
>
> the second two are basically indistinguishable, so why have both?
Please be patient and wait for the second part - I'll explain in detail
why.
>
>> 4. Module input capabilities
>>
>> SSML - module can fully play with SSML and index marks;
>> FLAT - module translates internally SSML into plain text. Index marks
>> are lost, pause/resume are not implemented.
>> PLAIN - module understands plain text (no SSML). Extra features (like
>> AUTOPAUSE and AUTOSYNC) are possible only in this mode.
>
> I'm not sure what the point in distinguishing between flat and plain
> is, any module can rip out all the ssml bits.
Because in FLAT mode the string sent to the module may differ from the
string the application sent to speech-dispatcher. So the offsets
returned by AUTOPAUSE and AUTOSYNC would be completely unusable.
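A concrete (hypothetical) illustration: the application sends

<speak>Hello world</speak>

but in FLAT mode the module sees only "Hello world" after stripping.
The word "world" starts at byte 6 of the stripped string, yet at byte
13 of what the application actually sent - so any offset the module
reports points to the wrong place from the application's point of view.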
> Though maybe it makes sense to tell clients if a module can deal with
> ssml or not I'm not really sure.
Yes. But if a module has extra features usable only in PLAIN mode, the
application should have this information.
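For instance (in some purely hypothetical report syntax), a module
announcing

INPUT PLAIN
EXTRA AUTOPAUSE AUTOSYNC

would tell the application to send plain text only, and that the two
extra features may be enabled.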
>> Server should never internally encode plain text into SSML if module
>> reports PLAIN and any of extra features (AUTOPAUSE, AUTOSYNC etc.) is
>> enabled. Also, server should never accept SSML data from application
>> if extra features are enabled (it's application bug).
>
> why?
Because requesting features which are known to be impossible is a bug -
or we have different ideas of what a bug is :)
>
>> 5. Module extended capabilities:
>>
>> SYNC - valid only in SSML mode. 706 SYNCHRONIZED events will be fired
>> only if SYNC mode is enabled.
>>
>> AUTOSYNC - valid only in PLAIN mode. 707 SYNCHRONIZED event will be
>> fired only if AUTOSYNC mode is enabled. Requires simple NLP in module.
>
> these events are different how?
Both are intended for applications which need to know which part of the
text is currently being spoken. SYNC works in SSML mode and uses index
marks predefined by the application. AUTOSYNC works in PLAIN mode and
returns offsets, which may be used for example to highlight the spoken
text.
Example of an application: a multi-language epub reader. The application
has only a vague idea where a sentence ends, and if the module
(specialized for a particular language) knows better - why not use its
knowledge?
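A sketch of what such a reader could do with the proposed 707 event
(the handler signature, the EpubView type and epub_view_highlight() are
all invented for illustration):

/* Hypothetical handler for the proposed 707 SYNCHRONIZED event.
   begin/end are byte offsets into the exact plain-text string the
   application sent, so they map directly back onto the document. */
void on_autosync(void *user_data, size_t begin, size_t end)
{
    EpubView *view = user_data;              /* invented application type */
    epub_view_highlight(view, begin, end);   /* highlight the spoken span */
}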
>> Simple NLP (Natural Language Processor) must be able to automatically
>> split given text into sentences (or - if synthesizer can speak also
>> parts of sentences - phrases).
> I'm unconvinced, it seems like that's a problem synthesizer should
> already be solving, so why should we duplicate that?
Because synthesizers are for synthesis, not for dealing with grammatical
problems.
Example: Mbrola is a synthesizer (i.e. Mbrola realizes the DSP phase of
TTS). Of course, most synthesizers have some internal NLP, but it's used
only for the synthesizer's internal purposes. My Milena is an exception;
it uses something like:
while (*input_string) {
    /* extract the next sentence and advance input_string past it */
    char *sentence = get_sentence(&input_string);
    say(sentence);
    free(sentence);
}
So it's possible to get the position of the currently spoken sentence
from Milena, and we can use it to highlight the spoken text or to
determine the byte offset where speech was paused.
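(get_sentence and say above are pseudocode. A toy version of the
splitter, just to show the contract - real NLP must also handle
abbreviations, numbers, ellipses and so on - might be:)

#include <stdlib.h>
#include <string.h>

/* Toy splitter: returns a malloc'ed copy of the text up to and
   including the next '.', '!' or '?', and advances *input past it.
   Returns NULL only on allocation failure. */
char *get_sentence(char **input)
{
    char *start = *input, *p = start;
    while (*p && *p != '.' && *p != '!' && *p != '?')
        p++;
    if (*p)
        p++;                     /* keep the sentence-final punctuation */
    size_t len = (size_t)(p - start);
    char *sentence = malloc(len + 1);
    if (!sentence)
        return NULL;
    memcpy(sentence, start, len);
    sentence[len] = '\0';
    *input = p;                  /* caller continues from here */
    return sentence;
}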
But Milena is not a synthesizer - in fact it's a text-to-speech system
with sophisticated NLP specialized for only one language, and the
backend synthesizers may differ (currently Mbrola and Ivona are
implemented).
I know my suggestions may seem a little strange, but you have to take
into account that I want to change the way of thinking about
speech-dispatcher.
Currently:
speech-dispatcher is used by visually impaired users, and as a speech
backend for screen readers.
My dream:
Visually impaired users are very important for speech-dispatcher
developers, but speech-dispatcher should also be used as a general
purpose speech synthesis backend for all kinds of applications (like
SAPI on Windows). A screen reader is an example of a very important
application, but it is not the only application using speech-dispatcher.
Example:
Imagine a well-sighted eighteen-wheeler driver carrying several cases
of beer from Dallas to New Orleans, listening to the long email sent to
him by his fiancee :)
> Trev
ethanak