From: Bohdan R. Rau
Subject: Speech Dispatcher roadmap discussion.
Date: Mon, 13 Oct 2014 10:45:05 +0200
On 2014-10-10 02:13, Luke Yelavich wrote:
> On Thu, Oct 09, 2014 at 10:50:35PM AEDT, Bohdan R. Rau wrote:
>>
>> I also have another suggestion, but it's a topic for the next mail :)
>
> Looking forward to hearing about it.
PART ONE
First: as some SSIP responses may change in the new version, we should
provide compatibility with current versions of speech-dispatcher (both
in the SSIP protocol and in libspeechd). So the first step would be a
new SSIP command, something like:
COMPAT_MODE On|Off
(the default is On, meaning the server emulates the current version of
the SSIP protocol, even with known bugs)
This should be safe: old applications will still work with the new
version, and new applications can decide either to continue if the
COMPAT_MODE command is not implemented (i.e. we are connected to an old
version of speech-dispatcher) or to die (because some vital functions
are not implemented).
The new libspeechd should also have a function:
int spd_compat_mode(SPDConnection *conn, int compatible);
The function should return 0 on success or -1 if an error occurs.
Alternatively, we could provide spd_open_new and spd_open2_new
functions; future applications would then use only the _new functions.
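A minimal client-side sketch (spd_compat_mode is only the function
proposed above, not an existing libspeechd call; spd_open and spd_close
are the existing API, and the give-up policy is just an example):

#include <stdio.h>
#include <libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("myapp", NULL, NULL, SPD_MODE_THREADED);
    if (!conn) {
        fprintf(stderr, "cannot connect to speech-dispatcher\n");
        return 1;
    }
    /* Turn the compatibility emulation off.  A return of -1 means we
     * are talking to an old server without COMPAT_MODE, so the
     * application must decide: fall back to old behaviour, or die. */
    if (spd_compat_mode(conn, 0) == -1) {
        fprintf(stderr, "old server: vital functions not implemented\n");
        spd_close(conn);
        return 1;
    }
    /* ... new-style calls from here on ... */
    spd_close(conn);
    return 0;
}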
1. Extending current event notifications
There is nothing new in the protocol here. Simply, after the CHAR, KEY
and SOUND_ICON commands the server should answer identically as after
SPEAK, i.e.:
> CHAR x
< 225-msg_id
< 225 OK MESSAGE_QUEUED
Analogously, the library functions spd_key, spd_char, spd_wchar and
spd_sound_icon should return the message id - but in compatibility mode
they must return zero on success, because of possible existing code such
as:
if (spd_char(conn, character)) {
    error();
}
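In non-compatibility mode the same call would instead be checked like
this (a sketch, keeping the simplified call form used above and assuming
a negative return value on error):

int msg_id = spd_char(conn, character);
if (msg_id < 0) {
    error();
}
/* msg_id can now be matched against incoming event notifications */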
2. New events
SYNC:
706-msg_id
706-client_id
706-start_index
706-end_index
706 SYNCHRONIZED
This event is fired in SSML mode when the module has SYNC mode enabled.
It is similar to INDEX_MARK, but returns a pair of index mark names (or
an empty string at the start or end of the text). It may be useful for
applications that highlight the currently spoken text (book readers or
applications for people with dyslexia). Both index names are used
because the module may ignore some marks.
If SYNC mode in the module is disabled (for example the module has no
SYNC capability), the event must be fired with empty start_index and
end_index names.
Alternatively, there may be reserved mark names - for example
"__begin__" and "__end__".
AUTOSYNC:
707-msg_id
707-client_id
707-start_offset-end_offset
707 SYNCHRONIZED
This event is fired in TEXT mode when the module has AUTOSYNC mode
enabled. It is similar to the SYNC event, but relies on the module's
capability of splitting the spoken text into smaller parts. It returns
the offsets (in bytes) of the begin/end of the spoken part, counted from
the start of the message given in the SPEAK command.
If AUTOSYNC mode of the module is disabled (for example the module has
no AUTOSYNC capability), the event must be fired with 0 as both
start_offset and end_offset.
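A worked example (ids made up; the end offset is taken here as
exclusive): for the plain-text message "Hello world. How are you?" sent
as message 42 by client 1, a module splitting on sentences would fire:
707-42
707-1
707-0-12
707 SYNCHRONIZED
while speaking "Hello world.", and then the same block with 707-13-25
for the second sentence.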
AUTOPAUSE:
708-msg_id
708-client_id
708-offset
708 STOPPED
This event is fired in TEXT mode when the module has AUTOPAUSE mode
enabled and we explicitly request the autopause response from the server
with:
SET self AUTOPAUSE On
It returns the length of the spoken part of the text (in bytes).
If we don't request AUTOPAUSE, the server should automatically use the
module's AUTOPAUSE response, store the remaining part of the text
internally, and respond 704 PAUSED to the client.
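For example, if playback was paused after the module had spoken the
first 120 bytes of message 42 from client 1 (all numbers made up), the
client would receive:
708-42
708-1
708-120
708 STOPPED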
MOUTH:
709-msg_id
709-client_id
709-width-height
709 MOUTH
This event is fired when, for example, a graphical application should
redraw the mouth of a displayed face. Width and height are given in the
range 0..100.
Today's modules have no idea about mouth shapes, but as far as I know it
is possible. The module must have MOUTH mode enabled.
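For example (all numbers made up), a module reporting a mouth shape of
width 60 and height 30 while speaking message 42 from client 1 would
send:
709-42
709-1
709-60-30
709 MOUTH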
In libspeechd it will be necessary to rewrite the callback system. My
suggestion:
typedef void (*SPD_Callback)(int msg, int id, int event,
void *user_data, ...);
with the event-specific values retrieved via varargs.
For example:
INDEX_MARK - one char * value
SYNC - two char * values
AUTOSYNC - two integers
MOUTH - two integers
AUTOPAUSE - one integer
Also, there must be functions like:
SPD_Callback spd_register_callback(SPDConnection *conn, int event,
SPD_Callback callback, void *user_data);
SPD_Callback spd_unregister_callback(SPDConnection *conn, int event);
Of course these functions are valid only in non-compatibility mode!
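A self-contained sketch of how a client-side SYNC handler would retrieve
its two mark names with stdarg (the event constant name and the direct
invocation in main are only for demonstration; a real client would pass
on_sync to the proposed spd_register_callback):

#include <stdarg.h>
#include <stdio.h>

typedef void (*SPD_Callback)(int msg, int id, int event,
                             void *user_data, ...);

enum { SPD_EVENT_SYNC = 706 };  /* hypothetical constant name */

/* SYNC carries two char * values: the start and end mark names. */
static void on_sync(int msg, int id, int event, void *user_data, ...)
{
    va_list ap;
    va_start(ap, user_data);
    const char *start_mark = va_arg(ap, const char *);
    const char *end_mark = va_arg(ap, const char *);
    va_end(ap);
    printf("msg %d: between marks '%s' and '%s'\n",
           msg, start_mark, end_mark);
    (void)id; (void)event;
}

int main(void)
{
    /* A real client would register the handler instead:
     *   spd_register_callback(conn, SPD_EVENT_SYNC, on_sync, NULL);
     * Here we call it directly to show the calling convention the
     * library would use. */
    on_sync(42, 1, SPD_EVENT_SYNC, NULL, "m1", "m2");
    return 0;
}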
3. Module output capabilities
SPEAK - module can speak
FETCH - module can return synthesized wave to server
FILE - module can save synthesized wave to file
4. Module input capabilities
SSML - module can fully handle SSML and index marks;
FLAT - module internally translates SSML into plain text. Index marks
are lost; pause/resume is not implemented.
PLAIN - module understands plain text (no SSML). Extra features (like
AUTOPAUSE and AUTOSYNC) are possible only in this mode.
FLAT and SSML capabilities are mutually exclusive.
The server should never send SSML data to a module reporting only the
PLAIN capability.
The server should always send SSML data when the module does not report
PLAIN.
The server should never internally encode plain text into SSML if the
module reports PLAIN and any of the extra features (AUTOPAUSE, AUTOSYNC
etc.) is enabled. Also, the server should never accept SSML data from an
application when extra features are enabled (it's an application bug).
5. Module extended capabilities:
SYNC - valid only in SSML mode. 706 SYNCHRONIZED events will be fired
only if SYNC mode is enabled.
AUTOSYNC - valid only in PLAIN mode. 707 SYNCHRONIZED events will be
fired only if AUTOSYNC mode is enabled. Requires a simple NLP in the
module.
WORDSYNC - valid only in PLAIN mode. 707 SYNCHRONIZED events will be
fired at word boundaries instead of phrase/sentence boundaries if
WORDSYNC mode is enabled.
AUTOPAUSE - valid only in PLAIN mode. 708 STOPPED events will be fired
only if AUTOPAUSE mode is enabled. This mode may be turned on
automatically by the server responding 704 PAUSED to the client.
Requires a simple NLP in the module.
SPEECH_MOUTH - module can fire 709 MOUTH events during speaking.
FETCH_MOUTH - module can return mouth shapes to the server together
with the speech wave.
FILE_MOUTH - module can save mouth shapes together with the speech wave.
The MOUTH capabilities are separated because it's relatively simple to
add MOUTH events to an Mbrola-based module in FETCH/FILE mode, but
real-time synchronization may be difficult.
A simple NLP (Natural Language Processor) must be able to automatically
split a given text into sentences (or - if the synthesizer can also
speak parts of sentences - phrases). It may be trivial (splitting after
each dot, exclamation mark or question mark followed by a space) or more
sophisticated (like the phraser in the Milena NLP, which understands the
context of dots and won't split after an abbreviation or a positional
number in Polish text). A trivial NLP should be part of the projected
library for speech-dispatcher modules.
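A sketch of such a trivial splitter (standalone, with byte offsets of
the kind the AUTOSYNC event would report; the Milena-style phraser is of
course much more involved):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return the byte offset just past the first sentence of text, or the
 * total length if no boundary is found.  Trivial rule only: '.', '!'
 * or '?' followed by whitespace ends a sentence - so it will split
 * wrongly after abbreviations or positional numbers. */
static size_t next_sentence_end(const char *text)
{
    size_t len = strlen(text);
    for (size_t i = 0; i + 1 < len; i++) {
        if (strchr(".!?", text[i]) && isspace((unsigned char)text[i + 1]))
            return i + 1;
    }
    return len;
}

int main(void)
{
    const char *text = "Hello world. How are you? Fine!";
    size_t pos = 0;
    while (text[pos]) {
        size_t end = pos + next_sentence_end(text + pos);
        /* prints e.g. "0-12: Hello world." */
        printf("%zu-%zu: %.*s\n", pos, end, (int)(end - pos), text + pos);
        while (isspace((unsigned char)text[end]))
            end++;
        pos = end;
    }
    return 0;
}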
More about the FETCH/FILE modes in the next mail :)
ethanak
--
http://milena.polip.com/ - Pa pa, Ivonko!