From: Bohdan R. Rau
Subject: Speech Dispatcher roadmap discussion.
Date: Thu, 16 Oct 2014 15:17:23 +0200
On 2014-10-13 10:45, Bohdan R. Rau wrote:
> PART ONE
PART TWO
1. Introducing private sessions
Since speech-dispatcher would be used to retrieve speech waveforms for
different purposes, this may interfere with normal sessions. Fetching (or
storing) data must not be interruptible by other clients (so commands like
CANCEL ALL won't cancel generation of data being fetched). On the other
hand, a long piece of text sent to speech-dispatcher may cause long lags in
applications like screen readers or speech notifications - the synthesizer
may be slow, and synthesizing 10 minutes of speech may lock such
synthesizers (like Ivona) for a minute.
So it should be possible to start speech-dispatcher in a private mode -
for example with a "--private" command line parameter. A better solution,
in my opinion, is to create a symlink with a different name (for example
/usr/bin/speechd-private or similar) - this makes it possible to use the
pkill or killall commands to kill only private or only standard sessions.
In private mode speech-dispatcher reads from stdin and writes to
stdout. All logs are redirected to stderr, and pid files are not used.
Closing either of the stdin/stdout streams causes speech-dispatcher to quit
immediately.
In private mode all speech-synthesis commands must be prefixed with
FILE or FETCH. Also, when answering the CAPA command in private mode,
speech-dispatcher must not announce the SPEECH capability.
As not all modules are able to produce waveforms, I propose a new
AddPrivateModule line in the configuration file. AddPrivateModule lines are
ignored in standard mode. If an AddPrivateModule line is found in private
mode, AddModule lines are ignored.
If possible, in private mode no modules should be initialized at start (in
almost all cases we will use only a single module per session). Instead,
modules will be initialized on demand. One exception: if there is only
one AddPrivateModule line in the configuration, that module may be
initialized at start (because we have no choice).
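A configuration following this proposal might look like the sketch below.
The AddModule argument layout follows the existing speechd.conf convention;
AddPrivateModule is the directive proposed above, and the module names are
only examples:

```
# speechd.conf sketch - AddPrivateModule is the proposed new directive
# Used in standard mode only:
AddModule "espeak" "sd_espeak" "espeak.conf"
# Used in private mode only; when present, AddModule lines are ignored:
AddPrivateModule "espeak" "sd_espeak" "espeak.conf"
```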
By the way, it should also be possible for speech-dispatcher to quit
automatically when the client closes its connection. For example: if I have
speech enabled in the login manager, then after a successful login Orca
quits, but speech-dispatcher still sits on its socket, consumes memory and
waits for system shutdown. I see no reason to waste even a kilobyte of
memory on a program I do not use.
2. FILE prefix
Each speech-synthesis command (SPEAK, CHAR etc.) may be prefixed by the
FILE command followed by a filename. The filename should be an absolute
path; otherwise it is interpreted relative to the home directory of the
user running speech-dispatcher.
Server responses:
2xx OK FILE STORED
or:
5xx FILE ERROR:<errno>:<error string>
This is the simplest way to create prerecorded waves for various
applications. I personally used prerecorded hours and minutes as a talking
clock in my Symbian audiobook player.
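As a client-side illustration, a small function could parse the two reply
forms above. This is only a sketch of the proposed syntax; the concrete
codes 200/500 stand in for the 2xx/5xx placeholders:

```python
def parse_file_reply(line: str):
    """Parse a reply to a FILE-prefixed command (proposed syntax).

    Returns ("ok", message) for a "2xx OK FILE STORED" reply, or
    ("error", errno, message) for "5xx FILE ERROR:<errno>:<error string>".
    """
    code, _, rest = line.partition(" ")
    if code.startswith("2"):
        return ("ok", rest)
    if code.startswith("5") and rest.startswith("FILE ERROR:"):
        # Split off the literal "FILE ERROR" tag, the errno, and the text.
        _, errno_s, message = rest.split(":", 2)
        return ("error", int(errno_s), message)
    raise ValueError("unexpected reply: " + line)
```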
We must distinguish between the FILE capability of the server and of the
module. Even if a module has only the FETCH capability (no FILE), the
server must fetch data from the module and store it in a file. So if a
module has only the FETCH capability, the server must still respond to the
CAPA command with both FETCH and FILE.
The format of the file depends on the module's internal capabilities, but
in any case the WAV format must be accepted. Other formats (like mp3 or
ogg) would be recognized by the filename extension, but with no guarantee.
There is one exception (derived from Mbrola).
If the filename has the form "-" (a dash), possibly followed by an
extension, the server should return the content of the file on the
listening socket, for example:
FILE -.wav CHAR a
2xx-OK FILE DATA FOLLOWS:<data length>
<binary data followed by LF>
200 OK
This may be useful when we are connected to speech-dispatcher on a remote
machine.
3. FETCH (server-module protocol).
As the FETCH command can be used for different purposes, in server-module
communication FETCH takes an extra parameter: FILE or REALTIME. Depending
on the module's internal capabilities this parameter may be ignored, so the
server should accept both kinds of response.
In FILE mode, the module simply responds:
2xx-OK FILE DATA FOLLOWS:<data length>
<binary data followed by LF>
2xx OK
or:
2yy-OK RAW DATA FOLLOWS:<data length>:<format specification>
<binary data followed by LF>
2yy OK
In REALTIME mode the module always sends chunks of raw data, possibly
interleaved with SYNC, AUTOSYNC and MOUTH responses. A typical response:
2zz-OK CHUNKED DATA FOLLOWS:<format specification>
2zz-CHUNK:<chunk length>
<binary data followed by LF>
...
2zz OK
SYNC, AUTOSYNC and MOUTH are sent before a chunk (if needed).
Chunks are sent as fast as possible. In particular, if the synthesizer can
produce the wave in real time (like Mbrola, eSpeak or the Linux versions of
Ivona), the module must send the synthesized wave in small parts.
REALTIME mode may be used by an application for its own internal purposes -
in that case the data sent by the module is simply received by the server
and passed to the application as fast as possible. But the primary goal is
to play the received waveform through the server's internal audio system.
This will let us create modules that are completely audio-system
independent.
The server must be able to convert chunked data (if the module responds in
realtime to a file request) into the file data requested by the application
in answer to a FILE-prefixed command.
4. FETCH (SSIP protocol)
The FETCH prefix is intended for applications that have their own audio
systems. A good example is a subtitle reader: each dialogue line is
prefetched, possibly postprocessed, and played synchronously with the
original audio track of the movie being watched. Another example is a robot
driven by an external Linux box, with an internal speaker and a mouth
driven by small servomotors.
A command prefixed by FETCH always returns raw data, but in two forms:
realtime and all. So the FETCH prefix may be followed by the ALL parameter.
In realtime (default) mode the application expects a data format similar to
that of the FETCH REALTIME module command. If the module answers with a
realtime response, it is simply copied from the module to the client. If
the module responds as FILE, the server must convert this response into
realtime form, adding the necessary SYNC (begin-end) or AUTOSYNC
(zero-length) lines. Similarly, if the module has only the FILE capability
(no FETCH), the server must convert its response into realtime form. So our
robot will talk...
The response from the server should be the same as the response from the
module in REALTIME mode.
If the FETCH prefix is followed by ALL, the application expects the
synthesized waveform of the full text in one big chunk in raw format. SYNC
and MOUTH lines should be removed by the server. Even if the module has
only the FILE capability (no FETCH), the server must convert the temporary
file into the required response. So our subtitle reader will work...
Response from server must be like:
2yy-RAW DATA FOLLOWS:<data length>:<format specification>
<binary data followed by LF>
2yy OK DATA SENT
The format specification itself is not the subject of this discussion.
I hope - my poor English notwithstanding - that everything is clear now...
Remarks:
a) the subtitle reader exists (my applications SubAloud and Milena-ABC),
but is limited to certain synthesizers. In offline mode (Milena-ABC creates
a new audio track with the spoken subtitles and a precisely controlled
level of the original soundtrack) only Polish is supported. SubAloud is
practically untested - I personally never watch movies on a Linux box - but
I have had some positive responses from occasional testers.
b) a robot speaking with the Mbrola synthesizer, plus a simple function
converting Mbrola phonemes into instructions for the servomotors, also
exists.
ethanak
--
http://milena.polip.com/ - Pa pa, Ivonko!