From: Bohdan R. Rau
Subject: Speech Dispatcher roadmap discussion.
Date: Thu, 09 Oct 2014 13:50:35 +0200
On 2014-10-08 at 09:32, Luke Yelavich wrote:
> Hey folks.
> This has been a long time coming.
Better late than never :)
>
> * Assess whether the SSIP protocol needs to be extended to better
> support available synthesizer features
Yes!
Some years ago I proposed a CAPABILITY command...
>
> Two questions that often get asked in the wider community are:
> 1. Can I get Speech Dispatcher to write audio to a wav file?
Let's assume there are three possibilities:
a) The module can speak. Probably all modules can speak (excluding the
dummy module, which should be removed and replaced by internal server
functionality).
b) The module can write the wave to a file. Hardware synthesizers, for
example, cannot.
c) The module can return the synthesized wave to the server without
writing it to a file. As above, hardware synthesizers cannot.
So there are three possible answers (minimum one), for example:
SPEAK
FILE
FETCH
Analogously, the server should return its capabilities for the current
module - but the server should do more.
For example: if the module can only speak, there is no place for
dancing :(
If the module can FILE or FETCH but not SPEAK, the server is still
capable of speaking (for example by fetching the wave, or by reading
the waveform from a file, and playing it with internal methods).
Commands like 'speak', 'char' and 'key' could be prefixed with:
a) FETCH - meaning we want to fetch the waveform from the
communication socket.
b) FILE <filename> - meaning we want to save the waveform to a file.
The module should also return the possible modifications it supports,
like (a hypothetical session sketch follows this list):
RATE
PITCH
VOLUME
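To make this concrete: a hypothetical session could look like the one
below. CAPABILITIES, the FETCH prefix and all the reply codes here are
my invention, not existing SSIP - just a sketch of the idea:

CAPABILITIES
271-SPEAK
271-FETCH
271-RATE
271-PITCH
271 OK CAPABILITIES LISTED
FETCH SPEAK
230 OK RECEIVING DATA
Hello, world!
.
225 OK WAVEFORM FOLLOWS
<raw audio data on the socket>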
> 2. How can I use eSpeak's extra voices for various languages?
The SET SYNTHESIS_VOICE command should understand variants. There is
no need to extend SSIP; the 'name' argument could simply take three
forms (example session below):
voice_name - set the voice with its default variant
voice_name:variant - set both voice and variant
:variant - switch the variant of the current voice
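For example, with an eSpeak voice 'en' and variants 'whisper' and
'croak' (the names are chosen only for illustration):

SET self SYNTHESIS_VOICE en
SET self SYNTHESIS_VOICE en:whisper
SET self SYNTHESIS_VOICE :croak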
Another solution: use predefined voice names in the module. I used
this solution in one of my experimental (and dead) modules (txt2pho +
Mbrola) for German Mbrola voices.
> * SystemD/LoginD integration
Is this a problem of speech-dispatcher or of PulseAudio?
> * Rework of the settings mechanism to use DConf/GSettings
I agree: the current settings mechanism should go to a museum as fast
as possible - but DConf and GSettings are the worst candidates. The
configuration file should be as simple as possible; in practice we
need nothing more than a hash table of strings. Hash tables are also
faster than GSettings...
I use a similar solution in my experimental module (used daily by
several people, both completely blind and partially sighted):
http://tts.polip.com/files/sd_milvona-0.1.9.tar.gz
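A minimal sketch of what I mean, assuming GLib (which
speech-dispatcher already depends on); the "key = value" file format
here is only an example:

#include <glib.h>
#include <string.h>

/* Load a flat "key = value" file into a string->string hash table.
   Blank lines and lines starting with '#' are ignored. */
GHashTable *load_config(const char *path)
{
    GHashTable *cfg = g_hash_table_new_full(g_str_hash, g_str_equal,
                                            g_free, g_free);
    gchar *contents = NULL;
    if (!g_file_get_contents(path, &contents, NULL, NULL))
        return cfg;
    gchar **lines = g_strsplit(contents, "\n", -1);
    for (gchar **l = lines; *l; l++) {
        gchar *line = g_strstrip(*l);
        if (*line == '\0' || *line == '#')
            continue;
        gchar *eq = strchr(line, '=');
        if (!eq)
            continue;
        *eq = '\0';
        g_hash_table_insert(cfg, g_strdup(g_strstrip(line)),
                            g_strdup(g_strstrip(eq + 1)));
    }
    g_strfreev(lines);
    g_free(contents);
    return cfg;
}

A lookup is then a single g_hash_table_lookup() call, with no schema
compilation step at all.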
>
> * Separate compilation and distribution of modules
>
> As much as many of us prefer open source synthesizers, there are
> instances where users would prefer to use proprietary synthesizers.
> We
> cannot always hope to be able to provide a driver for all
> synthesizers, so Speech Dispatcher needs an interface to allow
> synthesizer driver developers to write support for Speech Dispatcher,
> and build it, outside the Speech Dispatcher source tree.
Yes, yes, yes!
Look above :)
Milena does not use proprietary software (excluding Mbrola), but it is
specialized for a single (not very popular) language, and it depends
on open-source but actively developed libraries (milena, ivolektor
etc.) which should not be shipped together with speech-dispatcher
(sometimes I published several versions of the data files within one
month).
I can imagine similar modules specialized for languages like
Mongolian, Nynorsk or even Quenya and Klingon... but as these modules
are interesting only for a small group of users, there is no sense in
putting them into the main speech-dispatcher distribution :)
As I have spent some time developing independent modules, in my view
there should be:
a) something like libspeechdmodule - a C library containing all the
needed functions and the skeleton of a module (a sketch follows this
list);
b) a working solution for other languages (like Python). I tried to
write a skeleton for Python, but I'm not very happy with the
results...
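For (a), I imagine an interface roughly like the following - all names
here are hypothetical, this library does not exist yet:

#include <stddef.h>

/* Callbacks a synthesizer driver would implement; the library would
   run the module side of the protocol on stdin/stdout and dispatch
   to these. */
typedef struct spd_module_ops {
    int  (*init)(void);
    int  (*speak)(const char *text, const char *voice, int rate,
                  int pitch, int volume);
    int  (*stop)(void);
    void (*close)(void);
} spd_module_ops;

/* Hypothetical entry point: runs the protocol loop until EOF. */
int spd_module_main(const spd_module_ops *ops);

A module author would then only fill in the callbacks and call
spd_module_main() from main().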
> * Consider refactoring client API code such that we only have one
> client API codebase to maintain, i.e python bindings wrapping the C
> library etc
For Python (Cython): as the low-level Python binding should provide
only a direct interface to libspeechd, it is simple and - once created
- does not need maintenance until the C API changes. In fact, it is a
task for one person for two days (counting the morning coffee and a
visit to the pub). If needed, I can provide a first version of the
Python extension during the weekend.
In fact, I had a big problem with my simple application for Ubuntu and
speech-dispatcher. I wrote my app in Python 2.7, and as there is only
a Python 3 interface in Ubuntu... you can imagine the results. My
first idea was "write a Python binding to libspeechd", but I decided
to rewrite the app in C :)
GObject Introspection is a nice idea, but I cannot imagine this
solution with the current version of the speech-dispatcher library...
The suggested ctypes solution is the worst - ctypes is good for simple
functions, but not for something more sophisticated, like
get_synthesis_voices() (see below).
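On the C side (if I remember the libspeechd API correctly) that call
is spd_list_synthesis_voices(), which returns a NULL-terminated array
of structures - exactly the kind of return value that is painful to
marshal through ctypes:

#include <stdio.h>
#include <libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("demo", NULL, NULL, SPD_MODE_SINGLE);
    if (!conn)
        return 1;
    /* NULL-terminated array of SPDVoice* (name, language, variant). */
    SPDVoice **voices = spd_list_synthesis_voices(conn);
    for (SPDVoice **v = voices; v && *v; v++)
        printf("%s (%s, %s)\n", (*v)->name, (*v)->language,
               (*v)->variant);
    spd_close(conn);
    return 0;
}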
As I use only Python and C in my applications, I won't say anything
about other languages.
> * Moving audio drivers from the modules to the server
Little upgrade:
Allow module to use server audio output.
All your long story of audio problems affects only pulseaudio. For
other audio systems there are different problems (for example - not
working Alsa when loaded from dynamically linked library - is this bug
corrected in Alsa?).
I assume the server audio system will be possible to change rate/pitch
of synthesized wave (with sonic)...
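The sonic part is mostly plumbing; a minimal sketch, assuming 16-bit
mono samples (buffer handling simplified):

#include <sonic.h>

/* Change the tempo/pitch of a block of 16-bit mono samples; returns
   the number of output samples written to 'out'. */
int adjust_wave(short *in, int n_in, short *out, int max_out,
                int sample_rate, float speed, float pitch)
{
    sonicStream s = sonicCreateStream(sample_rate, 1);
    sonicSetSpeed(s, speed);   /* 1.5 = 50% faster */
    sonicSetPitch(s, pitch);   /* 1.0 = unchanged */
    sonicWriteShortToStream(s, in, n_in);
    sonicFlushStream(s);       /* flush the tail of the stream */
    int n_out = sonicReadShortFromStream(s, out, max_out);
    sonicDestroyStream(s);
    return n_out;
}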
I also have other suggestions, but that is a topic for the next mail :)
ethanak
--
http://milena.polip.com/ - Pa pa, Ivonko!