Speech Dispatcher roadmap discussion.
From: Luke Yelavich
Subject: Speech Dispatcher roadmap discussion.
Date: Wed, 8 Oct 2014 18:32:09 +1100
Hey folks.
This has been a long time coming. I originally promised a roadmap shortly after
taking up Speech Dispatcher maintainership. Unfortunately, as is often the
case, real life and other work-related tasks got in the way; however, I am now
able to give some attention to thinking about where to take the project from
here. It should be noted that a lot of what follows is based on roadmap
discussions back in 2010(1) and the roadmap documents on the project website.(2)
Since then, much has changed in the wider *nix ecosystem: some underlying
system services have changed, and there are now additional requirements that
need to be considered.
I haven't given any thought to version numbering at this point; I'd say all
of the below targets 0.9. If we find any critical bugs that need fixing, we can
always put out another 0.8 bugfix release in the meantime.
The roadmap items, as well as my thoughts on them, are below.
* Implement event-based main loops in the server and modules
I don't think this requires much explanation. IMO this is one of the first
things to be done, as it lays some important groundwork for other improvements
mentioned below. Since we use GLib, my proposal is to use the GLib main loop
system. It is very flexible and easy to work with.
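To give a rough idea of what I mean, here is a minimal sketch of a GLib main
loop watching a descriptor for incoming SSIP data. The callback is a
placeholder and stdin stands in for an accepted client socket, so none of this
is existing Speech Dispatcher code:

#include <glib.h>

/* Placeholder callback: the main loop calls this whenever the watched
 * descriptor becomes readable, instead of us blocking in a poll loop. */
static gboolean
on_client_readable (GIOChannel *channel, GIOCondition condition, gpointer data)
{
    gchar *line = NULL;
    gsize length = 0;

    if (g_io_channel_read_line (channel, &line, &length, NULL, NULL)
        != G_IO_STATUS_NORMAL)
        return FALSE;        /* EOF or error: remove this watch */

    g_print ("would dispatch: %s", line);
    g_free (line);
    return TRUE;             /* keep watching the descriptor */
}

int
main (void)
{
    GMainLoop *loop = g_main_loop_new (NULL, FALSE);

    /* Stand-in for an accepted client socket: watch stdin (fd 0). */
    GIOChannel *channel = g_io_channel_unix_new (0);
    g_io_add_watch (channel, G_IO_IN | G_IO_HUP, on_client_readable, NULL);

    g_main_loop_run (loop);
    return 0;
}

Timeouts, child watches for the module processes, and so on would then hang
off the same loop.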
* Assess whether the SSIP protocol needs to be extended to better support
available synthesizer features
Two questions that often get asked in the wider community are:
1. Can I get Speech Dispatcher to write audio to a wav file?
2. How can I use eSpeak's extra voices for various languages?
We should have a look at the SSIP protocol, as well as the features offered by
the synthesizers we support today, and determine whether we need to extend SSIP
to support everything that the synthesizers have to offer. This may require
changes or additions to the client API, particularly for the wav file audio
output that prospective clients may wish to use.
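To make this concrete, here is a rough sketch of an SSIP exchange, with server
responses omitted. The first few commands exist today; the AUDIO_OUTPUT line is
purely hypothetical, just an example of the kind of extension we could consider
for writing audio to a wav file:

SET SELF CLIENT_NAME "joe:orca:main"
SET SELF OUTPUT_MODULE espeak
SET SELF LANGUAGE en
SET SELF AUDIO_OUTPUT FILE "/tmp/speech.wav"   <- hypothetical, does not exist today
SPEAK
Hello world.
.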
* Assess DBus use for IPC between client and server
Brailcom raised this back in 2010, and the website mentions that analysis is
required; however, I have no idea what they had in mind. Nevertheless, using
DBus as the client-server IPC is worth considering, particularly with regards
to application confinement and the client API (see below). Work is ongoing to
put the core part of DBus into the kernel, so once that is done, performance
should be much improved.
It's worth noting that DBus doesn't necessarily have to be used for everything.
DBus could be used only to spawn the server daemon and nothing else, or the
client API library could use it just to initiate a connection, setting up
a Unix socket per client. I haven't thought this through, so I may be missing
the mark on some of these ideas, but we should look at all options.
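To illustrate that last option, here is a hypothetical sketch of a client
library using GDBus only to ask the session bus to start the server and hand
back a per-client Unix socket path. The bus name, object path, interface and
method are all invented for illustration:

#include <gio/gio.h>

int
main (void)
{
    GError *error = NULL;
    GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, &error);
    if (bus == NULL) {
        g_printerr ("No session bus: %s\n", error->message);
        return 1;
    }

    /* Calling the name would auto-start the server via bus activation;
     * the server replies with the socket this client should connect to. */
    GVariant *reply = g_dbus_connection_call_sync (
        bus,
        "org.freedesktop.SpeechDispatcher",         /* hypothetical name   */
        "/org/freedesktop/SpeechDispatcher",        /* hypothetical path   */
        "org.freedesktop.SpeechDispatcher.Server",  /* hypothetical iface  */
        "OpenConnection",                           /* hypothetical method */
        NULL, NULL, G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);

    if (reply != NULL) {
        const gchar *socket_path;
        g_variant_get (reply, "(&s)", &socket_path);
        g_print ("server told us to connect to %s\n", socket_path);
        g_variant_unref (reply);
    }
    return 0;
}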
* SystemD/LoginD integration
In many Linux distros today, SystemD is used for system boot and service
management. Part of this is the use of LoginD for user session/login
management, which replaces ConsoleKit. The roadmap documentation on the project
website goes into some detail as to why this is required, but an email from
Hynek goes into even more detail.(3) Even though he talks about ConsoleKit, it
is the same with LoginD.
I am aware that some distros still do not use LoginD, so we may need to
implement things such that other systems can also be supported, i.e. if
ConsoleKit is still being used despite its deprecation, then we should support
it as well. I don't think Gentoo uses SystemD, so if someone could enlighten me
as to what Gentoo uses for session management, I would appreciate it.
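As a rough example of the sort of integration I mean, the server could ask
LoginD for the state of the user's login before routing speech to them. A
minimal sketch using the sd-login API, with a fallback path for systems
without LoginD, might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <systemd/sd-login.h>

int
main (void)
{
    char *state = NULL;

    /* Ask LoginD whether this user's session is active, online, etc.
     * On systems without LoginD this would be compiled out or replaced
     * by a ConsoleKit (or other) backend. */
    if (sd_uid_get_state (getuid (), &state) >= 0) {
        /* "active" means the user currently owns the foreground seat. */
        printf ("session state for uid %d: %s\n", (int) getuid (), state);
        free (state);
    } else {
        printf ("LoginD not available; would fall back to another backend\n");
    }
    return 0;
}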
* Support confined application environments
Like it or not, ensuring applications have access to only what they need is
becoming more important, and even open source desktop environments are looking
into implementing confinement for applications. Unfortunately, no standard
confinement framework is being used, so this will likely need to be modular to
support AppArmor/whatever GNOME is using. AppArmor is what Ubuntu is using for
application confinement going forward.
* Rework of the settings mechanism to use DConf/GSettings
There was another good discussion about this back in 2010; you will find it in
the same link I mentioned above with regards to ConsoleKit/LoginD. GSettings
has seen many improvements since then, which will help in creating some sort
of configuration application/interface that users can use to configure Speech
Dispatcher, should they need to configure it at all.
Using GSettings, a user can make a settings change, and it can be acted on
immediately without a server or module restart. GSettings also solves the
system/user configuration problem, in that the system-wide setting is used as
the default until the user changes that setting. We could also extend the
client API to allow clients more control over the Speech Dispatcher settings
that affect them, and have those settings applied on a client-by-client basis.
I think we already have something like this now, but the client cannot change
those settings via an API.
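To sketch what acting on a change immediately could look like, here is a
minimal example of listening for a GSettings key change. The schema id and the
default-rate key are invented for illustration; a real schema would still need
to be designed and installed:

#include <gio/gio.h>

/* Hypothetical callback: react to a settings change straight away,
 * without restarting the server or a module. */
static void
on_rate_changed (GSettings *settings, const gchar *key, gpointer data)
{
    gint rate = g_settings_get_int (settings, key);
    g_print ("default rate is now %d, reconfiguring modules\n", rate);
}

int
main (void)
{
    /* The schema id and key below are invented for this sketch; GSettings
     * requires the schema to be installed before this will run. */
    GSettings *settings = g_settings_new ("org.freedesktop.SpeechDispatcher");
    GMainLoop *loop = g_main_loop_new (NULL, FALSE);

    g_signal_connect (settings, "changed::default-rate",
                      G_CALLBACK (on_rate_changed), NULL);

    g_main_loop_run (loop);
    return 0;
}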
* Separate compilation and distribution of modules
As much as many of us prefer open source synthesizers, there are instances
where users would prefer to use proprietary synthesizers. We cannot hope to
provide a driver for every synthesizer ourselves, so Speech Dispatcher needs
an interface that allows synthesizer driver developers to write support for
Speech Dispatcher, and to build it, outside the Speech Dispatcher source tree.
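To make the idea more concrete, here is a purely hypothetical sketch of what a
stable, installable module interface header might look like; none of these
names exist today:

/* Hypothetical sketch of a stable driver interface that an out-of-tree
 * synthesizer module could implement and build against an installed
 * header, without needing the Speech Dispatcher source tree.  All names
 * here are invented for illustration. */

typedef struct {
    int  (*init)     (void);
    int  (*speak)    (const char *ssml, const char *voice);
    int  (*stop)     (void);
    void (*shutdown) (void);
} SpdModuleOps;

/* The module exports a single well-known entry point that returns its
 * operations table together with the interface version it was built for. */
#define SPD_MODULE_API_VERSION 1

const SpdModuleOps *spd_module_get_ops (int *api_version);

An out-of-tree driver would then only need something like this header (and a
pkg-config file) to build against.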
* Consider refactoring client API code such that we only have one client API
codebase to maintain, i.e. Python bindings wrapping the C library, etc.
This is one that was not raised previously, but it is something I have been
thinking about recently. At the moment, we have multiple implementations of the
API for different languages; Python and C come to mind. There are others to
which this may not be applicable, e.g. Guile, Java, etc.
I have been pondering whether it would save us maintenance work if we only
had one client API codebase to maintain, that being the C library. There are
two ways to provide Python bindings for a C library that come to mind, and
there may be more; should we decide to go down this path, all of them should
be considered. The two I have in mind are outlined below. I've also included
some pros and cons, but there are likely more that I haven't thought of.
Using Cython:
Pros:
* Provides both Python 2 and 3 support
* Produces a compiled module that works with the version of Python it was built
against, and should only require Python itself as well as the Speech Dispatcher
client library at runtime
Cons:
* Requires knowledge of Cython and its syntax, which mixes Python and C
* Requires extra code
Using GObject introspection:
Pros:
* Provides support for any language that has GObject introspection support,
which immediately broadens the API's usefulness beyond Python
* Has good Python 2 and 3 support
* Requires little to no extra code to be written, but does require that the C
library be refactored; see below
Cons:
* Introduces more dependencies that need to be present at runtime
* Requires the C library to be refactored into a GObject-based library, and
annotations to be added to provide introspection support
My understanding of both options may be lacking, so I have likely missed
something; please feel free to add to the above.
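For the GObject introspection option, here is a hypothetical sketch of a small
piece of a GObject-based client API header, with a gtk-doc comment of the kind
that g-ir-scanner turns into introspection data. The type and function names
are invented; nothing like this exists today:

#include <glib-object.h>

#define SPD_TYPE_CLIENT (spd_client_get_type ())
G_DECLARE_FINAL_TYPE (SpdClient, spd_client, SPD, CLIENT, GObject)

/**
 * spd_client_say:
 * @client: a #SpdClient
 * @priority: the message priority, e.g. "message"
 * @text: the UTF-8 text to speak
 * @error: return location for a #GError, or %NULL
 *
 * Queues @text for synthesis on this client's connection.
 *
 * Returns: %TRUE on success, %FALSE on failure
 */
gboolean spd_client_say (SpdClient    *client,
                         const gchar  *priority,
                         const gchar  *text,
                         GError      **error);

From there, g-ir-scanner generates the introspection data, and Python (via
gi.repository), JavaScript and other languages get bindings without any
hand-written wrapper code.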
* Moving audio drivers from the modules to the server
Another one that was not raised previously, but which needs to be considered. I
thought about this after considering various use cases for Speech Dispatcher
and its clients, particularly Orca. This one is likely to benefit PulseAudio
users more than users of other audio drivers, but I am sure people can think
of other reasons.
At the moment, when using PulseAudio, Speech Dispatcher connects to PulseAudio
per synthesizer, and not per client. This means that if a user has Orca
configured to use different synthesizers for, say, the system and hyperlink
voices, then these synthesizers have individual connections to PulseAudio. When
viewing a list of currently connected PulseAudio clients, you see names like
sd_espeak, or sd_ibmtts, and not Orca, as you would expect. Furthermore, if you
adjust the volume of one of these PulseAudio clients, the change will only affect
that particular speech synthesizer, and not the entire audio output of Orca.
What is more, multiple Speech Dispatcher clients may be using that same
synthesizer, so if volume is changed at the PulseAudio level, then an unknown
number of Speech Dispatcher clients using that synthesizer are affected. In
addition, if the user wishes to send Orca output to another audio device, then
they have to change the output device for multiple PulseAudio clients, and as a
result they may also be moving the output of another Speech Dispatcher client
to a different audio device where they don't want it.
Actually, the choice of which sound device to use per Speech Dispatcher client
applies to all audio output drivers. In other words, moving management of
audio output to the server would allow us to offer clients the ability to
choose the sound device their audio is sent to.
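As an illustration of why moving audio to the server helps here, the server
could open one PulseAudio context per Speech Dispatcher client and tag it with
that client's name, so the stream shows up in mixers as "Orca" rather than
sd_espeak. A rough sketch, with the client name hard-coded purely as an
example:

#include <pulse/pulseaudio.h>

/* Tag a PulseAudio connection with the name of the Speech Dispatcher
 * client on whose behalf the server is speaking. */
static pa_context *
make_context_for_client (pa_mainloop_api *api, const char *client_name)
{
    pa_proplist *props = pa_proplist_new ();

    pa_proplist_sets (props, PA_PROP_APPLICATION_NAME, client_name);
    pa_context *ctx = pa_context_new_with_proplist (api, NULL, props);
    pa_proplist_free (props);
    return ctx;
}

int
main (void)
{
    pa_mainloop *ml = pa_mainloop_new ();
    pa_context *ctx = make_context_for_client (pa_mainloop_get_api (ml),
                                               "Orca");

    pa_context_connect (ctx, NULL, PA_CONTEXT_NOFLAGS, NULL);
    /* ... run the mainloop, create one stream per client, etc. ... */
    pa_context_unref (ctx);
    pa_mainloop_free (ml);
    return 0;
}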
Please feel free to respond with further discussion points about anything I
have raised here, and if you have another suggestion for roadmap inclusion,
I'd love to hear that too.
Luke
(1) http://lists.freebsoft.org/pipermail/speechd/2010q3/002360.html
(2) http://devel.freebsoft.org/speechd-roadmap
(3) http://lists.freebsoft.org/pipermail/speechd/2010q3/002406.html