From: Luke Yelavich
Subject: Supporting eSpeak variants properly, or, how to offer synthesizer specific settings to clients.
Date: Mon, 15 Jun 2015 09:59:14 +1000
Hey folks.
Of late we have been adding support to Speech Dispatcher to allow users to work
with extra eSpeak functionality in some way. Most recently a configuration
option was added to the espeak driver to present the available voice variants
along with the available voices. However, this is suboptimal. Last year,
additions were also made to the Speech Dispatcher API to support eSpeak's pitch
range functionality. This has not yet been released in a tarball, and is only
in git master, as it hasn't received sufficient testing, at least from me.
Again, I think this is suboptimal.
It's likely that the various synthesizers Speech Dispatcher supports all offer
some extra functionality, and it is also likely that said functionality differs
from synthesizer to synthesizer. It would be nice to offer all of this extra
functionality to users, but I would rather not add additional API functionality
to support a single synthesizer's features, e.g. eSpeak's pitch range and voice
variants. What we could do, however, is add an API that would list, get, and
set synthesizer specific settings.
Here is a list of some early ideas as to what would be supported.
* Every setting must have a get; set is optional.
* Need to support int value range get/set, string value get/set, and string
  list get.
Each synthesizer specific setting could be represented as a data structure in C
along the lines of the following:
typedef struct {
    char *name;
    char *description; /* This should be localized */
    SynthSettingValueType get_type;
    SynthSettingValueType set_type;
    int min_value;
    int max_value;
    char **value_list;
    void *cur_value;
} SynthSetting;
In the C API, a NULL-terminated array of pointers to this structure would be
returned, covering all settings a synth offers.
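
As an illustration of how a client might walk such a NULL-terminated array,
here is a minimal sketch. The helper names count_synth_settings and
print_synth_settings, and the sample data (setting names, variant names, and
ranges), are hypothetical and exist only for this example. The types are
repeated from the proposal so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum {
    SYNTH_SETTING_VALUE_UNKNOWN = 0,
    SYNTH_SETTING_VALUE_NUMBER = 1,
    SYNTH_SETTING_VALUE_STRING = 2,
    SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

typedef struct {
    char *name;
    char *description;
    SynthSettingValueType get_type;
    SynthSettingValueType set_type;
    int min_value;
    int max_value;
    char **value_list;
    void *cur_value;
} SynthSetting;

/* Count the entries that precede the terminating NULL pointer. */
size_t count_synth_settings(SynthSetting **settings)
{
    size_t n = 0;

    assert(settings != NULL);
    while (settings[n] != NULL)
        n++;
    return n;
}

/* Print one "name: description" line per setting; returns lines printed. */
size_t print_synth_settings(SynthSetting **settings, FILE *out)
{
    size_t i;

    assert(settings != NULL && out != NULL);
    for (i = 0; settings[i] != NULL; i++)
        fprintf(out, "%s: %s\n", settings[i]->name, settings[i]->description);
    return i;
}

/* Hypothetical sample data standing in for what a driver might report. */
static int sample_pitch_range = 50;
static char *sample_variants[] = { "m1", "f1", "whisper", NULL };

static SynthSetting sample_pitch = {
    "pitch_range", "Pitch range",
    SYNTH_SETTING_VALUE_NUMBER, SYNTH_SETTING_VALUE_NUMBER,
    0, 100, NULL, &sample_pitch_range
};

static SynthSetting sample_variant = {
    "variant", "Voice variant",
    SYNTH_SETTING_VALUE_STRING_LIST, SYNTH_SETTING_VALUE_STRING,
    0, 0, sample_variants, "m1"
};

/* The array itself ends with a NULL pointer, as described above. */
static SynthSetting *sample_settings[] = {
    &sample_pitch, &sample_variant, NULL
};
```

Terminating the array with a NULL pointer means no separate length has to be
passed around, at the cost of a linear scan whenever the count is needed.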
The SynthSettingValueType enum would look something like this:
typedef enum {
    SYNTH_SETTING_VALUE_UNKNOWN = 0,
    SYNTH_SETTING_VALUE_NUMBER = 1,
    SYNTH_SETTING_VALUE_STRING = 2,
    SYNTH_SETTING_VALUE_STRING_LIST = 3 /* A list of strings for the user
                                           to choose from, e.g. voice variants */
} SynthSettingValueType;
I don't see why we would need to support anything more than ints, as even now
we are only dealing with ints for numerical values.
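
Since cur_value is a void pointer, a client has to interpret it according to
get_type. The sketch below shows one way that could look. It assumes that
cur_value points to an int for number settings and to a NUL-terminated string
(the current choice) for string and string list settings; the proposal does not
spell this out, and format_setting_value is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum {
    SYNTH_SETTING_VALUE_UNKNOWN = 0,
    SYNTH_SETTING_VALUE_NUMBER = 1,
    SYNTH_SETTING_VALUE_STRING = 2,
    SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

typedef struct {
    char *name;
    char *description;
    SynthSettingValueType get_type;
    SynthSettingValueType set_type;
    int min_value;
    int max_value;
    char **value_list;
    void *cur_value;
} SynthSetting;

/*
 * Write the current value of a setting into buf as text.
 * Returns the snprintf result on success, or -1 if the value cannot be
 * interpreted (unknown type or no current value).
 */
int format_setting_value(const SynthSetting *s, char *buf, size_t size)
{
    assert(s != NULL && buf != NULL && size > 0);

    if (s->cur_value == NULL) {
        buf[0] = '\0';
        return -1;
    }

    switch (s->get_type) {
    case SYNTH_SETTING_VALUE_NUMBER:
        /* Assumption: number settings store a pointer to an int. */
        return snprintf(buf, size, "%d", *(const int *)s->cur_value);
    case SYNTH_SETTING_VALUE_STRING:
    case SYNTH_SETTING_VALUE_STRING_LIST:
        /* Assumption: string settings store a pointer to a C string. */
        return snprintf(buf, size, "%s", (const char *)s->cur_value);
    default:
        buf[0] = '\0';
        return -1;
    }
}
```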
C API methods to work with these data types could be as follows:
SynthSetting **spd_synth_get_settings(SPDConnection *connection);
int spd_synth_set_setting(SPDConnection *connection, SynthSetting *setting,
                          void *value);
void free_synth_settings(SynthSetting **settings);
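
Because the value is passed to the set call as a void pointer, it may be worth
checking it against the setting's metadata before sending it. The following is
a minimal sketch of such a check, not part of the proposed API. The function
synth_setting_value_is_valid is hypothetical, and it assumes that a set_type of
SYNTH_SETTING_VALUE_UNKNOWN marks a read-only setting, that number values are
passed as a pointer to int, and that value_list, when present, is a
NULL-terminated list of the allowed strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SYNTH_SETTING_VALUE_UNKNOWN = 0,
    SYNTH_SETTING_VALUE_NUMBER = 1,
    SYNTH_SETTING_VALUE_STRING = 2,
    SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

typedef struct {
    char *name;
    char *description;
    SynthSettingValueType get_type;
    SynthSettingValueType set_type;
    int min_value;
    int max_value;
    char **value_list;
    void *cur_value;
} SynthSetting;

/* Returns 1 if value is acceptable for this setting, 0 otherwise. */
int synth_setting_value_is_valid(const SynthSetting *s, const void *value)
{
    size_t i;

    assert(s != NULL);
    if (value == NULL)
        return 0;

    switch (s->set_type) {
    case SYNTH_SETTING_VALUE_NUMBER: {
        /* Assumption: numbers are passed as a pointer to int. */
        int v = *(const int *)value;
        return v >= s->min_value && v <= s->max_value;
    }
    case SYNTH_SETTING_VALUE_STRING:
        /* No list of choices: treat as a free-form string. */
        if (s->value_list == NULL)
            return 1;
        /* Otherwise the string must be one of the offered choices. */
        for (i = 0; s->value_list[i] != NULL; i++) {
            if (strcmp(s->value_list[i], (const char *)value) == 0)
                return 1;
        }
        return 0;
    default:
        /* Assumption: UNKNOWN means read-only; string lists are get-only. */
        return 0;
    }
}
```

A check like this could live in the client library, in the server, or in both;
where it belongs is one of the open questions once the SSIP side is designed.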
I haven't yet given any thought to either the SSIP protocol, or the protocol
between the server and drivers, but that should be trivial.
With the above, we could then allow synthesizers to provide as much specific
functionality as is desirable. We cannot expect clients to be able to locally
store these settings in their own config, so it would be up to Speech
Dispatcher to do that. Fortunately, I think GSettings provides sufficient
functionality to help with that task.
I'd be interested in any thoughts, suggestions, or questions anyone has. There
is still a bit to get done before I get to implementing this functionality, and
there are still pieces that likely need further fleshing out, like the SSIP
protocol.
Luke