symbolic voice-types versus synthesis voices
From: Tomas Cerha
Subject: symbolic voice-types versus synthesis voices
Date: Mon, 08 Nov 2010 21:01:00 +0100
On 8.11.2010 13:39, Andrei Kholodnyi wrote:
>> But does this diversity matter? If these diverse names are exposed to the
>> end user, I think it is still better than exposing nicely aligned symbolic
>> names, which carry no information (except for the gender). The client can
>> also expose voice properties to the user if this is implemented (and
>> available).
>
> each synth has its own convention for voice naming, e.g.
> espeak:
> NAME           LANGUAGE  VARIANT
> default        en        none
> en-scottish    en        sc
> english        en        uk
> lancashire     en        uk-north
> english_rp     en        uk-rp
> english_wmids  en        uk-wmids
> english-us     en        us
> en-westindies  en        wi
Well, Espeak is a very special beast here. It in fact has just one voice (one
set of recorded data). The voices listed by espeak are actually different rule
sets applied to this one basic voice. Such specifics should be handled within
the particular output module and reported to Speech Dispatcher in a manner
consistent with other synths.
> pico:
> samantha en en-US
> serena en en-GB
Yes, this is a more typical example.
> as you can see VARIANT differs between them, e.g. you have
> english-us en us
> samantha en en-US
> which is the same variant, but written differently.
> It means that if an app wants to search for "US English", it doesn't know
> what to search for.
>
> LANGUAGE is also different, you might have e.g. 3 letters
> greek-ancient grc none
>
> Now my question is do we want to introduce a consistent voice naming
> convention for SD?
> we could leave e.g. language names as is /however there is a name
> clash probability between synths/
I don't think naming must be consistent, but voice properties must definitely
be reported consistently. When I speak about a name, I mean a unique
human-readable voice identifier. It doesn't need to be unique across
synthesizers, as it can always be exposed to the user in combination with the
synth name, which is quite natural. We can't avoid the situation where two
synths provide a voice of the same name, such as "Samantha". To me it seems OK
for the user to have choices like "Pico/Samantha", "Pico/Serena",
"Festival/Samantha". It is IMO still better than having to select from
"Pico/female-1", "Pico/female-2", "Festival/female-1". If someone likes Pico's
Samantha voice, they would recommend it to a friend by that name, rather than
by some normalized identifier.
> but IMO it would be good to "normalize" LANGUAGE and VARIANT at least.
> it would allow searching properly.
Sure. All voice properties must have normalized meanings and values. The output
module must map the synth-specific properties to the normalized ones. Some
synths will not support all the properties (for example, the module may not be
able to determine the age of a particular voice), so this must also be
considered.
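To illustrate the mapping, here is a rough Python sketch of what two output
modules might do with the espeak and pico rows quoted above. Everything in it
is an assumption made for illustration: the VoiceProperties record, the
function names and the espeak variant table are hypothetical and not part of
Speech Dispatcher. Properties a synth cannot report are simply left as None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VoiceProperties:
    """Hypothetical normalized record an output module would report."""
    synth: str
    name: str                      # native, human-readable voice name
    lang: str                      # normalized language code, e.g. "en"
    variant: Optional[str] = None  # normalized variant, e.g. "US"
    gender: Optional[str] = None   # None when the synth cannot tell
    age: Optional[int] = None      # None when the synth cannot tell


# Assumed mapping from espeak's own variant strings to normalized values.
# Unlisted variants are passed through unchanged.
ESPEAK_VARIANTS = {"us": "US", "none": None}


def normalize_espeak(name: str, language: str, variant: str) -> VoiceProperties:
    return VoiceProperties(
        synth="espeak",
        name=name,
        lang=language.lower(),
        variant=ESPEAK_VARIANTS.get(variant, variant),
    )


def normalize_pico(name: str, language: str, variant: str) -> VoiceProperties:
    # pico reports e.g. "en-US"; keep only the part after the dash.
    _, _, region = variant.partition("-")
    return VoiceProperties(
        synth="pico",
        name=name,
        lang=language.lower(),
        variant=region.upper() or None,
    )


def find_voices(voices: List[VoiceProperties], lang: str,
                variant: Optional[str] = None) -> List[VoiceProperties]:
    """Search by normalized properties, regardless of the native names."""
    return [v for v in voices
            if v.lang == lang and (variant is None or v.variant == variant)]


voices = [
    normalize_espeak("english-us", "en", "us"),
    normalize_espeak("en-scottish", "en", "sc"),
    normalize_pico("samantha", "en", "en-US"),
    normalize_pico("serena", "en", "en-GB"),
]
us_english = [f"{v.synth}/{v.name}" for v in find_voices(voices, "en", "US")]
```

With this, a search for "US English" finds both espeak/english-us and
pico/samantha, while the native names stay untouched.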
> I just thought that we might probably map names to something like
> "spd-voice-NN" or "male-en-NN",
> which is not much worse than e.g. "english-us" :D,
I'm not sure whether you mean this as some sort of internal identifiers or as
names exposed to the user.
I am a little confused about whether we actually agree here :-) But IMO we need
something like this:
Client:
LIST VOICES
Server:
1 Samantha
2 Serena
Client:
VOICE PROPERTIES 1
Server:
LANG: en
VARIANT: US
GENDER: female
AGE: 25
So the user can see the native voice name and its properties, and select based
on either the name or the properties. Both may be important for the user.
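On the client side, handling such replies could look roughly like the Python
sketch below. It only assumes the reply layout from the exchange above (one
"ID NAME" pair per line for the list, one "KEY: value" pair per line for the
properties). The commands are the proposal from this mail, not existing
protocol commands, and the function names are made up for the example.

```python
from typing import Dict


def parse_voice_list(reply: str) -> Dict[int, str]:
    """Parse lines such as '1 Samantha' into {1: 'Samantha'}."""
    voices = {}
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        ident, name = line.split(None, 1)
        voices[int(ident)] = name
    return voices


def parse_voice_properties(reply: str) -> Dict[str, str]:
    """Parse lines such as 'LANG: en' into {'LANG': 'en'}.

    Values are kept as strings; properties the synth does not
    support are simply absent from the result.
    """
    props = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not a KEY: value line
        props[key.strip().upper()] = value.strip()
    return props


voice_list = parse_voice_list("1 Samantha\n2 Serena")
properties = parse_voice_properties(
    "LANG: en\nVARIANT: US\nGENDER: female\nAGE: 25"
)
```

A client can then show the user "Samantha" together with its properties, and
use properties.get("AGE") to cope with synths that do not report an age.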
Hope it is clear what I mean now.
Best regards, Tomas