Speakup works with Czech language
From: Kirk Reiser
Subject: Speakup works with Czech language
Date: Mon Sep 4 09:59:45 2006
Hello Hynek and all: If I have missed something, please let me know. I
have tried to answer your questions, but how clearly those explanations
will come across is difficult to tell.
I'll now describe in more detail what's necessary to make Speakup work
with the Czech language:
1) speakup.c must be patched in one place. The spk_chartab
table is specific to iso-8859-1 and doesn't work with
other encodings (~l. 206, speakup.c), because in the extended character
sets some symbols above 128 are also ALPHA or A_CAP, not just
B_SYM. If this is not changed, Speakup writes wrong codes
for these keys into /dev/softsynth (not the codes defined
in iso-8859-2 for Czech, for example).
Since it's not desirable to change Speakup's code for every
internationalization, and to change it again every time we want to use a
different language, I'd like to ask whether it would be possible to make
this table accessible from /proc/speakup, so that users are able to
change it just as they can change /proc/speakup/characters.
Then, for example, speechd-up would be able to set this table
automatically, since libc has routines for telling
whether a given character, according to the given locale, is
alphabetic, numeric, or an uppercase letter.
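The classification the suggestion above relies on can be illustrated without libc: the sketch below (not Speakup's actual code) uses Python's Unicode database to decide whether each ISO-8859-2 code above 128 would need an ALPHA, A_CAP, or plain-symbol entry in a chartab-like table. The flag names are stand-ins borrowed from the discussion, not Speakup's real constants.

```python
# Sketch: classifying ISO-8859-2 codes above 128 the way a locale-aware
# spk_chartab would need to.  ALPHA/A_CAP/B_SYM are stand-in labels.
ALPHA, A_CAP, B_SYM = "ALPHA", "A_CAP", "B_SYM"

def classify(code: int, encoding: str = "iso-8859-2") -> str:
    ch = bytes([code]).decode(encoding)
    if ch.isalpha():
        return A_CAP if ch.isupper() else ALPHA
    return B_SYM

# In ISO-8859-2, 0xC8 is 'Č' and 0xE8 is 'č' -- both letters, although a
# table built only for ISO-8859-1 would treat such codes as symbols.
```

An ISO-8859-1-only table gets exactly these codes wrong, which matches the symptom described: letter codes above 128 being handled as if they were symbols.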
I agree we need to figure out a way to convert the non-iso-8859-1
characters to get appropriate representations for various languages.
The tables the kernel uses to represent those character sets, however,
are not part of Speakup, so I don't think we should make them
available through the /proc/speakup interface. I am not exactly sure,
but it seems to me that if the correct language tables are selected for
the kernel and an appropriate translation is loaded into
/proc/speakup/characters, correct definitions will be sent to the
synth, soft or otherwise.
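As a rough illustration of loading such a translation from userspace, the sketch below generates one spoken-name entry per high ISO-8859-2 letter. The "code<TAB>name" line format is an assumption made here for illustration; check the Speakup documentation for the real format of /proc/speakup/characters before using anything like this.

```python
# Hypothetical sketch of what an external tool (speechd-up, say) could
# generate for loading into /proc/speakup/characters.  The line format
# "code<TAB>spoken name" is an assumption, not the documented interface.
import unicodedata

def characters_table(encoding: str = "iso-8859-2"):
    lines = []
    for code in range(128, 256):
        ch = bytes([code]).decode(encoding)
        if ch.isalpha():
            # e.g. "latin small letter c with caron" for 0xE8
            name = unicodedata.name(ch).lower()
            lines.append(f"{code}\t{name}")
    return lines

# A privileged process would then write these lines to
# /proc/speakup/characters (not done here).
```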
2) As for character reading and punctuation, /proc/speakup/characters
needs to be changed, of course. But we think it would be desirable
if we could turn these features off in Speakup entirely, so that
we could let the software TTSs handle this task. Festival has good
capabilities for processing punctuation and single characters, and
it's configurable for every language, etc. So it would be best
if Speakup just passed the "punctuation mode" DoubleTalk commands,
as it does now, and didn't process the text itself. If only
one letter is passed, it's quite straightforward that it should be
read as a single character. Again, we would welcome no processing
on Speakup's side.
I understand your point here, and that's how we used to do it. We
changed to using our own translation system because trying to support
all the various synths we support meant that for some synths we didn't
have full representation. I implemented the /proc/speakup/characters
system so we could have a totally homogeneous interface across all
synths and provide a mechanism so individuals could change those to
suit their own needs. You can see the need for this, of course, because
Festival and Flite use one way of pronouncing punctuation and the like,
while the DECtalk synth, tuxtalk, and ViaVoice are different. The system
is a simple character-to-string conversion, so you can change it to
whatever you like for different synths.
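The character-to-string conversion described above can be modeled in a few lines. The spoken strings below are illustrative examples, not the mappings Speakup actually ships for these synths.

```python
# Minimal model of per-synth character-to-string conversion: each synth
# driver carries its own table, so the same punctuation can be spoken
# differently per synth.  These spoken strings are made up for the sketch.
FESTIVAL_MAP = {"!": "exclamation", "?": "question mark", "@": "at"}
DECTALK_MAP  = {"!": "bang",        "?": "question",      "@": "at sign"}

def speak_chars(text: str, table: dict) -> str:
    """Replace each character by its spoken string, if the table has one."""
    return " ".join(table.get(ch, ch) for ch in text)
```

Swapping the table swaps the pronunciation style without touching any other code, which is the point of keeping the conversion as plain data.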
3) Only some console screen fonts can be used for the
Czech language. I don't really understand why, but it doesn't
work with some console fonts, although all of them use iso-8859-2.
If I use a font other than iso02.fxx.psf, the codes of the
alphanumeric characters above 128 returned by Speakup don't correspond
to iso-8859-2. It doesn't work with e.g. lat2u-xx.psf or
lat2-sunxx.psf, although these fonts are also iso-8859-2. I don't
know what the difference between them is from Speakup's perspective.
There is no difference from Speakup's perspective. You have to keep
straight the difference between the language set tables and the font
tables. The language set tables are single- or double-byte
representations which get stored in video memory, one byte per
character. The font tables are used to take those representations and
turn them into bitmap images for the various video cards'
character generators. So Speakup reads the video memory to provide
review functions and knows nothing about fonts or the like.
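A userland way to see the "one byte per character cell" point is the /dev/vcsa devices, which expose console screen memory with a small header followed by character/attribute byte pairs; this is roughly the same data Speakup reads in-kernel. The sketch below parses that layout from a synthetic buffer rather than a real device.

```python
# /dev/vcsa<N> exposes console screen memory as a 4-byte header
# (rows, cols, cursor x, cursor y) followed by char/attr byte pairs --
# one character byte per cell, fonts nowhere in sight.  This parses
# that layout from an in-memory buffer.
def parse_vcsa(buf: bytes):
    rows, cols = buf[0], buf[1]          # header; buf[2], buf[3] are cursor x/y
    cells = buf[4:]
    screen = []
    for r in range(rows):
        # take every second byte (the character), skipping attribute bytes
        row_chars = cells[2 * r * cols : 2 * (r + 1) * cols : 2]
        screen.append(bytes(row_chars))
    return screen

# Build a fake 1x5 screen containing "ahoj " with attribute 0x07 per cell:
fake = bytes([1, 5, 0, 0]) + b"".join(bytes([c, 0x07]) for c in b"ahoj ")
```

Whatever font is loaded, the bytes in this buffer stay the same; only the glyphs drawn from them change.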
So I have a question: where does Speakup actually read the input
text from? Do you know how different fonts (all working
in the same character set) could influence this?
I believe I just answered this above. If it is still not clear, let me
know and I'll try again.
If the above points are addressed, it works with the Czech language.
Something more might be needed for some other languages, but I think
this is sufficient for a large group of languages. We tested it
with Speakup (CVS) loaded as a module, the Speechd-up (CVS) interface
daemon, Speech Dispatcher (CVS), and Festival 1.4.3.
I will check that out of cvs and play with it as soon as I finish the
project I am currently working on.
I'd like to ask some more questions:
1) How do CAPS_START and CAPS_STOP work, and where and why are they
used? They're not present in the documentation of the DoubleTalk
protocol, and I somehow don't understand them.
Each synth has its own command set definitions and capabilities, as
you know. cap_start and cap_stop are Speakup's way of handling each
synth's nuances. They are simple strings which get sent to the synth at
the beginning and end of an uppercase letter or string of letters.
Such a string can contain control sequences for the synth, or a text
string like 'cap ' to be sent to the synth when any capital letter is
encountered. These command-set control strings are contained in an
array of pointers built in each synth driver, so the output can be
customized for different synths. They are built at the bottom of
each driver, and a pointer to them is passed upon registration of the
synth with the kernel. In the case of the DoubleTalk, the strings by
default are ctrl-a+35p and ctrl-a-35p respectively, which adjust the
pitch up and then back down a noticeable amount, so when reviewing the
screen a person knows immediately whether the character they are on is
uppercase or not.
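The mechanism just described amounts to wrapping runs of uppercase letters in a start and a stop string. Here is a small sketch of that idea, using the DoubleTalk default strings quoted above (\x01 is ctrl-a):

```python
# Sketch of the cap_start/cap_stop mechanism: wrap each run of uppercase
# letters in the synth's start/stop strings.  The defaults below are the
# DoubleTalk ones quoted in the text: pitch up 35, then back down.
import re

CAP_START = "\x01+35p"   # DoubleTalk: raise pitch
CAP_STOP  = "\x01-35p"   # DoubleTalk: lower pitch back

def mark_caps(text: str, start: str = CAP_START, stop: str = CAP_STOP) -> str:
    return re.sub(r"[A-Z]+", lambda m: start + m.group(0) + stop, text)
```

A synth with no pitch control could instead pass start="cap " and stop="" to get the spoken-word style of announcement.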
2) Wouldn't it be possible to also pass a command for setting the
language through /dev/softsynth? My idea is that the user could
switch between two or three languages just by pressing a Speakup
key shortcut. Speakup would then send a command like ^XL2 or something
similar to /dev/softsynth; Speechd-Up would look up what it's
configured to do with language number 2, set the /proc/speakup/
tables accordingly, and switch its character set and synthesis
language, and the user could continue using that language. This way,
software synthesis users would be able to switch languages easily.
You could certainly do that. In fact, synths like the Apollo that
support multiple languages have such a sequence. In the DoubleTalk
command set there is a command, ctrl-a nV, to move
between various voices; 'n' in this case is a number between 0-9
which could be used as an index for different voices. A
/proc/speakup/voice entry is created for synths that support multiple
voices, to give a simple way of changing between them. You
certainly could set up keyboard macros to change languages/voices on
the fly. One might extend the idea to changing languages
automagically when coming upon an umlaut or when loading different
applications. Those features come under the heading of automatic
configuration sets, which I have not implemented yet but which are on
my agenda at some point.
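The ^XL<n> command discussed above is a proposal, not an existing protocol; still, the daemon side of it is easy to sketch. This shows how something like Speechd-Up might strip such a sequence out of the /dev/softsynth byte stream and collect the requested language numbers (\x18 is ctrl-X).

```python
# Hypothetical parser for the proposed ^XL<n> language-switch command:
# remove the escape sequences from the softsynth byte stream and report
# which language numbers were requested.  \x18 is ctrl-X.
import re

LANG_CMD = re.compile(rb"\x18L(\d)")

def split_lang_commands(data: bytes):
    """Return (text_without_commands, list_of_language_numbers)."""
    langs = [int(m.group(1)) for m in LANG_CMD.finditer(data)]
    return LANG_CMD.sub(b"", data), langs
```

On each reported number, the daemon would then reload the /proc/speakup tables and switch the synthesis language as described above.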
3) Speakup works fine!
Happy to hear that. I can hardly wait to try it!
Kirk
--
Kirk Reiser The Computer Braille Facility
e-mail: address@hidden University of Western Ontario
phone: (519) 661-3061