From: Hynek Hanke
Subject: UTF8-characters coming as question marks while using generic driver
Date: Mon, 25 Aug 2008 13:31:34 +0200
Santhosh Thottingal wrote:
> I was trying to use the speech dispatcher generic driver for dhvani
> Indic Text to speech system(dhvani.sourceforge.net).
>
> In /etc/speech-dispatcher/modules/dhvani-generic.conf file I have
> given like this
> GenericLanguage "ml" "malayalam" "UTF-8"
>
Dear Santhosh,

This is the correct way to specify the character set for
the generic output module.
> But somehow the input reached to the text to speech system is only
> question marks for the unicode string.
>
A question mark is the fallback substituted for characters that
cannot be represented in the target character set.
> Sun Aug 24 17:46:49 2008 [387408]: Warning: Prefered charset not
> specified, recoding to iso-8859-1
>
It looks like, for some reason, the output module ignores
the GenericLanguage line in your configuration file. Recoding
Malayalam text to iso-8859-1 makes no sense, and this is why you
get question marks. I tested the encoding settings in the generic
module today with Speech Dispatcher 0.6.7 and they seem to work well.
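The chain of events in the log can be modelled roughly as below. This
is a hypothetical sketch, not Speech Dispatcher's actual code: the names
LANGUAGE_TABLE, DEFAULT_CHARSET, pick_charset and recode_for_synth are
made up for illustration. The only behaviour taken from the log is that
a missing charset falls back to iso-8859-1; Python's errors="replace"
handler stands in for whatever replacement the module really performs.

```python
# Hypothetical model of the charset selection described in the log; the
# names below are illustrative and do not come from Speech Dispatcher.
DEFAULT_CHARSET = "iso-8859-1"  # fallback named in the log warning

# Hypothetical table built from GenericLanguage lines:
# language code -> (language name, charset)
LANGUAGE_TABLE = {"ml": ("malayalam", "UTF-8")}


def pick_charset(language: str, table: dict) -> str:
    """Return the configured charset, or the fallback if none is known."""
    entry = table.get(language)
    return entry[1] if entry is not None else DEFAULT_CHARSET


def recode_for_synth(text: str, language: str, table: dict) -> bytes:
    """Encode text for the synthesizer, replacing unencodable chars with '?'."""
    return text.encode(pick_charset(language, table), errors="replace")


sample = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"  # "Malayalam" in Malayalam script

# With the GenericLanguage entry honoured, the text reaches the synthesizer intact.
print(recode_for_synth(sample, "ml", LANGUAGE_TABLE).decode("utf-8") == sample)  # True

# If the entry is ignored (modelled here as an empty table), the fallback
# charset is used and every character turns into '?'.
print(recode_for_synth(sample, "ml", {}))  # b'??????'
```

ISO-8859-1 contains no Malayalam characters at all, so every one of the
six code points in the sample is replaced, which matches the string of
question marks reaching dhvani.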
> client.set_output_module('dhvani-generic')
> client.set_language('ml')
> client.speak("???????")
> client.close()
>
This looks correct. You can also try the same thing from the
command line with spd-say:

    spd-say -l ml -o dhvani-generic "???????"
Could you please send the configuration file (dhvani-generic.conf)
and the version of Speech Dispatcher you are using (the output of
speech-dispatcher -v)? Perhaps there is just a typo somewhere.
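Since a typo is a likely culprit, a small script could sanity-check the
GenericLanguage lines before restarting the server. This is a
hypothetical helper, not part of Speech Dispatcher. It assumes each
entry looks like the one quoted above (quoted language code, quoted
name, and an optional quoted charset) and that the directive name is
spelled exactly "GenericLanguage". It uses Python's codecs.lookup to
check the charset name, so it only approximates which names Speech
Dispatcher itself accepts.

```python
import codecs
import shlex


def check_generic_languages(conf_text: str) -> list[str]:
    """Return a list of problems found in GenericLanguage lines (hypothetical helper)."""
    problems = []
    found = False
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        try:
            # shlex handles the double-quoted arguments.
            fields = shlex.split(stripped)
        except ValueError as err:  # e.g. an unbalanced quote
            problems.append(f"line {lineno}: cannot parse ({err})")
            continue
        # Assumption: the directive name is matched exactly (case-sensitive).
        if fields[0] != "GenericLanguage":
            continue
        found = True
        args = fields[1:]
        # Assumption: code and name are required, the charset is optional.
        if len(args) not in (2, 3):
            problems.append(f"line {lineno}: expected 2 or 3 arguments, got {len(args)}")
            continue
        if len(args) == 3:
            try:
                codecs.lookup(args[2])  # raises LookupError for unknown names
            except LookupError:
                problems.append(f"line {lineno}: unknown charset {args[2]!r}")
    if not found:
        problems.append("no GenericLanguage line found")
    return problems


good = 'GenericLanguage "ml" "malayalam" "UTF-8"'
print(check_generic_languages(good))  # []
print(check_generic_languages('GenericLanguage "ml" "malayalam" "UFT-8"'))
```

An empty result only means the lines are well-formed by these assumed
rules; it does not prove that the output module actually reads them.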
Dhvani looks very interesting! It's Free Software and it
supports various languages. It is very good that you want
to integrate it with screen readers, and I think Speech Dispatcher
is a very good way to do that.
In the longer term, it would be useful to develop a native
output module for Speech Dispatcher, because the generic module
is limited in many ways. We will be happy to offer any assistance
we can.
With regards,
Hynek Hanke