speechd-discuss

ibmtts output module and utf8


From: Olivier BERT
Subject: ibmtts output module and utf8
Date: Tue, 31 Jul 2007 21:02:29 +0200

On Sun, Jul 29, 2007 at 10:44:17AM +0200, Lukas Loehrer wrote:
> Hynek Hanke writes ("Re: ibmtts output module and utf8"):
> > I have taken out SSML because some people from Capital Accessibility
> > complained that their version doesn't support SSML and they hear the
> 
> Unfortunately, this completely killed index mark reporting in the
> ibmtts module. Maybe we could make the SSML stripping a configuration
> option.
> 
> > But I know the former developer of the output module, Gary Cramblitt,
> > was using it with SSML. I can't find the documentation and I am pretty
> > confused now :)
> 
> The only comprehensive documentation for ibmtts I know of is:
> 
> http://www.wizzardsoftware.com/docs/tts.pdf
> 
> However, this file does not mention SSML support at all. One concrete
> problem is that it is unclear how SSML index marks, which are strings,
> are translated into ibmtts index marks, which are integers, or even
> whether SSML index marks are supported at all. There seems to be some
> kind of support for SSML marks. For example, the input string:
> 
> "<speak>Hello <mark name="foobar"/> World.</speak>"
> 
> is translated by the filter into:
> 
> 'Hello  `ui"foobar"  World.'
> 
> Notice the annotation `ui, which is unfortunately not documented in
> the docs mentioned above. The ibmtts output module works around this
> problem by searching the input for <mark/> tags and replacing them with
> calls to eciInsertIndex().
> 
> > Lukas, please, where can I get this eci.ini? The .deb package I have
> > for testing doesn't contain this file.
> 
> The eci.ini file contains information about the installed languages,
> input filters and other things I do not understand. It should be
> generated by the postinstall script as
> /var/opt/IBM/ibmtts/cfg/eci.ini. If not, maybe your .deb was created
> with alien from an .rpm without the -c option. I can send you
> information on how to generate the file manually.
> 
> As for SSML support: It seems to be provided by the file:
> 
> /opt/IBM/ibmtts/lib/ssmlfilter.so
> 
> I have the following at the beginning of my eci.ini file:
> 
> [LanguageIndependent]
> autoLoadFilter=
> Desc_Filter1=IBM SSML Filter
> Path_Filter1=/opt/IBM/ibmtts/lib/ssmlfilter.so
> 
> This was generated in the postinstall script of the main ibmtts package
> like this:
> 
> /opt/IBM/ibmtts/bin/inifilter /filter:1 \
>   /path:/opt/IBM/ibmtts/lib/ssmlfilter.so /lang:all \
>   /ECIINI:/var/opt/IBM/ibmtts/cfg/ /name:"IBM SSML Filter" /autoload:n
> 
> The weird thing with the above is that the SSML filter seems to get
> activated as soon as an input string to eciAddText() starts with
> <speak>, even though autoloading of the SSML filter is disabled by the
> above settings. Also, as soon as the SSML filter is activated, the
> expected input encoding apparently changes from cp1252 to utf-8.
> Moreover, once the filter is active, it remains active.

Hi, 

It seems that my IBMTTS version supports SSML, because I have never heard SSML
tags being spoken. In addition, I have an eci.ini file that mentions the
ssmlfilter.so library, and this library is present in /opt/IBM/ibmtts/lib.
So if some old versions of IBM ViaVoice don't support SSML, I think it's a good
idea to add SSML stripping as an option to the ibmtts speechd output module.
However, I think that we haven't resolved the UTF-8 issue for all cases.
Here's my situation:
- When the ibmtts module doesn't strip SSML, UTF-8 chars are spoken
  correctly.
- But when the ibmtts module strips SSML (option ibmttsUseSSML set to 0),
  UTF-8 chars are spoken as if they were Latin1 chars.
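
To make the second case concrete, here is a small Python sketch (not code from the module, and the function name is only made up for illustration) of what happens when UTF-8 bytes are read as Latin1: each accented character turns into two unrelated characters, which is probably what the engine ends up speaking.

```python
def misread_as_latin1(text: str) -> str:
    """Show how UTF-8 encoded text looks when the bytes are read as Latin1.

    Hypothetical helper for illustration only; it does not call IBMTTS.
    """
    # Encode the way speech-dispatcher delivers text (UTF-8), then decode
    # the same bytes the way the engine seems to read them (Latin1).
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # "é" is two bytes in UTF-8 (0xC3 0xA9), so it becomes two characters.
    print(misread_as_latin1("caf\u00e9"))
```

Plain ASCII text comes out unchanged, which would explain why the problem only shows up with accented characters.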

The problem is that we don't have any documentation on the charset that must
be used for IBM ViaVoice input.
What I can deduce from my experience is that new versions of IBMTTS that
support SSML also support UTF-8 (provided, of course, that there are SSML
tags in the input string).
But I think that, if SSML stripping is enabled, the module has to convert
messages to Latin1 if they contain UTF-8 characters, because I'm nearly sure
that old IBMTTS versions do not support UTF-8, and my experience shows that
if IBMTTS doesn't detect SSML commands, the input is assumed to be Latin1
encoded.
It's strange behaviour from IBMTTS, but we have to take it into account.
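
The logic I have in mind could look roughly like the Python sketch below. It is only an illustration of the idea, not the module's C code: the function name, the `use_ssml` parameter (standing in for the ibmttsUseSSML option) and the naive regular expression used to strip tags are all my own assumptions. I use Latin1 here; cp1252, as Lukas mentioned, may be the better target.

```python
import re

# Naive tag matcher, assumed for illustration; a real SSML stripper
# would also have to handle entities and <mark/> index marks.
_TAG_RE = re.compile(r"<[^>]*>")


def prepare_for_engine(message: str, use_ssml: bool) -> bytes:
    """Return the bytes to hand to the engine for one message.

    Hypothetical helper: with SSML kept, the text stays UTF-8; with
    SSML stripped, it is converted to Latin1.
    """
    if use_ssml:
        # The SSML filter appears to expect UTF-8 input.
        return message.encode("utf-8")
    # Without SSML the engine seems to assume Latin1, so strip the tags
    # and convert; characters outside Latin1 are replaced by "?".
    plain = _TAG_RE.sub("", message)
    return plain.encode("latin-1", errors="replace")
```

Replacing unconvertible characters with "?" is just one possible choice; dropping them or transliterating them would be alternatives.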

What do you think about that?

Regards,
-- 
Olivier BERT
Tél: (+33)(0)6 07 69 79 71

