ibmtts output module and utf8
From: Olivier BERT
Subject: ibmtts output module and utf8
Date: Tue, 31 Jul 2007 21:02:29 +0200
On Sun, Jul 29, 2007 at 10:44:17AM +0200, Lukas Loehrer wrote:
> Hynek Hanke writes ("Re: ibmtts output module and utf8"):
> > I have taken out SSML because some people from Capital Accessibility
> > complained that their version doesn't support SSML and they hear the
>
> Unfortunately, this completely killed index mark reporting in the
> ibmtts module. Maybe we could make the SSML stripping a configuration
> option.
>
> >But I know the former developer of the output module, Gary Cramblitt, was
> >using it with
> > SSML. I can't find the documentation and I am pretty confused now :)
>
> The only comprehensive documentation for ibmtts I know of is:
>
> http://www.wizzardsoftware.com/docs/tts.pdf
>
> However, this file does not mention SSML support at all. One concrete problem
> is that it is unclear how
> SSML index marks, which are strings, are translated into ibmtts index
> marks, which are integers, or even if SSML index marks are supported
> at all. There seems to be some kind of support for SSML marks. For
> example, the input string:
>
> "<speak>Hello <mark name="foobar"/> World.</speak>"
>
> is translated by the filter into:
>
> 'Hello `ui"foobar" World.'
>
> Notice the annotation `ui, which is unfortunately not documented in
> the docs mentioned above. The ibmtts output module works around this
> problem by searching the input for <mark/> tags and replacing them with
> calls to eciInsertIndex().
>
> > Lukas, please, where can I get this eci.ini? The .deb package I have
> > for testing doesn't contain this file.
>
> The eci.ini file contains information about the installed languages,
> input filters and other things I do not understand. It should be
> generated in the postinstall script in
> /var/opt/IBM/ibmtts/cfg/eci.ini. If not, maybe your .deb was created
> with alien from an .rpm without the -c option. I can send you
> information on how to generate the file manually.
>
> As for SSML support: It seems to be provided by the file:
>
> /opt/IBM/ibmtts/lib/ssmlfilter.so
>
> I have the following at the beginning of my eci.ini file:
>
> [LanguageIndependent]
> autoLoadFilter=
> Desc_Filter1=IBM SSML Filter
> Path_Filter1=/opt/IBM/ibmtts/lib/ssmlfilter.so
>
> This was generated in the postinstall script of the main ibmtts package
> like this:
>
> /opt/IBM/ibmtts/bin/inifilter /filter:1
> /path:/opt/IBM/ibmtts/lib/ssmlfilter.so /lang:all
> /ECIINI:/var/opt/IBM/ibmtts/cfg/ /name:"IBM SSML Filter" /autoload:n
>
> The weird thing with the above is that the SSML filter seems to get
> activated as soon as an input string to eciAddText() starts with
> <speak>, even though autoloading of the SSML filter is disabled by the
> above settings. Also, as soon as the SSML filter is activated, the
> expected input encoding apparently changes from cp1252 to utf-8.
> Moreover, once the filter is active, it remains active.
Hi,
It seems that my IBMTTS version supports SSML, because I have never heard
SSML tags being spoken. In addition, I have an eci.ini file that mentions the
ssmlfilter.so library, and this library is present in /opt/IBM/ibmtts/lib.
So if some old versions of IBM ViaVoice don't support SSML, I think it's a
good idea to add SSML stripping as an option to the ibmtts speechd output
module.
However, I think that we haven't resolved the UTF-8 issue for all cases.
Here is my situation:
- When the ibmtts module doesn't strip SSML, UTF-8 chars are spoken
correctly.
- But when the ibmtts module strips SSML (option ibmttsUseSSML set to 0),
UTF-8 chars are spoken as if they were Latin1 chars.
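To illustrate the second case, here is a small Python sketch (not taken from the module, which is written in C) of what reading UTF-8 bytes as Latin1 does to an accented character. The sample word is only an example.

```python
# Illustration only: UTF-8 bytes decoded as if they were Latin1.
text = "caf\u00e9"                       # "café", with e-acute (U+00E9)
utf8_bytes = text.encode("utf-8")        # the e-acute becomes two bytes: C3 A9
misread = utf8_bytes.decode("latin-1")   # Latin1 maps each byte to one character
print(misread)                           # the e-acute is now two separate characters
```

Each multi-byte UTF-8 character turns into two or more Latin1 characters, which would explain why the engine speaks extra symbols instead of the accented letter.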
The problem is that we don't have any documentation on the charset that must
be used for IBM ViaVoice input.
What I can deduce from my experience is that newer versions of IBMTTS that
support SSML also support UTF-8 (provided, of course, that there are SSML
tags in the input string).
But I think that, if SSML stripping is enabled, the module has to convert
messages to Latin1 when they contain UTF-8 characters, because I'm almost
sure that old IBMTTS versions do not support UTF-8, and my experience shows
that if IBMTTS doesn't detect SSML commands, the input is assumed to be
Latin1 encoded.
It's a strange behaviour from IBMTTS, but we have to take it into account.
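As a rough sketch of the logic I have in mind (in Python for brevity; the real module is C, and the function name, the parameter names and the simple tag-stripping regex below are all hypothetical, not the module's actual code):

```python
import html
import re

# Hypothetical helper: a very simple matcher for SSML tags such as
# <speak>, </speak> or <mark name="x"/>. A real implementation would
# need a proper parser.
_TAG = re.compile(r"<[^>]+>")

def prepare_for_engine(message: str, use_ssml: bool,
                       legacy_encoding: str = "latin-1") -> bytes:
    """Return the bytes to hand to the synthesizer (assumed behaviour)."""
    if use_ssml:
        # SSML kept: the engine appears to expect UTF-8 in this case.
        return message.encode("utf-8")
    # SSML stripped: remove tags, decode entities such as &amp;, then
    # convert to the legacy charset; unmappable characters become "?".
    plain = html.unescape(_TAG.sub("", message))
    return plain.encode(legacy_encoding, errors="replace")
```

The legacy charset is a parameter because we have no documentation on it: I observe Latin1, while the quoted message mentions cp1252, and the two differ for a few characters such as the euro sign.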
What do you think about that?
Regards,
--
Olivier BERT
Tél: (+33)(0)6 07 69 79 71