Chinese with IBM TTS driver
From: Gary Cramblitt
Subject: Chinese with IBM TTS driver
Date: Thu, 15 Mar 2007 17:10:36 -0500
On Thursday 15 March 2007 06:10, Tomas Cerha wrote:
> Gary Cramblitt wrote:
> > I guess the first step would be to examine the logs to
> > see if speech-dispatcher is receiving the text OK, then see if the ibmtts
> > module is receiving the text OK. If both are true, then the bug would be
> > in the way the ibmtts module sends it to the ibm eloquence engine.
>
> I'm attaching the log file with debug level 5 which I got from the
> Chinese user.
>
> Best regards, Tomas.
Thanks Tomas.
It appears that the text is being mangled by Speech Dispatcher before it gets
to the ibmtts module. Here is a hex dump of a portion of the log:
000010f0 53 47 20 62 65 66 6f 72 65 20 69 6e 64 65 78 20 |SG before index |
00001100 6d 61 72 6b 69 6e 67 3a 20 7c e9 ab 3f e7 3f 3f |marking: |..?.??|
00001110 e6 3f ba 7c 2c 20 73 73 6d 6c 5f 6d 6f 64 65 3d |.?.|, ssml_mode=|
00001120 30 0a 5b 54 68 75 20 4d 61 72 20 31 35 20 31 37 |0.[Thu Mar 15 17|
00001130 3a 30 32 3a 33 30 20 32 30 30 37 20 3a 20 37 33 |:02:30 2007 : 73|
00001140 39 38 39 36 5d 20 73 70 65 65 63 68 64 3a 20 20 |9896] speechd: |
00001150 20 20 20 4d 53 47 20 61 66 74 65 72 20 69 6e 64 | MSG after ind|
00001160 65 78 20 6d 61 72 6b 69 6e 67 3a 20 7c 3c 73 70 |ex marking: |<sp|
00001170 65 61 6b 3e ff bf bf bf bf bf 3f ff bf bf bf bf |eak>......?.....|
00001180 bf 3f 3f ff bf bf bf bf bf 3f 3c 2f 73 70 65 61 |.??......?</spea|
00001190 6b 3e 7c 0a 5b 54 68 75 20 4d 61 72 20 31 35 20 |k>|.[Thu Mar 15 |
Notice how the buffer's bytes change between the "before index marking" and
the "after index marking" (ignoring the speak tags).
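To make the comparison concrete, here is a minimal Python sketch (not part of
Speech Dispatcher; the helper names are made up for illustration) that takes
the bytes copied by hand from the dump above and checks whether each buffer is
well-formed UTF-8. It only inspects the logged bytes and says nothing about
where in the code the damage happens. Note that the "before" bytes already
contain 3f ('?') where continuation bytes would be expected, which may mean the
text was damaged earlier, or that the log itself was altered in transit.

```python
# Bytes transcribed from the hex dump above, without the '|' delimiters
# and without the <speak>...</speak> tags.
BEFORE = bytes.fromhex("e9 ab 3f e7 3f 3f e6 3f ba")
AFTER = bytes.fromhex(
    "ff bf bf bf bf bf 3f "
    "ff bf bf bf bf bf 3f 3f "
    "ff bf bf bf bf bf 3f"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def describe(label: str, data: bytes) -> str:
    """Summarize a buffer: length, count of '?' bytes, UTF-8 validity."""
    return "{}: {} bytes, {} x 0x3f, valid UTF-8: {}".format(
        label, len(data), data.count(0x3F), is_valid_utf8(data)
    )


if __name__ == "__main__":
    print(describe("before index marking", BEFORE))
    print(describe("after index marking", AFTER))
```

Running the same check on the text as it arrives at the ibmtts module would
show whether that module receives well-formed UTF-8.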
It is probable that even if speechd did not mangle the buffer, the ibmtts
module would do some mangling of its own. :/ To check, we would need the log
from the ibmtts module in addition to speechd.log.
I'm not sure what explains the mangling. We do handle non-ASCII languages in
UTF-8, such as Czech (whose legacy encoding is ISO 8859-2). ??
--
Gary Cramblitt (aka PhantomsDad)