punctuation.scm
From: Pierre Lorenzon
Subject: punctuation.scm
Date: Sat, 01 Aug 2009 12:19:53 +0200 (CEST)
Hi,
The festival-freebsoft-utils version on my system is a CVS
checkout no more than 15 days old.
More precisely, my question is about the
`punctuation-process-words' method, and in particular about
its behavior for languages other than English. This method is
involved in the `Token' step of utterance processing. Here is
what it produces for French with the punctuation mode set to
all:
Before entering the method, (utt.relation_tree utt 'Word)
returns:
> --
((("Bonjour" ((id "_2") (name "Bonjour"))))
(("." ((id "_3") (name ".")))))
> -- End
And after the method has been applied, it returns:
> --
((("Bonjour" ((id "_2") (name "Bonjour"))))
(("." ((id "_3") (name "."))))
(("." ((id "_4") (name ".")))))
> -- End
This means that the "." has been duplicated. Why? Is the French
tokenization incorrect (I mean, not compatible with
speech-dispatcher.scm)? Here is the output of
(utt.relation_tree utt 'Token) after `Initialize', `Text' and
`Token_POS' have been applied:
> --
((("Bonjour"
((id "_1")
(name "Bonjour")
(punc ".")
(whitespace "")
(prepunctuation "")))))
> -- End
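For anyone who wants to reproduce this, here is a rough sketch of the
steps at the Festival prompt. It assumes that a French voice is already
selected and that festival-freebsoft-utils is loaded; the input text
"Bonjour." is inferred from the output above. It only calls the standard
Festival modules by hand and does not call `punctuation-process-words'
directly, so whether the duplicated "." appears after `Token' depends on
how that method is hooked into the `Token' step on a given setup.

```scheme
;; Sketch only: assumes a French voice is selected and
;; festival-freebsoft-utils is loaded beforehand.
(set! utt (Utterance Text "Bonjour."))  ; sample input, inferred from the output above

;; Run the first synthesis modules one at a time.
(Initialize utt)
(Text utt)
(Token_POS utt)

;; Inspect the Token relation at this point.
(utt.relation_tree utt 'Token)

;; Run the Token module, then inspect the resulting Word relation.
(Token utt)
(utt.relation_tree utt 'Word)
```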
Regards,
Pierre