punctuation.scm
From: Pierre Lorenzon
Subject: punctuation.scm
Date: Mon, 03 Aug 2009 11:41:13 +0200 (CEST)
Hi Milan,
Thanks for your answer!
From: Milan Zamazal <address@hidden>
To: address@hidden
Subject: Re: punctuation.scm
Date: Mon, 03 Aug 2009 10:22:25 +0200
>>>>>> "PL" == Pierre Lorenzon <devel at pollock-nageoire.net> writes:
>
> PL> Before entering the method (utt.relation_tree utt 'Word)
> PL> returns :
>
> >> --
>
> PL> ((("Bonjour" ((id "_2") (name "Bonjour"))))
> PL> (("." ((id "_3") (name ".")))))
>
> >> -- End
>
> PL> And after the method has been applied we have :
>
> >> --
>
> PL> ((("Bonjour" ((id "_2") (name "Bonjour"))))
> PL> (("." ((id "_3") (name "."))))
> PL> (("." ((id "_4") (name ".")))))
>
> >> -- End
>
> PL> It means that the "." has been duplicate. Why ? Is the French
> PL> Tokenization not correct (I mean not compatible with
> PL> speech-dispatcher.scm ?)
>
> Hi Pierre,
>
> try to add French language identifier(s) as returned by the call
>
> (Param.get 'Language)
>
> to punctuation-punc-languages variable defined at the beginning of
> punctuation.scm file and tell me whether it helps.
Indeed it helps: in this case the duplication no
longer occurs! But if we do not modify the code, punctuation
will get its English pronunciation even in
French. Personally I would say that I "don't care", but I know
French users who won't share this point of view!
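For the record, the change I tried amounts to something like the sketch below. It relies on two things visible in the patch further down: punctuation-punc-languages is tested with `member` against a symbol, so it is treated here as a list of symbols, and FranFest's language is matched as the symbol `french`. Where exactly the variable is set in punctuation.scm is not shown here, so take this as an illustration rather than the actual edit.

```scheme
;; Sketch only: evaluate after punctuation.scm has been loaded.
;; Assumes punctuation-punc-languages is a list of language symbols
;; and that (Param.get 'Language) yields "french" under FranFest.
(if (not (member 'french punctuation-punc-languages))
    (set! punctuation-punc-languages
          (cons 'french punctuation-punc-languages)))
```

With this in place the first branch of punctuation-process-words is taken for French, which is why the duplication disappears but the names from punctuation-pronunciation are used.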
>
> One of the practical problems with Festival is that there is no fixed
> text processing schema, so it's impossible to handle all possible
> situations in festival-freebsoft-utils nor it is easy to define what's
> correct or compatible. We have to resolve each problem
> individually.
We might say that the English protocol is "the right one" and
simply follow it. But there are other language
implementations over which we do not have any control! So
let's treat the problem individually, as you say:
My opinion is that, given the French language
implementation in Festival, the best thing to do for French
with punctuation mode set to all is to do NOTHING! Hence I
made the following modification in punctuation.scm, calling this
franfest_token_punctuation_all method, which does nothing for
the moment. This method should be defined if French is
selected as the current language.
The advantage is that, if you accept this modification to
punctuation.scm, further modifications, if necessary,
can be made in this method without modifying
punctuation.scm again.
The problem is that if some day another "Festival
frenchification" exists, it will have to implement this
method. Anyway, for the moment there is no other French
module for Festival.
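On the French side, the placeholder could be as small as the sketch below. The name is the one called from the patch; the body is only an assumption about what "doing nothing" should look like, and returning the utterance unchanged is my guess, since the patch does not show whether the return value is used.

```scheme
;; Sketch only: placeholder to be provided by the French module.
;; It leaves the utterance untouched, on the assumption that the
;; French tokenization has already put the punctuation words into
;; the Word relation.
(define (franfest_token_punctuation_all utt)
  utt)
```

Any later French-specific handling of punctuation in mode all would then go into this function rather than into punctuation.scm.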
;; -- Diff Mon Aug 3 11:13:44 2009
diff -c /home/devel/share/festival/lib/freebsoft/punctuation.scm.\~1.12.\~ /home/devel/share/festival/lib/freebsoft/punctuation.scm
*** /home/devel/share/festival/lib/freebsoft/punctuation.scm.~1.12.~ Mon Jul 28 18:33:27 2008
--- /home/devel/share/festival/lib/freebsoft/punctuation.scm Mon Aug 3 11:06:06 2009
***************
*** 93,126 ****
(define (punctuation-process-words utt)
(cond
((eq? punctuation-mode 'all)
! (if (member (intern (Param.get 'Language)) punctuation-punc-languages)
! ;; Standard English lexicon has no notion of punctuation pronounciation
! (do-relation-items (w utt Word)
! (let ((trans (assoc (item.name w) punctuation-pronunciation)))
! (if (and trans
! (not (word-mapping-of w)))
! (begin
! (item.set_name w (car (cdr trans)))
! (set! trans (cdr (cdr trans)))
! (while trans
! (let ((i (item.insert w (list (car trans)))))
! (item.append_daughter (item.parent (item.relation w 'Token))
! i))
! (set! trans (cdr trans)))))))
! ;; We assume other languages don't insert punctuation words themselves
! (do-relation-items (w utt Word)
! (let* ((w* (item.relation w 'Token))
! (token (item.parent w*)))
! (when (and (not (item.prev w*))
! (item.has_feat token 'prepunctuation))
! (dolist (p (reverse (symbolexplode (item.feat token 'prepunctuation))))
! (let ((i (item.insert w `(,p ((name ,p))) 'before)))
! (item.prepend_daughter token i))))
! (when (and (not (item.next w*))
! (item.has_feat token 'punc))
! (dolist (p (reverse (symbolexplode (item.feat token 'punc))))
! (let ((i (item.insert w `(,p ((name ,p))))))
! (item.append_daughter token i))))))))
;; Delete punctuation when punctuation-mode is none
;; (We actually don't delete the words as this might discard annotations
;; such as index marks. So we just make the word names empty.)
--- 93,133 ----
(define (punctuation-process-words utt)
(cond
((eq? punctuation-mode 'all)
! (cond
! ((member (intern (Param.get 'Language))
! punctuation-punc-languages)
! ;; Standard English lexicon has no notion of punctuation pronounciation
! (do-relation-items
! (w utt Word)
! (let ((trans (assoc (item.name w) punctuation-pronunciation)))
! (if (and trans
! (not (word-mapping-of w)))
! (begin
! (item.set_name w (car (cdr trans)))
! (set! trans (cdr (cdr trans)))
! (while trans
! (let ((i (item.insert w (list (car trans)))))
! (item.append_daughter (item.parent (item.relation w 'Token))
! i))
! (set! trans (cdr trans))))))))
! ;; For French language, the simplest seems to do nothing !
! ((eq? (intern (Param.get 'Language)) 'french)
! (franfest_token_punctuation_all utt))
! ;; We assume other languages don't insert punctuation words themselves
! (t (do-relation-items
! (w utt Word)
! (let* ((w* (item.relation w 'Token))
! (token (item.parent w*)))
! (when (and (not (item.prev w*))
! (item.has_feat token 'prepunctuation))
! (dolist (p (reverse (symbolexplode (item.feat token 'prepunctuation))))
! (let ((i (item.insert w `(,p ((name ,p))) 'before)))
! (item.prepend_daughter token i))))
! (when (and (not (item.next w*))
! (item.has_feat token 'punc))
! (dolist (p (reverse (symbolexplode (item.feat token 'punc))))
! (let ((i (item.insert w `(,p ((name ,p))))))
! (item.append_daughter token i)))))))))
;; Delete punctuation when punctuation-mode is none
;; (We actually don't delete the words as this might discard annotations
;; such as index marks. So we just make the word names empty.)
Diff finished. Mon Aug 3 11:13:11 2009
;; -- End Diff Mon Aug 3 11:13:44 2009
>
> BTW, what package do you use for French in Festival?
The so-called FranFest package:
http://download.gna.org/lliaphon/franfest/franfest-1.96-beta-rc01.tar.bz2
http://www.pollock-nageoire.net/franfest.html
The latter is not available since the server is down
for the moment.
I was precisely doing a few updates and maintenance on
FranFest. I have known for a long time that there was a
problem with punctuation. The first one I had to solve
was due to the so-called "liaisons" in French. At that
point FranFest conflicted with punctuation.scm because
of the empty word "" inserted when punctuation mode was
set to none. I solved this problem inside
FranFest. Then this problem of punctuation duplication
appeared, which might be solved as described above.
If you integrate my patch, I'll see the commit message
on the list and will update my CVS version.
Regards
Pierre
>