punctuation.scm
From: Pierre Lorenzon
Subject: punctuation.scm
Date: Tue, 11 Aug 2009 17:14:21 +0200 (CEST)
Hi Milan,
I have tested it, but indeed it doesn't look flexible enough to
me.
There are three wrappers in punctuation.scm:
- token_to_words
- Token_Method
- Word_Method
I think that at least the first two of them should offer the
possibility of being completely reimplemented in a
language-dependent way. In particular, in French it would be very
convenient to be able to handle "'" (apostrophe) specially. That
looks complicated in the current state of the code because the
`punctuation-split-token' function runs before everything else
and might delete the "'" characters that I'd like to keep.
Therefore I suggest patching the current version 1.13 of
punctuation.scm as follows:
;; -- Diff Tue Aug 11 17:00:27 2009
diff -c /home/devel/share/festival/lib/freebsoft/punctuation.scm.\~1.13.\~ /home/devel/share/festival/lib/freebsoft/punctuation.scm
*** /home/devel/share/festival/lib/freebsoft/punctuation.scm.~1.13.~ Tue Aug 11 16:02:27 2009
--- /home/devel/share/festival/lib/freebsoft/punctuation.scm Tue Aug 11 16:58:17 2009
***************
*** 31,37 ****
(defvar punctuation-chars-2 "[][]")
(defvar punctuation-punc-languages '(english britishenglish americanenglish))
! (defvar punctuation-punc-language-handlers '((french . franfest_token_punctuation_all)))
;; Default English voice doesn't have defined pronunciation of punctuation
;; characters
--- 31,38 ----
(defvar punctuation-chars-2 "[][]")
(defvar punctuation-punc-languages '(english britishenglish americanenglish))
! (defvar punctuation-punc-language-handlers '((french . franfest_token_punctuation)))
! (defvar punctuation-punc-language-splitters '((french . franfest_token_punctuation_split)))
;; Default English voice doesn't have defined pronunciation of punctuation
;; characters
***************
*** 89,102 ****
(define-wrapper (token_to_words token name) punctuation
(if (eq? punctuation-mode 'default)
((next-func) token name)
! (punctuation-split-token token name (next-func))))
(define (punctuation-process-words utt)
(cond
((eq? punctuation-mode 'all)
(cond
! ((assoc (intern (Param.get 'Language)) punctuation-punc-language-handlers)
! (apply (cdr (assoc (intern (Param.get 'Language)) punctuation-punc-language-handlers)) (list utt)))
((member (intern (Param.get 'Language)) punctuation-punc-languages)
;; Standard English lexicon has no notion of punctuation pronounciation
(do-relation-items (w utt Word)
--- 90,108 ----
(define-wrapper (token_to_words token name) punctuation
(if (eq? punctuation-mode 'default)
((next-func) token name)
! (if (assoc (intern (Param.get 'Language))
! punctuation-punc-language-splitters)
! (apply (cdr (assoc (intern (Param.get 'Language))
! punctuation-punc-language-splitters))
! (list token name (next-func)))
! (punctuation-split-token token name (next-func)))))
(define (punctuation-process-words utt)
(cond
((eq? punctuation-mode 'all)
(cond
! ;; ((assoc (intern (Param.get 'Language)) punctuation-punc-language-handlers)
! ;; (apply (cdr (assoc (intern (Param.get 'Language)) punctuation-punc-language-handlers)) (list utt)))
((member (intern (Param.get 'Language)) punctuation-punc-languages)
;; Standard English lexicon has no notion of punctuation pronounciation
(do-relation-items (w utt Word)
***************
*** 144,150 ****
(Param.wrap Token_Method punctuation
(lambda (utt)
(apply* (next-value) (list utt))
! (punctuation-process-words utt)))
(Param.wrap Word_Method punctuation
;; This is here to avoid deletion of punctuation in standard functions
--- 150,161 ----
(Param.wrap Token_Method punctuation
(lambda (utt)
(apply* (next-value) (list utt))
! (if (assoc (intern (Param.get 'Language))
! punctuation-punc-language-handlers)
! (apply (cdr (assoc (intern (Param.get 'Language))
! punctuation-punc-language-handlers))
! (list utt))
! (punctuation-process-words utt))))
(Param.wrap Word_Method punctuation
;; This is here to avoid deletion of punctuation in standard functions
Diff finished. Tue Aug 11 16:59:30 2009
;; -- End Diff Tue Aug 11 17:00:27 2009
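To illustrate what the new punctuation-punc-language-splitters hook is
meant for, here is an untested sketch of the simplest possible French
splitter. Only the calling convention (token, name, and the next
token_to_words function) is taken from the patch above; the body and the
parameter name `ttw' are assumptions, and the real
franfest_token_punctuation_split would presumably do more than this:

```scheme
;; Hypothetical sketch, not the actual franfest code.
;; It is called the way the patched token_to_words wrapper calls a splitter:
;;   (splitter token name (next-func))
;; TTW (an illustrative parameter name) is the next token_to_words function.
(define (franfest_token_punctuation_split token name ttw)
  ;; Do no splitting at all: pass the token name on unchanged, so that
  ;; "'" is still present when the French token_to_words rules see it.
  (ttw token name))
```

With such a definition loaded and French selected, punctuation-split-token
would no longer be called for French tokens at all.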
Regards
Pierre
From: Milan Zamazal <address@hidden>
To: address@hidden
Subject: Re: punctuation.scm
Date: Mon, 03 Aug 2009 14:31:15 +0200
>>>>>> "PL" == Pierre Lorenzon <devel at pollock-nageoire.net> writes:
>
> PL> Indeed it might help, since in this case the duplication no
> PL> longer occurs! But if we do not modify the code, punctuation
> PL> will keep its English pronunciation even in
> PL> French.
>
> I see, thanks for testing.
>
> PL> We might say that the English protocol is "the right one" and
> PL> simply follow it.
>
> This is not always possible because different languages may perform text
> processing in ways very different from English.
>
> PL> My opinion is that, given the French language
> PL> implementation in festival, the best thing to do for French
> PL> with punctuation mode set to all is to do NOTHING! Hence I made
> PL> the following modification in punctuation.scm, calling this
> PL> franfest_token_punctuation_all method, which does nothing for
> PL> the moment. This method should be defined if French is selected
> PL> as the current language.
>
> PL> The advantage is that if you accept this modification to the
> PL> punctuation.scm code, further modifications, if necessary, can
> PL> be made in this method without modifying
> PL> punctuation.scm again.
>
> OK, I applied the idea, in a slightly different way by using another
> customization variable (completely untested). Please test and fix. :-)
>