bug-gnu-emacs

bug#20140: 24.4; M17n shaper output rejected


From: Richard Wordingham
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 18 Mar 2015 22:20:40 +0000

I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin
installation, for which the version of libm17n-0 is 1.6.3-1.  I am
attempting to induce Emacs to render the Tai Tham script.  There
appears to be a bug/feature in Emacs which makes this unnecessarily
difficult.

To achieve Tai Tham rendering, I added the following to a new file,
tai-tham.el, which I load:

(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A5E\u1A61-\u1A7C\u1A7F]") ; Mark
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("H" . "\u1A60")          ; sakot
           ("N" . "\u1A58")))        ; mai kang lai - also included in M.
        ;; Which orthographic syllable mai kang lai belongs to can depend on
        ;; the font!
        (regexp "C\\(M\\|HS*C?\\)*\\(NC\\(M\\|HS*C?\\)*\\)*N?"))
    (let ((case-fold-search nil))
      (dolist (elt table)
        (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
                                               regexp t t))))
    regexp))

(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring))))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

I added the following (cut-down) file LANA-OFT.flt to the m17n database:

(font layouter lana-otf nil
      (font (nil nil unicode-bmp :otf=lana)))
(category
 ;; H: SAKOT
 ;; N: Other character with non-zero canonical combining class
 ;; Z: Character with ccc=0 or other with ccc=9 
 (0x0000 0x1A5F ?Z)
 (0x1A60        ?H)
 (0x1A61 0x1A74 ?Z)
 (0x1A75 0x1A7C ?N)
 (0x1A7D 0xFFFF ?Z)
)

(generator
  (0
    (cond
      ("(H)(N+)" (2 = *) (1 =))
      ("." =)
    ) *
  )
)

(category
 ;; C: Consonant and non-mark (lenient processing)
 ;; H: SAKOT
 ;; P: Preposed vowel
 ;; R: Medial RA (preposed dependent consonant)
 ;; M: Mark
 (0x1A20 0x1A54 ?C)
 (0x1A55 0x1A55 ?R)
 (0x1A56 0x1A5E ?M)
 (0x1A5F        ?C) ; Unassigned
 (0x1A60        ?H)
 (0x1A61 0x1A6D ?M)
 (0x1A6E 0x1A72 ?P)
 (0x1A73 0x1A7C ?M)
 (0x1A7D 0x1A7E ?C) ; Unassigned
 (0x1A7F        ?M)
 (0x1A80 0x1A89 ?C)
 (0x1A8A 0x1A8F ?C) ; Unassigned
 (0x1A90 0x1A99 ?C)
 (0x1A9A 0x1A9F ?C) ; Unassigned
 (0x1AA0 0x1AAC ?C) ; Punctuation
 (0x1AAD        ?C) ; Can take a vowel!
 (0x1AAE 0x1AAF ?C) ; Unassigned
)

(generator
  (0
    (cond
      ("(C)(R|P)" (2 =) (1 =) )
      ("." =)
    )*
  )
)

(generator (0 otf:lana))

However, much Tai Tham text failed to render properly.  To determine
what was wrong, I added some monitoring code to ftfont.c:

*** ftfont.c.orig       2014-03-21 05:34:40.000000000 +0000
--- ftfont.c    2015-03-18 19:47:30.032718995 +0000
***************
*** 2516,2522 ****
--- 2516,2553 ----
      flt = mflt_get (msymbol ("combining"));
    for (i = 0; i < 3; i++)
      {
+       int k;
+       fprintf(stdout, "mflt_run(");
+       if (gstring.glyphs[0].encoded) {
+       for (k = 0; k < len; k++) {
+         fprintf(stdout, " %d", gstring.glyphs[k].code);
+       }
+       } else {
+       for (k = 0; k < len; k++) {
+         fprintf(stdout, " %4.4X", gstring.glyphs[k].c);
+       }
+       }
        int result = mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt);
+       if (-1 == result) {
+       fprintf(stdout, ") failed.\n");
+       } else if (result >= 0) {
+       fprintf(stdout, ") produced (");
+       for (k = 0; k < result; k++) {
+ #if 0
+         fprintf(stdout, " %d", gstring.glyphs[k].code);
+ #else
+         fprintf(stdout, " %4.4X>%d:%d:%d",
+                 gstring.glyphs[k].c, gstring.glyphs[k].code,
+                 gstring.glyphs[k].from, gstring.glyphs[k].to);
+ #endif
+       }
+       fprintf(stdout, ")\n");
+       if (result != gstring.used) {
+         fprintf(stdout, "Anomalously, gstring.used = %d\n",
+                 (int) gstring.used);
+       }
+       fflush(0);
+       }
        if (result != -2)
        break;
        if (INT_MAX / 2 < gstring.allocated)

The sample Tai Tham text was:
;; ᩈᩣᩴᩁᩢ᩠ᨷᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ / ᨣᩣᩴᨾᩮᩬᩥᨦ - ᩈᩢᨬ᩠ᨬᩣ ᨠ᩠᩵ᨷ ᩃ᩠᩶ᨯ ᨮ᩠ ᨳᩫ᩠᩵ᨶ
ᨠᩢ᩠᩵ᨷᨠᩫ᩠᩶ᨯᨿᩥ᩠ᨷᨶᩦ᩠᩵ᨷ
;; ᨣᩕ   ᨲᩱ

Below, I extract and analyse what was rendered as shaped ('accepted') and
what was not ('rejected'), quoting the monitoring output, in which each
output glyph appears as character>glyph code:from:to.  I suspect the
problem is the strict checking of the from and to fields in the Lisp
function font-shape-gstring, which is defined in font.c.

The shaping of the following was accepted:
mflt_run( 1A48 1A63 1A74) produced ( 1A48>820:0:0 1A63>858:1:1 1A74>878:2:2)

mflt_run( 1A41 1A62 1A60 1A37) produced ( 1A41>813:0:1 1A62>853:0:1 0000>953:2:3)

mflt_run( 1A3D 1A63) produced ( 1A3D>808:0:0 1A63>858:1:1)

mflt_run( 1A48 1A63) produced ( 1A48>820:0:0 1A63>858:1:1)

mflt_run( 1A43 1A76 1A63 1A60 1A36) produced ( 1A43>815:0:1 1A76>890:0:1 1A63>858:2:4 0000>952:2:4)

mflt_run( 1A36 1A63) produced ( 1A36>800:0:0 1A63>858:1:1)

mflt_run( 1A23 1A63 1A74) produced ( 1A23>777:0:0 0000>859:1:2)

mflt_run( 1A26) produced ( 1A26>780:0:0)

mflt_run( 1A48 1A62) produced ( 1A48>820:0:1 1A62>853:0:1)

mflt_run( 1A2C 1A60 1A2C 1A63) produced ( 0000>789:0:2 1A63>858:3:3)

mflt_run( 1A43 1A60 1A76 1A2F) produced ( 1A43>815:0:3 1A76>890:0:3 0000>941:0:3)

mflt_run( 1A2E 1A60) produced ( 1A2E>792:0:1 1A60>851:0:1)

mflt_run( 1A33 1A6B 1A60 1A75 1A36) produced ( 1A33>797:0:4 1A6B>868:0:4 1A75>889:0:4 0000>952:0:4)

mflt_run( 1A20 1A6B 1A76 1A60 1A2F) produced ( 1A20>774:0:4 1A6B>868:0:4 1A76>890:0:4 0000>941:0:4)

mflt_run( 1A3F 1A65 1A60 1A37) produced ( 1A3F>811:0:1 1A65>862:0:1 0000>953:2:3)

The shaping of the following, with vowels or MEDIAL RA that should be
rendered before the consonant, was rejected:

mflt_run( 1A3E 1A6E 1A6C 1A65) produced ( 1A6E>872:1:1 1A3E>810:0:3 1A6C>869:0:3 1A65>862:0:3)

mflt_run( 1A23 1A55) produced ( 1A55>835:1:1 1A23>777:0:0)

mflt_run( 1A32 1A71) produced ( 1A71>875:1:1 1A32>796:0:0)

In each case, the problem is that the first glyph does not derive from
the first character: its from field is 1, not 0.

The shaping of the following was rejected:

mflt_run( 1A20 1A60 1A75 1A37) produced ( 1A20>774:0:2 1A75>889:0:2 0000>953:1:3)

In this case, character 2 is stacked below character 0, and characters
1 and 3 combine to form a spacing glyph, so the spans 0:2 and 1:3 overlap.

mflt_run( 1A20 1A62 1A60 1A75 1A37) produced ( 1A20>774:0:1 1A62>853:0:3 1A75>889:0:3 0000>953:2:4)

Character 1 is mounted on character 0, and character 3 on character 1.
Characters 2 and 4 combine to form a spacing glyph.  

mflt_run( 1A36 1A66 1A75 1A60 1A37) produced ( 1A36>800:0:1 1A66>863:0:2 1A75>889:0:2 0000>953:3:4)

Character 1 is mounted on character 0, and character 2 on character 1.
Characters 3 and 4 form a spacing glyph.

There does appear to be a workaround, which is to have m17n declare
the orthographic syllables it receives to be 'grapheme clusters'.  It
solves at least some of the problems above.  However, it then makes
editing of the 'clusters' more difficult.  Note that there are examples
above with 5 characters in a cluster, and this is by no means the limit.

Richard.




