bug#20140: 24.4; M17n shaper output rejected

From: K. Handa
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 25 Mar 2015 23:25:54 +0900

Hi, thank you for the detailed explanation.

In article <address@hidden>, Richard Wordingham <address@hidden> writes:

> What I ought to want is SIL's split cursor scheme, which indicated the
> next ('point') and previous characters, even in bidirectional text.
> Unfortunately, that's not compatible with m17n, which seems to assume
> that cursor position will be a single number.  The Emacs functions
> forward-char-intrusive and backward-char-intrusive provided a pleasant,
> more intuitive, alternative, and I am sad to hear they are gone.
> Perhaps I'll have to start using toggle-auto-composition.

Those Emacs functions were just my idea for improving Emacs
for CTL users, and have never been included in an official
Emacs version.  I checked the code and found two problems:

(1) When the command sets disable-point-adjustment to t,
command_loop_1 should force updating the display if point is
within a grapheme cluster.  So we need this patch:

diff --git a/src/keyboard.c b/src/keyboard.c
index bf65df1..13125c1 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -1636,6 +1636,16 @@ command_loop_1 (void)
            adjust_point_for_property (last_point_position,
                                       MODIFF != prev_modiff);
+      else if (current_buffer == prev_buffer
+              && last_point_position != PT)
+       {
+         if (PT > BEGV && PT < ZV
+             && (composition_adjust_point (last_point_position, PT) != PT))
+           /* Now point is within a grapheme cluster.  We must update
+              the display so that this cluster is decomposed on the
+              screen and the cursor is correctly placed at point.  */
+           windows_or_buffers_changed = 22;
+       }
       /* Install chars successfully executed in kbd macro.  */

(2) We should break a grapheme cluster at point.  So we need
this patch:

diff --git a/src/xdisp.c b/src/xdisp.c
index a17f5a9..0c56395 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -3408,6 +3408,9 @@ compute_stop_pos (struct it *it)
       pos = next_overlay_change (charpos);
       if (pos < it->stop_charpos)
        it->stop_charpos = pos;
+      /* If point is in front of the current stop pos, stop there.  */
+      if (charpos < PT && PT < it->stop_charpos)
+       it->stop_charpos = PT;
       /* Set up variables for computing the stop position from text
          property changes.  */
@@ -8166,7 +8169,12 @@ next_element_from_buffer (struct it *it)
          && IT_CHARPOS (*it) >= it->redisplay_end_trigger_charpos)
        run_redisplay_end_trigger_hook (it);
-      stop = it->bidi_it.scan_dir < 0 ? -1 : it->end_charpos;
+      /* Set stop position considering the bidi direction and point.  */
+      if (it->bidi_it.scan_dir < 0)
+       stop = (PT < IT_CHARPOS (*it)) ? PT : -1;
+      else
+       stop = ((IT_CHARPOS (*it) < PT && PT < it->end_charpos)
+               ? PT : it->end_charpos);
       if (CHAR_COMPOSED_P (it, IT_CHARPOS (*it), IT_BYTEPOS (*it),
          && next_element_from_composition (it))

Could you try these patches and test the usability of
forward-char-intrusive and backward-char-intrusive?
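
To make the intent of the next_element_from_buffer hunk easier to
check, here is a minimal standalone sketch of the same stop
computation.  It is only a model: stop_with_point and its plain
integer parameters are hypothetical names standing in for
IT_CHARPOS (*it), PT, it->end_charpos and it->bidi_it.scan_dir,
and it is not code from xdisp.c.

```c
#include <assert.h>

/* Hypothetical model of the patched stop computation.
   charpos     - current iterator position (IT_CHARPOS (*it))
   pt          - point (PT)
   end_charpos - end of the text being iterated (it->end_charpos)
   scan_dir    - negative when the bidi iterator scans backward
   Returns the position at which composition lookup must stop,
   or -1 for "no limit" when scanning backward.  */
long
stop_with_point (long charpos, long pt, long end_charpos, int scan_dir)
{
  if (scan_dir < 0)
    /* Scanning backward: point is a stop only if it lies behind us.  */
    return pt < charpos ? pt : -1;

  /* Scanning forward: point is a stop only if it lies strictly
     between the current position and the end.  */
  return (charpos < pt && pt < end_charpos) ? pt : end_charpos;
}

/* Small self-check covering the forward and backward cases.  */
void
stop_with_point_demo (void)
{
  assert (stop_with_point (5, 8, 20, 1) == 8);    /* point ahead */
  assert (stop_with_point (5, 30, 20, 1) == 20);  /* point past end */
  assert (stop_with_point (10, 4, 20, -1) == 4);  /* backward, point behind */
  assert (stop_with_point (10, 12, 20, -1) == -1);
}
```

When point coincides with the current position, the forward case
falls back to it->end_charpos, so a cluster that starts at point
is still composed; only a cluster that contains point is broken.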

> > Please try to move the cursor over this Devanagari text "हिंदी" in
> > Emacs, gedit, and, for instance, Firefox.  They all treat
> > that text as 2 grapheme clusters, "हिं" and "दी".  The first
> > one corresponds to the character sequence U+0939 U+093F U+0902, and
> > U+093F (vowel sign I) is displayed before U+0939 (base consonant).

> Note that those clusters are only 3 and 2 characters long.  Retyping
> them is tolerable.  Now consider the Sanskrit Devanagari text स्त्री,
> which contains two consonant-combining viramas.  Emacs moves across it
> in 1 step, but Claws e-mail (GTK-based, I believe) and LibreOffice
> (HarfBuzz-based, at least for Linux) both take 3 steps to move across
> it.  Claws and LibreOffice use different algorithms to position the
> cursor.  That of LibreOffice seems more reasonable, but that of
> Claws works better!  The reason is that Unicode did not declare virama
> as forming grapheme clusters.

Ah, hmmm, that is a problem with DEVA-OTF.flt and DEV2-OTF.flt in
the m17n library.  I'll try to fix them.
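
The difference between 1 step and 3 steps can be illustrated with
a deliberately simplified cluster counter.  This is only a sketch
under stated assumptions: it is neither the UAX #29 algorithm nor
what the FLT files do, the mark ranges in is_devanagari_mark are a
rough approximation, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEVANAGARI_VIRAMA 0x094D

/* Rough approximation of Devanagari combining marks (signs,
   vowel signs, virama, accents).  Assumption for illustration
   only; real code should consult the Unicode character database.  */
static int
is_devanagari_mark (uint32_t cp)
{
  return (cp >= 0x0900 && cp <= 0x0903)
         || (cp >= 0x093A && cp <= 0x093C)
         || (cp >= 0x093E && cp <= 0x094F)
         || (cp >= 0x0951 && cp <= 0x0957)
         || (cp >= 0x0962 && cp <= 0x0963);
}

/* Count the cursor stops needed to cross TEXT (LEN code points).
   A mark always attaches to the preceding character.  When
   JOIN_AT_VIRAMA is nonzero, a letter that follows a virama is
   also kept in the current cluster, so a whole conjunct becomes
   one stop.  */
size_t
count_cursor_stops (const uint32_t *text, size_t len, int join_at_virama)
{
  size_t stops = 0;

  for (size_t i = 0; i < len; i++)
    {
      if (i > 0 && is_devanagari_mark (text[i]))
        continue;               /* mark: stays in the current cluster */
      if (i > 0 && join_at_virama && text[i - 1] == DEVANAGARI_VIRAMA)
        continue;               /* conjunct continues after virama */
      stops++;
    }
  return stops;
}
```

With the sequence U+0938 U+094D U+0924 U+094D U+0930 U+0940
(स्त्री), this counts 3 stops without joining at virama and 1 stop
with joining; for हिंदी it counts 2 stops either way.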

> It seems to have solved all of them.  When I reported the bug, I was
> having problems with my font because libotf was silently ignoring half
> the lookups in my font.

Could you please send me (not on this list) an appropriate
bug/problem report if libotf should be fixed?

> I thought I might have problems with U+1A58 TAI THAM SIGN MAI KANG LAI,
> which in Lao visually groups (usually) with the following base
> consonant and in Tai Khuen groups with the preceding base consonant. My
> clustering in Emacs follows the Tai Khuen scheme.  (I compose two
> orthographic clusters together in Emacs, but declare two grapheme
> clusters in the FLT processing.)  However, my font follows a major
> Northern Thai dictionary and places it on the following base consonant
> if there is nothing above it, but otherwise places it on the preceding
> base consonant.  However, my implementation is too dirty to cause
> problems - the second cluster is not reported as deriving from the
> mai kang lai character.

> I wonder, though, what will happen if I manage to implement the
> Universal Shaping Engine's (USE) rphf feature. The author of a Lao-style
> Tai Tham font wanted this feature in HarfBuzz.  The desired effect seems
> easy to achieve in m17n-flt, but placing it under font control is more
> difficult.  I'm studying MLM2-OTF.flt to see how to do it.

I've just started to study the Universal Shaping Engine.  It
seems that we can implement it by a proper FLT file.

> > > However, it then makes editing of the 'clusters' more
> > > difficult.  Note that there are examples above with 5
> > > characters in a cluster, and this is by no means the
> > > limit.
> > 
> > But, it seems that the current behavior is accepted, at
> > least, by Indic people.

> Who do you mean by 'Indic people'?

I just mean that I have not heard any complaints about that
"too long cluster problem" in Emacs.  Is no one using Emacs
for Indic scripts?

> New Tai Lue is an interesting case.  Microsoft delayed support for this
> simple Indic script for so long that most apparently Unicode-encoded
> New Tai Lue text was actually encoded in visual order.  With Unicode
> 8.0, New Tai Lue is changing from phonetic order to visual order, and
> it will no longer need any clusters at all!  

Wow, I didn't know that.

> Emacs 23.3 (which is what is in long-term support Ubuntu
> 12.04) offers no support for New Tai Lue, so I am not sure
> that there is yet a New Tai Lue view on composition in
> Emacs.

We may be able to provide support for new scripts in ELPA.

K. Handa
