On Fri, Aug 27, 2010 at 12:56 PM, Eli Zaretskii
<address@hidden> wrote:
> > From: Kenichi Handa <address@hidden>
> > Date: Thu, 26 Aug 2010 10:10:05 +0900
> >
> > I've just committed changes to trunk for Arabic shaping. If
> > there're any Arabic users in this list, please check the
> > displaying of Arabic text. On GNU/Linux system, you must
> > compile Emacs with libotf and m17n-lib (configure script
> > should detect them automatically).
>
> Thanks. However, today's build behaves very strangely in a GUI
> session on MS-Windows. For starters, cursor motion seems to jump
> across many characters in the "Arabic" line of etc/HELLO. For
> example, typing C-f in that line, I first move one character at a time
> across "Arabic", as expected, then the cursor jumps to the right paren
> of the leftmost parenthesized part, again as expected, and then I see
> the following strange behavior:
>
>  . C-f moves one character to the left, to buffer position 758, as
>    expected.
>  . the next C-f jumps across many characters on the screen and lands
>    on position 764.
>  . another C-f jumps to what is reported as position 765, but on the
>    screen those are several characters, maybe 5 or 6.
>  . another C-f moves to the left paren at position 766, as expected.
>  . yet another C-f moves to position 767, but on the screen the
>    cursor jumps back into one of the characters it jumped across when
>    it landed on position 765 two C-f keypresses earlier.
>  . if I type C-b 4 times from this point, I enter a "trap", whereby
>    typing C-b jumps between two characters, whose buffer positions
>    are 764 and 765. The only way to get out of the trap is with C-a
>    or C-e or C-f.
>
> I don't read Arabic, so I cannot really say whether any of this is
> expected behavior. (The "trap" with C-b is certainly not the expected
> behavior.) Do you see anything similar on X?

1) I confirm that Arabic shaping seems to work fine on my build (27/8/10 rev. 101200, on Linux+X (Debian unstable)).
2) Logical movement with C-f/C-b in the hello file seems fine (I do not see the trap described above).
3) My Arabic is very basic, and I am not familiar with Arabic computing (keyboards etc.). I noticed the following points, but I am not sure what the expected behavior is (I can only compare with other programs, gedit in this case):
a) Column numbers (column-number-mode) behave strangely (I suspect that m17n-lib's invisible markup consumes columns). For example, as you move with C-f through the word "هذا", the column numbers go "0,1,4,5" (i.e. the second character takes up 3 columns). If I change the word to "بهذا", the column numbers are "0,1,4,6,7" (the second and third chars take up 3 and 2 columns resp.?).
In gedit, each character occupies exactly one column, regardless of shaping.
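
To make the column observation concrete, here is a small Python sketch (not Emacs code) that turns the column numbers reported above into per-step widths and compares them with the one-column-per-character behavior seen in gedit. The function names are made up for illustration, and the code points assume the word is spelled Heh, Thal, Alef, with Beh prepended in the second case.

```python
def step_widths(columns):
    """Columns consumed by each C-f step, given the reported column numbers."""
    return [later - earlier for earlier, later in zip(columns, columns[1:])]


def one_column_per_char(word):
    """Column numbers expected if every character occupies exactly one column."""
    return list(range(len(word) + 1))


# Assumed spelling of the two test words (hypothetical reconstruction).
word_short = "\u0647\u0630\u0627"   # Heh, Thal, Alef
word_long = "\u0628" + word_short    # Beh + the word above

# Column numbers as reported from column-number-mode.
observed_short = [0, 1, 4, 5]
observed_long = [0, 1, 4, 6, 7]

for word, observed in ((word_short, observed_short), (word_long, observed_long)):
    print("observed widths:", step_widths(observed))
    print("one column each:", one_column_per_char(word))
```

The widths come out as 1, 3, 1 for the first word and 1, 3, 2, 1 for the second, which is just another way of stating that the second (and third) characters appear to take more than one column.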
b) The Arabic keyboard layout has the ligature "Lam-Alef" (U+FEFB) on the key marked "B" on QWERTY keyboards. When I type this in Emacs, I get Lam and Alef (which are auto-shaped correctly as the proper ligature). C-d with the cursor on the ligature erases the Alef, and another C-d erases the Lam. This seems like proper behavior to me. However, in gedit, the "B" key produces a single U+FEFB character, which is always displayed as a ligature, is deleted with a single Del press, and is never connected to the previous character. Cutting and pasting this character into Emacs gives similar behavior there.
The question is: do Arabic users expect to be able to produce this "stiff" ligature? Is the behavior of gedit a bug? Should the Emacs "Lam-Alef" key behave as it does (i.e. produce two characters)?
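
For reference, the relation between the two encodings can be checked with Python's standard unicodedata module. This is only a sketch of how Unicode itself relates the precomposed presentation form to the two-letter sequence; it says nothing about what Emacs or gedit do internally, and it does not answer the question of what Arabic users expect.

```python
import unicodedata

LAM_ALEF_LIGATURE = "\uFEFB"  # the single "stiff" character produced in gedit

# Official character name of the precomposed form.
print(unicodedata.name(LAM_ALEF_LIGATURE))

# Its compatibility decomposition, as recorded in the Unicode database.
print(unicodedata.decomposition(LAM_ALEF_LIGATURE))

# Compatibility normalization (NFKC) maps the ligature to the plain letters.
letters = unicodedata.normalize("NFKC", LAM_ALEF_LIGATURE)
print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in letters])
```

Normalizing yields the two ordinary letters Lam (U+0644) and Alef (U+0627), i.e. the same sequence that the Emacs key inserts and then shapes at display time.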
thanks,
Amit Aronovitch