[emacs-bidi] Arabic implementation


From: TAKAHASHI Naoto
Subject: [emacs-bidi] Arabic implementation
Date: Fri, 9 Nov 2001 16:06:45 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI)

Eli Zaretskii writes:

>> > No, I meant the code that supports Arabic presentation forms.
>> 
>> Shall we discuss it here, in address@hidden

> That's the closest forum I could imagine to support of Arabic in Emacs.

OK, here it goes.  Our strategy is based on the following two features
of Emacs 21.

1. Font-lock checks the content of the buffer and if it finds a
   certain pattern, it changes the appearance of the string on the
   fly.  This is very close to what we need for Arabic.

2. We can display a composed character with an arbitrary character.
   For example, (compose-region (point) (+ (point) 2) ?*) composes the
   following two characters and displays an asterisk instead of
   overstriking the original two characters.  Note that if we do
   (compose-region (point) (1+ (point)) ?*), only the immediately
   following character is displayed as an asterisk.
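
The second feature can be tried outside any Arabic context.  The
following is a minimal sketch, not part of our code; the sample text
"abcd" and the fixed positions are chosen only for illustration.  It
relies on compose-region recording the composition as a `composition'
text property while leaving the buffer text itself untouched.

```elisp
(with-temp-buffer
  (insert "abcd")
  ;; Compose the two characters "ab" (positions 1 and 2) and ask the
  ;; display engine to show a single asterisk in their place.
  (compose-region 1 3 ?*)
  (list
   ;; The underlying text is not modified by the composition.
   (buffer-substring-no-properties (point-min) (point-max))
   ;; Non-nil: position 1 now carries a `composition' property.
   (get-text-property 1 'composition)
   ;; nil: "c" at position 3 lies outside the composed region.
   (get-text-property 3 'composition)))
```

Because only a text property is added, saving or copying the text
still yields the original characters.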

Now the procedure.

First, we advised font-lock-fontify-region so that it looks for a
character that has the new special category `?' (composition).  All
Arabic characters that require glyph selection have this category.

If such a character is found, the function predefined for that
character is called to compose that character.  In our case, a
function named arabic-compose-region is invoked.  This function
roughly does the following.

  case 1: preceding-char is Arabic && following-char is Arabic
  -> Compose the found character to use its word-medial form
     for display

  case 2: preceding-char isn't Arabic && following-char is Arabic
  -> ... word initial form ...

  case 3: preceding-char is Arabic && following-char isn't Arabic
  -> ... word final form ...

  case 4: preceding-char isn't Arabic && following-char isn't Arabic
  -> ... isolated form ...

(Of course you have to do more in real life.  Some Arabic characters
are never connected to the following character, even in the middle of
a word; the sequence laam-alef needs to be displayed with a special
ligature; you have to handle diacritical marks, etc.)
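
The four cases above can be sketched as follows.  This is only an
illustration, not the actual arabic-compose-region: the names
my-arabic-char-p and my-arabic-form-at are hypothetical, the test for
"is Arabic" is a crude range check on U+0600..U+06FF that assumes an
Emacs whose characters are Unicode code points (Emacs 23 or later),
and none of the complications mentioned in the parenthesis above are
handled.

```elisp
(defun my-arabic-char-p (ch)
  "Return non-nil if CH is a character in the range U+0600..U+06FF.
Hypothetical helper: a crude stand-in for a real \"is Arabic\" test.
CH may be nil, as returned by `char-before' at the buffer edge."
  (and ch (>= ch #x0600) (<= ch #x06FF)))

(defun my-arabic-form-at (pos)
  "Return the form needed by the character at POS in the current buffer.
The value is one of the symbols `medial', `initial', `final' or
`isolated', chosen only from the two neighbouring characters."
  (let ((prev (my-arabic-char-p (char-before pos)))
        (next (my-arabic-char-p (char-after (1+ pos)))))
    (cond ((and prev next) 'medial)       ; case 1
          (next            'initial)      ; case 2
          (prev            'final)        ; case 3
          (t               'isolated))))  ; case 4

;; Usage: three BEH letters in a row.
(with-temp-buffer
  (insert "\u0628\u0628\u0628")
  (mapcar #'my-arabic-form-at '(1 2 3)))
;; => (initial medial final)
```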

All necessary Arabic glyph variants (called presentation forms) fall
into the range of mule-unicode-e000-ffff, so we created the necessary
fonts.  Thus the buffer contains Arabic characters that belong to the
mule-unicode-0100-24ff charset, but they are displayed with glyphs
from the mule-unicode-e000-ffff charset.
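
To make the split between buffer contents and displayed glyphs
concrete, here is a minimal sketch under stated assumptions.  The
table my-arabic-forms and the function my-arabic-display-as are
hypothetical names, the table covers only two letters (using the
Unicode presentation forms at U+FE8F..U+FE92 and U+FEB1..U+FEB4), and
the code assumes an Emacs whose characters are Unicode code points
rather than the Emacs 21 charsets named above.

```elisp
(defconst my-arabic-forms
  ;; base character . [isolated final initial medial]
  '((#x0628 . [#xFE8F #xFE90 #xFE91 #xFE92])   ; BEH
    (#x0633 . [#xFEB1 #xFEB2 #xFEB3 #xFEB4]))  ; SEEN
  "Hypothetical table from base letters to their presentation forms.")

(defun my-arabic-display-as (pos form)
  "Display the character at POS using its presentation FORM.
FORM is one of `isolated', `final', `initial' or `medial'.
Return the glyph character used, or nil if there is no table entry."
  (let* ((forms (cdr (assq (char-after pos) my-arabic-forms)))
         (index (cdr (assq form '((isolated . 0) (final . 1)
                                  (initial . 2) (medial . 3)))))
         (glyph (and forms index (aref forms index))))
    (when glyph
      ;; The buffer keeps the base character; only the display changes.
      (compose-region pos (1+ pos) glyph))
    glyph))

;; Usage: show a lone BEH with its word-initial glyph.
(with-temp-buffer
  (insert "\u0628")
  (list (my-arabic-display-as 1 'initial)   ; the glyph U+FE91
        (char-after 1)))                    ; still the base U+0628
```

A lookup table keeps the glyph choice separate from the decision about
which form is needed, so either part can be replaced independently.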

And we wrote a quail package for Arabic.

You can see a screendump at http://www.m17n.org/ntakahas/arabic.png

-- 
TAKAHASHI Naoto
address@hidden
http://www.m17n.org/ntakahas/


