

Re: [emacs-bidi] bidi categories

From: Alex Schroeder
Subject: Re: [emacs-bidi] bidi categories
Date: Thu, 08 Nov 2001 18:46:16 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)

"Eli Zaretskii" <address@hidden> writes:

> Sorry, I don't see the problem.  If we forget that winvert-list
> exists, do you see any special problems with handling a newline?

No, you are right.

> This is not the UAX#9 classification.  I think you should try to work
> with what UAX#9 defines, and only add more classes if needed.

Hm, what hebeng.el does is get the type of every char in a string
(i.e. some are L, some are R, and some are various forms of
"undecided").  hebeng.el then does some context-sensitive
manipulation in order to get it right.

> The data structure to hold this should probably be a char-table of
> some kind, since a string that Ehud used is not an efficient storage
> for large sparse arrays.  It is good for a small contiguous set of
> characters, such as Hebrew, but doesn't scale up well if you add
> Arabic and other bidi scripts.
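For comparison, here is a minimal sketch of the char-table approach suggested above, combined with the hebeng.el-style step of computing the type of every char in a string.  It assumes a Unicode-based Emacs (23 or later), where `set-char-table-range' accepts a (FROM . TO) range.  The names `my-bidi-type-table' and `my-bidi-string-types' are hypothetical, and only ASCII letters, ASCII digits and the Hebrew letters U+05D0..U+05EA are classified; everything else falls back to the neutral type ?-.

```elisp
;; Hypothetical sketch: bidi types stored in a char-table, not a string.
;; Assumes Emacs 23+ (ranges given as (FROM . TO) conses).
(defvar my-bidi-type-table
  ;; Every character starts out as the neutral type ?-.
  (let ((table (make-char-table 'my-bidi-types ?-)))
    ;; One call per contiguous range, so sparse scripts stay cheap.
    (set-char-table-range table '(?a . ?z) ?L)
    (set-char-table-range table '(?A . ?Z) ?L)
    (set-char-table-range table '(?0 . ?9) ?D)
    ;; Hebrew letters alef..tav (U+05D0..U+05EA).
    (set-char-table-range table '(#x5D0 . #x5EA) ?R)
    table)
  "Char-table mapping characters to bidi type characters.")

(defun my-bidi-string-types (string)
  "Return a string holding the bidi type of each character of STRING."
  ;; `mapcar' over a string yields a list of chars; `concat' turns the
  ;; list of type characters back into a string.
  (concat (mapcar (lambda (char) (aref my-bidi-type-table char))
                  string)))
```

For example, (my-bidi-string-types "ab 12") gives "LL-DD".  Adding Arabic would only mean a few more `set-char-table-range' calls, not a longer string.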

I've written code based on categories.  I think this will be
adequate.  See below.


(defun get-bidi-type (char)
  "Return the bidi type of CHAR as a character.
The type is one of ?L, ?R, ?D, ?N, ?I, ?A, ?B or ?-; characters
without a bidi category default to ?L.  Some types must still be
resolved using context.  See `bidi-setup-categories'."
  (let ((category-set (char-category-set char)))
    (cond ((aref category-set bidi-category-l) ?L)
          ((aref category-set bidi-category-r) ?R)
          ((aref category-set bidi-category-d) ?D)
          ((aref category-set bidi-category-n) ?N)
          ((aref category-set bidi-category-i) ?I)
          ((aref category-set bidi-category-a) ?A)
          ((aref category-set bidi-category-b) ?B)
          ((aref category-set bidi-category--) ?-)
          ;; FIXME: Instead of this default we should fix
          ;; bidi-setup-categories
          (t ?L))))
;; (format "%c" (get-bidi-type ?a))
;; (format "%c" (get-bidi-type ?ג))
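The category lookup used by `get-bidi-type' can be tried in isolation with the following self-contained sketch.  It assumes Emacs 23 or later (so that `modify-category-entry' accepts a (FROM . TO) range) and works on a private copy of the standard category table, so the standard table is left untouched.  Only an R category for the Hebrew letters is allocated, and all `my-' names are hypothetical.

```elisp
;; Hypothetical sketch of the category technique on a private table.
(defvar my-bidi-category-table (copy-category-table (standard-category-table))
  "Private copy of the standard category table used by this sketch.")

(defvar my-bidi-category-r
  (let ((cat (get-unused-category my-bidi-category-table)))
    (unless cat
      (error "No more unused categories available"))
    (define-category cat "Bidi R: RTL character (sketch)."
      my-bidi-category-table)
    ;; Give the Hebrew letters U+05D0..U+05EA the new category.
    (modify-category-entry '(#x5D0 . #x5EA) cat my-bidi-category-table)
    cat)
  "Category character allocated for right-to-left characters.")

(defun my-bidi-rtl-p (char)
  "Return non-nil if CHAR has the category `my-bidi-category-r'."
  ;; `char-category-set' consults the current buffer's category table,
  ;; so install the private table in a temporary buffer first.
  (with-temp-buffer
    (set-category-table my-bidi-category-table)
    (aref (char-category-set char) my-bidi-category-r)))
```

Here (my-bidi-rtl-p ?ג) is non-nil and (my-bidi-rtl-p ?a) is nil.  Using a copied table is just a convenience for experimenting; the code in this message modifies the standard table directly.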

;;; L/R categories.

;; This is modelled after characters.el.  At the moment, however, we
;; don't have categories assigned, so we must create them ourselves.
;; The new categories are identified by a character, like all other
;; categories.  We store them in the following variables.

;; The existing categories and syntax tables are not enough to resolve
;; bidi issues: some of the new categories specify that the "real"
;; category must be determined from context.

(defvar bidi-category-l nil
  "Bidi L: LTR character (Latin, etc.).")
(defvar bidi-category-r nil
  "Bidi R: RTL character (Hebrew, Arabic, etc.).")
(defvar bidi-category-d nil
  "Bidi D: Digit.")
(defvar bidi-category-n nil
  "Bidi N: Digit if near a digit.")
(defvar bidi-category-i nil
  "Bidi I: Digit if between digits (inter-digit).")
(defvar bidi-category-a nil
  "Bidi A: L or R if near one of them, else resolve as I.")
(defvar bidi-category-b nil
  "Bidi B: L or R if near one of them, else resolve as N.")
(defvar bidi-category-- nil
  "Bidi -: Neutral.")
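(defvar bidi-categories
  '(bidi-category-l bidi-category-r bidi-category-d bidi-category-n
    bidi-category-i bidi-category-a bidi-category-b bidi-category--)
  "List of the variables holding the categories used by bidi algorithms.")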

;; This categorization is originally from hebeng.el by Ehud Karni
;; <address@hidden>.  The original hebeng.el table
;; contained fall-back categories for old MS-DOS Hebrew charsets.

(defun bidi-setup-categories ()
  "Create new categories for bidi.

The categories are stored in the variables bidi-category-X, where X is
one of the following:
L   LTR character (Latin, etc.)
R   RTL character (Hebrew, Arabic, etc.)
D   Digit

Types to be resolved using context:

N   Digit if near a digit
I   Digit if between digits (inter-digit)
A   L or R if near one of them, else resolve as I
    If between R & L characters, use the same type as the left side
B   L or R if near one of them, else resolve as N
-   Neutral"
  (let ((table (standard-category-table)))
    ;; Allocate one new category for each variable in `bidi-categories'.
    (mapc (lambda (var)
            (let ((cat (get-unused-category table))
                  (doc (get var 'variable-documentation)))
              (when (symbol-value var)
                (error "%S is already set" var))
              (unless cat
                (error "No more unused categories available"))
              (set var cat)
              (define-category cat doc table)))
          bidi-categories)
    ;; ASCII
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-l table t))
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-d table))
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-n table))
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-i table))
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-a table))
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-b table))
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-- table))
    ;; Arabic character sets: All characters in these character sets are
    ;; written right-to-left.  This is probably not true for
    ;; arabic-iso8859-6.
    (let ((charsets '(arabic-iso8859-6
      (while charsets
        (modify-category-entry (make-char (car charsets))
                               bidi-category-r table)
        (setq charsets (cdr charsets))))
    ;; Hebrew character set (ISO-8859-8).  Only some characters in this
    ;; character set are written right-to-left.
    (mapcar (lambda (char)
              (modify-category-entry char bidi-category-r table))
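The context rules listed in the docstring of `bidi-setup-categories' could be applied along the following lines.  This is a hypothetical sketch, not hebeng.el's actual algorithm: it works on a string of type characters such as `get-bidi-type' returns, looks only at the immediate neighbours in the unresolved input, reads "left side" as the preceding character in logical order, and assumes that N and I fall back to the neutral type ?- when their digit condition fails, which the docstring does not say.

```elisp
;; Hypothetical sketch: resolve the context-dependent types A, B, N, I.
(defun my-bidi-resolve-types (types)
  "Return a copy of the type string TYPES with A, B, N and I resolved."
  (let* ((len (length types))
         (result (copy-sequence types)))
    (dotimes (i len)
      (let* ((prev (if (> i 0) (aref types (1- i)) ?-))
             (next (if (< (1+ i) len) (aref types (1+ i)) ?-))
             (type (aref types i)))
        ;; A and B take the type of a neighbouring L or R; the
        ;; preceding (left) neighbour wins if both sides qualify.
        ;; Otherwise A falls back to I and B falls back to N.
        (when (memq type '(?A ?B))
          (setq type (cond ((memq prev '(?L ?R)) prev)
                           ((memq next '(?L ?R)) next)
                           ((eq type ?A) ?I)
                           (t ?N))))
        (aset result i
              (cond
               ;; N: digit if next to a digit (assumed neutral otherwise).
               ((eq type ?N) (if (or (eq prev ?D) (eq next ?D)) ?D ?-))
               ;; I: digit if between two digits (assumed neutral otherwise).
               ((eq type ?I) (if (and (eq prev ?D) (eq next ?D)) ?D ?-))
               (t type)))))
    result))
```

For example, "DND" resolves to "DDD" and "RAL" to "RRL".  Because neighbours are taken from the unresolved input, a run such as "DNN" only resolves its first N; a real implementation would probably iterate or scan whole runs.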

