Re: [emacs-bidi] bidi categories
From: |
Alex Schroeder |
Subject: |
Re: [emacs-bidi] bidi categories |
Date: |
Thu, 08 Nov 2001 18:46:16 +0100 |
User-agent: |
Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu) |
"Eli Zaretskii" <address@hidden> writes:
> Sorry, I don't see the problem. If we forget that winvert-list
> exists, do you see any special problems with handling a newline?
No, you are right.
> This is not the UAX#9 classification. I think you should try to work
> with what UAX#9 defines, and only add more classes if needed.
Hm, what hebeng.el does is get the type of every char in a string
-- i.e. some are L, some are R, and some are various forms of
"undecided". hebeng.el then does some context-sensitive manipulation
in order to get it right.
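The first of those two steps, collecting one type per character, could
be sketched as follows. This is only an illustration and is not taken
from hebeng.el: the name bidi-string-types and its optional argument
are hypothetical, and by default it relies on get-bidi-type from the
code later in this message.

```elisp
;; Hypothetical helper; neither the name nor the optional argument
;; comes from hebeng.el or from the code in this message.
(defun bidi-string-types (string &optional type-function)
  "Return a string holding the bidi type of each character in STRING.
TYPE-FUNCTION maps a character to a type character; it defaults to
`get-bidi-type'."
  (let ((fn (or type-function 'get-bidi-type)))
    ;; Build the result one character at a time, in string order.
    (mapconcat (lambda (char)
                 (char-to-string (funcall fn char)))
               string "")))
```

The result has the same length as the input, so position N in the type
string describes character N of the text, which is what a later
context-resolution pass would need.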
> The data structure to hold this should probably be a char-table of
> some kind, since a string that Ehud used is not an efficient storage
> for large sparse arrays. It is good for a small contiguous set of
> characters, such as Hebrew, but doesn't scale up well if you add
> Arabic and other bidi scripts.
I've written code based on categories. I think this will be
adequate. See below.
Alex.
(defun get-bidi-type (char)
  "Return the bidi type of the given CHAR.
The resulting type is one of A, B, D, I, L, N, R or -.  The
result must be resolved using context.  See `bidi-setup-categories'."
  (let ((category-set (char-category-set char)))
    ;; The first matching category wins.
    (cond ((aref category-set bidi-category-l) ?L)
          ((aref category-set bidi-category-r) ?R)
          ((aref category-set bidi-category-d) ?D)
          ((aref category-set bidi-category-n) ?N)
          ((aref category-set bidi-category-i) ?I)
          ((aref category-set bidi-category-a) ?A)
          ((aref category-set bidi-category-b) ?B)
          ((aref category-set bidi-category--) ?-)
          ;; FIXME: Instead of this default we should fix
          ;; bidi-setup-categories
          (t ?L))))
;; (format "%c" (get-bidi-type ?a))
;; (format "%c" (get-bidi-type ?ג))
;;; L/R categories.
;; This is modelled after characters.el. At the moment, however, we
;; don't have categories assigned, so we must create them ourselves.
;; The new categories are identified by a character, like all other
;; categories. We store them in the following variables.
;; The existing categories and syntax tables are not enough to resolve
;; bidi issues: Some of these categories specify that the "real"
;; category must be determined from context.
(defvar bidi-category-l nil
  "Bidi L: LTR character (Latin, etc.).")
(defvar bidi-category-r nil
  "Bidi R: RTL character (Hebrew, Arabic, etc.).")
(defvar bidi-category-d nil
  "Bidi D: Digit.")
(defvar bidi-category-n nil
  "Bidi N: Digit if near a digit.")
(defvar bidi-category-i nil
  "Bidi I: Digit if between digits (inter-digit).")
(defvar bidi-category-a nil
  "Bidi A: L or R if near one of them, else resolve as I.")
(defvar bidi-category-b nil
  "Bidi B: L or R if near one of them, else resolve as N.")
(defvar bidi-category-- nil
  "Bidi -: Neutral.")
(defvar bidi-categories
  '(bidi-category-l
    bidi-category-r
    bidi-category-d
    bidi-category-n
    bidi-category-i
    bidi-category-a
    bidi-category-b
    bidi-category--)
  "List of categories used by bidi algorithms.")
;; This categorization is originally from hebeng.el by Ehud Karni
;; <address@hidden>. The original hebeng.el table
;; contained fall-back categories for old MS-DOS Hebrew charsets.
(defun bidi-setup-categories ()
"Create new categories for bidi.
The categories will be named bidi-category-X where X is one of the
following:
L LTR character (Latin, etc.)
R RTL character (Hebrew, Arabic, etc.)
D Digit
Types to be resolved using context:
N Digit if near a digit
I Digit if between digits (inter-digit)
A L or R if near one of them, else resolve as I
If between R & L characters, use the same type as the left-side
character.
B L or R if near one of them, else resolve as N
- Neutral"
  ;; (setq table (standard-category-table))
  (let ((table (standard-category-table)))
    ;; Allocate one unused category per variable in `bidi-categories'.
    (mapc (lambda (var)
            (let ((cat (get-unused-category table))
                  (doc (get var 'variable-documentation)))
              (when (symbol-value var)
                (error "%S is already set" var))
              (unless cat
                (error "No more unused categories available"))
              (set var cat)
              (define-category cat doc table)))
          bidi-categories)
    ;; ASCII
    (mapc (lambda (char)
            ;; No RESET argument here: a non-nil fourth argument would
            ;; remove the category instead of adding it.
            (modify-category-entry char bidi-category-l table))
          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-d table))
          "0123456789")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-n table))
          "#$%+")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-i table))
          "(),.<>^")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-a table))
          "!\"'*:=`~")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-b table))
          "&-/_")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-- table))
          ";address@hidden|}\177")
    ;; Arabic character sets: All characters in these character sets
    ;; are written right-to-left. This is probably not true for
    ;; arabic-iso8859-6.
    (dolist (charset '(arabic-iso8859-6
                       arabic-digit
                       arabic-1-column
                       arabic-2-column))
      (modify-category-entry (make-char charset)
                             bidi-category-r table))
    ;; Hebrew character set (ISO-8859-8). Only some characters in this
    ;; character set are written right-to-left.
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-r table))
          "אבגדהוזחטיךכלםמןנסעףפץצקרשת")))
--
http://www.emacswiki.org/
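
The docstring of bidi-setup-categories describes context rules that
the code in this message does not implement. As an illustration only,
here is a minimal sketch of the rule for N ("Digit if near a digit").
It works on a string of type characters such as get-bidi-type returns.
Two things are assumptions, not statements from the message: "near" is
taken to mean immediately adjacent, and an N with no adjacent D is left
unchanged. The name bidi-resolve-n is hypothetical.

```elisp
;; Hypothetical helper, not part of the code posted above.
;; Assumption: "near a digit" means immediately adjacent to a D.
(defun bidi-resolve-n (types)
  "Return a copy of TYPES with each N next to a D replaced by D.
TYPES is a string of bidi type characters as returned by
`get-bidi-type'.  Any N without an adjacent D is left unchanged."
  (let ((result (copy-sequence types))
        (len (length types))
        (i 0))
    (while (< i len)
      ;; Neighbours are read from the original TYPES, so a freshly
      ;; resolved N does not cascade to the next N in the same pass.
      (when (and (eq (aref types i) ?N)
                 (or (and (> i 0)
                          (eq (aref types (1- i)) ?D))
                     (and (< (1+ i) len)
                          (eq (aref types (1+ i)) ?D))))
        (aset result i ?D))
      (setq i (1+ i)))
    result))

;; (bidi-resolve-n "LNDNR")
```

Reading neighbours from the unmodified input keeps the pass order
independent; whether resolved types should propagate further is a
design question this sketch leaves open.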