Re: [emacs-bidi] bidi categories
From: Alex Schroeder
Subject: Re: [emacs-bidi] bidi categories
Date: Fri, 09 Nov 2001 19:28:14 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)
Alex Schroeder <address@hidden> writes:
> "Eli Zaretskii" <address@hidden> writes:
>
>> I'd prefer the classification defined by UAX#9. It would be
>> confusing, I think, to have 2 different classifications.
>
> Yes, I agree. I wrote the above code before reading your comments. I
> will send a new version asap. :)
Here's what I have at the moment. As I said in another mail, I also
have the Unicode classification, but I don't know yet how to use it --
given a Unicode code point (correct term?), how do I get the
corresponding Mule character in the Unicode charsets -- or better yet,
how do I get all characters from all the other charsets matching it?
Perhaps I can use the tables Dave Love has sent to gnu.emacs.sources;
I will investigate.
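
One possible starting point for the code point question, offered as a sketch under assumptions rather than as a tested answer for the charsets discussed here: `decode-char` with the `ucs` coded character set maps a Unicode code point to an Emacs character, and `encode-char` goes the other way. The second helper assumes a much later Emacs (23 or newer), where `get-char-code-property` exposes the Unicode bidi class directly; that function is not available in Emacs 21. Both `bidi-demo-` function names are made up for illustration.

```elisp
;; Sketch only: the `bidi-demo-' helpers are hypothetical names.
(defun bidi-demo-char-from-code-point (code-point)
  "Return the Emacs character for Unicode CODE-POINT, or nil if unsupported."
  (decode-char 'ucs code-point))

(defun bidi-demo-class-of-code-point (code-point)
  "Return the Unicode bidi class symbol (L, R, AL, EN, ...) of CODE-POINT.
Assumes an Emacs that provides `get-char-code-property' (23 or later)."
  (let ((char (bidi-demo-char-from-code-point code-point)))
    (and char (get-char-code-property char 'bidi-class))))

;; Round trip: code point -> character -> code point.
(encode-char (bidi-demo-char-from-code-point #x05D0) 'ucs)

;; Classes of HEBREW LETTER ALEF and LATIN CAPITAL LETTER A.
(list (bidi-demo-class-of-code-point #x05D0)
      (bidi-demo-class-of-code-point #x0041))
```

This does not answer the second half of the question (finding the matching characters in all the other charsets); it only covers the direct code point lookup.
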
I looked at UAX#9 again, and while it describes the process of
transforming logical to visual order, it a) looks horribly complicated
and b) is not reversible. I guess that's old stuff for those who are
familiar with it... :) Since the reverse operation is not defined
explicitly (did I miss it?), a certain amount of guesswork will be
needed.
Alex.
;;; L/R categories.
;; This is modelled after characters.el. At the moment, however, we
;; don't have categories assigned, so we must create them ourselves.
;; The new categories are identified by a character, like all other
;; categories. We store them in the following variables.
;; The existing categories and syntax tables are not enough to resolve
;; bidi issues: Some of these categories specify that the "real"
;; category must be determined from context. See the Unicode Standard
;; Annex #9, available from http://www.unicode.org/unicode/reports/tr9/.
;; Note that not all categories mentioned in UAX#9 are listed -- perhaps
;; they will be added later.
(defvar bidi-category-l nil
  "Strong Left-to-Right: Most alphabetic, syllabic, Han ideographic
characters, digits that are neither European nor Arabic, all unassigned
characters except in the ranges (0590-05FF, FB1D-FB4F) and (0600-07BF,
FB50-FDFF, FE70-FEFF).")

(defvar bidi-category-r nil
  "Strong Right-to-Left: Hebrew alphabet, most punctuation specific to
that script, all unassigned characters in the ranges (0590-05FF,
FB1D-FB4F).")

(defvar bidi-category-al nil
  "Strong Right-to-Left Arabic: Arabic, Thaana, and Syriac alphabets,
most punctuation specific to those scripts, all unassigned characters in
the ranges (0600-07BF, FB50-FDFF, FE70-FEFF).")

(defvar bidi-category-en nil
  "Weak European Number: European digits, Eastern Arabic-Indic digits.")

(defvar bidi-category-es nil
  "Weak European Number Separator: Solidus (Slash).")

(defvar bidi-category-et nil
  "Weak European Number Terminator: Plus Sign, Minus Sign, Degree,
Currency symbols.")

(defvar bidi-category-an nil
  "Weak Arabic Number: Arabic-Indic digits, Arabic decimal & thousands
separators.")

(defvar bidi-category-cs nil
  "Weak Common Number Separator: Colon, Comma, Full Stop (Period),
Non-breaking space.")

(defvar bidi-category-s nil
  "Neutral Segment Separator: Tab.")

(defvar bidi-category-ws nil
  "Neutral Whitespace: Space, Figure Space, Line Separator, Form Feed,
General Punctuation Spaces.")

(defvar bidi-categories
  '(bidi-category-l
    bidi-category-r
    bidi-category-al
    bidi-category-en
    bidi-category-es
    bidi-category-et
    bidi-category-an
    bidi-category-cs
    bidi-category-s
    bidi-category-ws)
  "List of categories used by bidi algorithms.")

(defun bidi-setup-categories ()
  "Create new categories for bidi according to UAX#9."
  (let ((table (standard-category-table)))
    ;; Allocate one unused category character per bidi class and
    ;; store it in the corresponding variable.
    (mapc (lambda (var)
            (let ((cat (get-unused-category table))
                  (doc (get var 'variable-documentation)))
              (when (symbol-value var)
                (error "%S is already set" var))
              (unless cat
                (error "No more unused categories available"))
              (set var cat)
              (define-category cat doc table)))
          bidi-categories)
    ;; ASCII: there are no characters of the categories R, AL and AN.
    ;; Lots of characters are still missing a classification; this will
    ;; be fixed using the Unicode tables.
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-l table))
          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-en table))
          "0123456789")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-es table))
          "/")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-et table))
          "+-$")
    ;; CS as listed in the docstring: colon, comma, full stop.
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-cs table))
          ":,.")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-s table))
          "\t")
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-ws table))
          " \n\r\f")
    ;; Hebrew character set (ISO-8859-8). Only some characters in this
    ;; character set are written left-to-right.
    (mapc (lambda (char)
            (modify-category-entry char bidi-category-r table))
          "אבגדהוזחטיךכלםמןנסעףפץצקרשת")))