Re: [emacs-bidi] detecting the wrong order of characters
From: Alex Schroeder
Subject: Re: [emacs-bidi] detecting the wrong order of characters
Date: Wed, 07 Nov 2001 16:13:24 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)
Eli Zaretskii <address@hidden> writes:
>> if you can read Hebrew postscript, check this address out
>> http://www.cs.huji.ac.il/labs/learning/Info/Ps_Files/lecture2.ps
>> but think of ordered pairs of letters. With SOFIOT in Hebrew, this should
>> work very nicely.
What does the above document say, exactly?
> Is something like that possible with Arabic?
I don't know how Arabic is represented in Unicode. Picking up my book,
however, I see that there are often three ways of writing a letter --
one for the beginning of words, one for within words, one for the end
of words. Are these represented as different characters in Unicode
(or ISO 8859-6)? If so, then that information could be used. If not,
language-specific constellations must be looked for. :(
This reminds me of the very simple language detection scheme I am
using. A similar technique might also work to detect visual order --
all we need is a list of common direction-identifying sequences.
(defvar guess-language-rules
  '(("en" . "\\<\\(of\\|the\\|and\\|or\\|how\\)\\>")
    ("de" . "\\<\\(und\\|oder\\|der\\|die\\|das\\|wie\\)\\>")
    ("fr" . "\\<\\(et\\|ou\\|[ld]es\\|que\\)\\>"))
  "Alist of rules to determine the language of some text.
Each rule has the form (CODE . REGEXP) where CODE is a string to
identify the language (probably according to ISO 639), and REGEXP is a
regexp that matches some very common words particular to that language.
The default language should be listed first. That will be the language
returned when no REGEXP matches, as would happen for an empty
document.")
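(defun guess-buffer-language ()
  "Guess language in the current buffer.
Adapted by Alex.
From: Jean-Philippe Theberge <address@hidden>
Subject: Re: Guessing a language.
Newsgroups: gnu.emacs.help
Date: 03 Mar 2000 16:46:41 +0100"
  (save-excursion
    (goto-char (point-min))
    ;; Build a list of (COUNT . CODE) pairs, one per rule.
    ;; `how-many' returns the match count as a number in current Emacs,
    ;; so no `string-to-number' is needed; `mapcar' avoids requiring cl.
    (let ((count (mapcar (lambda (x)
                           (cons (how-many (cdr x)) (car x)))
                         guess-language-rules)))
      ;; Return the code with the highest count.  On a tie (for example
      ;; all zero), `assoc' finds the rule listed first: the default.
      (cdr (assoc (apply #'max (mapcar #'car count)) count)))))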
(defun guess-language ()
  "Guess language in the current buffer."
  (interactive)
  ;; Pass the result as an argument so it is never parsed as a format string.
  (message "%s" (guess-buffer-language)))
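Here is a rough sketch of how the same scheme might be turned around to guess visual order for Hebrew, using the sofiot idea quoted above. The names `guess-direction-rules` and `guess-buffer-direction` are made up for illustration, and the rules are an assumption, not something I have tested on real documents. A final letter ends a word in logical order, so finding one at the start of a word hints that the text is stored in visual order. The regexps rely on Hebrew letters having word syntax, which holds for the standard syntax table.

```elisp
;; Hypothetical names; not part of Emacs or of the code above.
(defvar guess-direction-rules
  ;; U+05DA final kaf, U+05DD final mem, U+05DF final nun,
  ;; U+05E3 final pe, U+05E5 final tsadi.
  '(("logical" . "[\u05da\u05dd\u05df\u05e3\u05e5]\\>")
    ("visual" . "\\<[\u05da\u05dd\u05df\u05e3\u05e5]"))
  "Alist of rules (ORDER . REGEXP) to guess how Hebrew text is stored.
A final letter closes a word in logical order and opens one in visual
order.  The default order should be listed first.")

(defun guess-buffer-direction ()
  "Guess whether Hebrew text in the current buffer is logical or visual."
  (save-excursion
    (goto-char (point-min))
    ;; Same counting scheme as for languages: (COUNT . ORDER) pairs.
    (let ((count (mapcar (lambda (x)
                           (cons (how-many (cdr x)) (car x)))
                         guess-direction-rules)))
      ;; Highest count wins; a tie falls back to the first rule.
      (cdr (assoc (apply #'max (mapcar #'car count)) count)))))
```

Evaluating `(guess-buffer-direction)` in a buffer holding the text should return one of the two strings. Arabic would need its own rules, and only if the positional forms turn out to be distinguishable in the encoding.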
Alex.
--
http://www.emacswiki.org/