Re: How to translate LaTeX into UTF-8 in Elisp?
From: Héctor Lahoz
Subject: Re: How to translate LaTeX into UTF-8 in Elisp?
Date: Tue, 4 Jul 2017 12:23:48 +0200
User-agent: Mutt/1.5.20 (2009-06-14)
Marcin Borkowski wrote:
> OK, so here is a proof of concept:
>
> --8<---------------cut here---------------start------------->8---
> (defvar TeX-to-Unicode-accents-alist
>   '((?` . "grave")
>     (?' . "acute")
>     (?^ . "circumflex")
>     (?\" . "diaeresis")
>     (?H . "double acute")
>     (?~ . "tilde")
>     (?c . "cedilla")
>     (?k . "ogonek")
>     (?= . "macron")
>     (?. . "dot above")
>     (?u . "breve")
>     (?v . "caron"))
>   "A mapping from TeX control characters to accent names used in
> Unicode.")
>
> (defun combine-letter-diacritical-mark (letter mark)
>   "Return the Unicode character for LETTER combined with MARK.
> MARK can be any character that can be used in TeX accenting
> commands."
>   (let* ((letter (if (stringp letter)
>                      (string-to-char letter)
>                    letter))
>          (uppercase (= letter
>                        (upcase letter))))
>     (cdr (assoc-string
>           ;; Every precomposed name has the form "... LETTER X WITH <accent>".
>           (format "LATIN %s LETTER %c WITH %s"
>                   (if uppercase "CAPITAL" "SMALL")
>                   letter
>                   (cdr (assoc mark TeX-to-Unicode-accents-alist)))
>           ;; Call the function so the name table is populated first.
>           (ucs-names)
>           t))))
> --8<---------------cut here---------------end--------------->8---
>
Great.
Perhaps you could consider translating to Unicode combining characters.
I think that is closer to the original TeX idea and could be cleaner:
0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
0302;COMBINING CIRCUMFLEX ACCENT;Mn;230;NSM;;;;;N;NON-SPACING CIRCUMFLEX;;;;
0303;COMBINING TILDE;Mn;230;NSM;;;;;N;NON-SPACING TILDE;;;;
0304;COMBINING MACRON;Mn;230;NSM;;;;;N;NON-SPACING MACRON;;;;
0305;COMBINING OVERLINE;Mn;230;NSM;;;;;N;NON-SPACING OVERSCORE;;;;
0306;COMBINING BREVE;Mn;230;NSM;;;;;N;NON-SPACING BREVE;;;;
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
0308;COMBINING DIAERESIS;Mn;230;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
0309;COMBINING HOOK ABOVE;Mn;230;NSM;;;;;N;NON-SPACING HOOK ABOVE;;;;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
030B;COMBINING DOUBLE ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING DOUBLE ACUTE;;;;
030C;COMBINING CARON;Mn;230;NSM;;;;;N;NON-SPACING HACEK;;;;
030D;COMBINING VERTICAL LINE ABOVE;Mn;230;NSM;;;;;N;NON-SPACING VERTICAL LINE ABOVE;;;;
See the Wikipedia article on Unicode equivalence:
https://en.wikipedia.org/wiki/Unicode_equivalence
The difference is that Unicode reverses the order: the base character
comes first, followed by all of its combining characters. For example,
\'a would be translated to either
00E1;LATIN SMALL LETTER A WITH ACUTE
or
0061;LATIN SMALL LETTER A
0301;COMBINING ACUTE ACCENT
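
To make the second form concrete, here is a rough sketch of what the
combining-character variant might look like. The names
my-tex-accent-combining-alist and my-tex-accent-to-combining are made up
for this example and are not part of the code quoted above. Only accents
whose combining characters appear in the list above are covered, so \c
and \k are left out.

```elisp
;; Hypothetical names, for illustration only.
(defvar my-tex-accent-combining-alist
  '((?`  . #x0300)   ; COMBINING GRAVE ACCENT
    (?'  . #x0301)   ; COMBINING ACUTE ACCENT
    (?^  . #x0302)   ; COMBINING CIRCUMFLEX ACCENT
    (?~  . #x0303)   ; COMBINING TILDE
    (?=  . #x0304)   ; COMBINING MACRON
    (?u  . #x0306)   ; COMBINING BREVE
    (?.  . #x0307)   ; COMBINING DOT ABOVE
    (?\" . #x0308)   ; COMBINING DIAERESIS
    (?H  . #x030B)   ; COMBINING DOUBLE ACUTE ACCENT
    (?v  . #x030C))  ; COMBINING CARON
  "Map TeX accent command characters to Unicode combining characters.")

(defun my-tex-accent-to-combining (mark letter)
  "Return a string of LETTER followed by the combining character for MARK.
MARK and LETTER are characters.  Return nil if MARK is not listed in
`my-tex-accent-combining-alist'."
  (let ((combining (cdr (assq mark my-tex-accent-combining-alist))))
    (when combining
      ;; Unicode order: base character first, then the combining mark.
      (string letter combining))))

;; \'a becomes U+0061 followed by U+0301.
(my-tex-accent-to-combining ?' ?a)
```

No name lookup is needed here, and the same table works for any base
letter, whether or not a precomposed character exists for it.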
I don't know the implications of using Unicode combining characters.
I guess the choice depends on the purpose of the output.
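
One implication that can be checked directly: the two spellings are
canonically equivalent but are not equal as strings, so comparing or
searching the output may need a normalization step. A small sketch,
assuming the ucs-normalize library that ships with Emacs is available:

```elisp
(require 'ucs-normalize)

(let ((decomposed (string ?a #x0301))   ; a + COMBINING ACUTE ACCENT
      (precomposed (string #x00E1)))    ; LATIN SMALL LETTER A WITH ACUTE
  (list
   ;; Compared code point by code point, the two spellings differ.
   (string= decomposed precomposed)
   ;; NFC composes the sequence into the single code point.
   (string= (ucs-normalize-NFC-string decomposed) precomposed)
   ;; NFD decomposes the single code point into the sequence.
   (string= (ucs-normalize-NFD-string precomposed) decomposed)))
```

So one option would be to generate combining sequences and run the
result through NFC when precomposed characters are preferred.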