Re: How to translate LaTeX into UTF-8 in Elisp?

help-gnu-emacs

From:	Héctor Lahoz
Subject:	Re: How to translate LaTeX into UTF-8 in Elisp?
Date:	Tue, 4 Jul 2017 12:23:48 +0200
User-agent:	Mutt/1.5.20 (2009-06-14)

Marcin Borkowski wrote:
> OK, so here is a proof of concept:
> 
> --8<---------------cut here---------------start------------->8---
> (defvar TeX-to-Unicode-accents-alist
>   '((?` . "with grave")
>     (?' . "with acute")
>     (?^ . "with circumflex")
>     (?\" . "with diaeresis")
>     (?H . "with double acute")
>     (?~ . "with tilde")
>     (?c . "with cedilla")
>     (?k . "with ogonek")
>     (?= . "with macron")
>     (?. . "with dot above")
>     (?u . "with breve")
>     (?v . "with caron"))
>   "A mapping from TeX control characters to accent names used in
> Unicode.")
> 
> (defun combine-letter-diacritical-mark (letter mark)
>   "Return a Unicode string of LETTER combined with MARK.
> MARK can be any character that can be used in TeX accenting
> commands."
>   (let* ((letter (if (stringp letter)
>                      (string-to-char letter)
>                    letter))
>          (uppercase (= letter
>                        (upcase letter))))
>     (cdr (assoc-string
>           (format "LATIN %s LETTER %c %s"
>                   (if uppercase "CAPITAL" "SMALL")
>                   letter
>                   (cdr (assoc mark TeX-to-Unicode-accents-alist)))
>           ucs-names
>           t))))
> --8<---------------cut here---------------end--------------->8---
> 

Great.

Perhaps you could consider translating to Unicode combining characters
instead of precomposed letters. I think that is closer to the original
TeX idea and could be cleaner:

0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
0302;COMBINING CIRCUMFLEX ACCENT;Mn;230;NSM;;;;;N;NON-SPACING CIRCUMFLEX;;;;
0303;COMBINING TILDE;Mn;230;NSM;;;;;N;NON-SPACING TILDE;;;;
0304;COMBINING MACRON;Mn;230;NSM;;;;;N;NON-SPACING MACRON;;;;
0305;COMBINING OVERLINE;Mn;230;NSM;;;;;N;NON-SPACING OVERSCORE;;;;
0306;COMBINING BREVE;Mn;230;NSM;;;;;N;NON-SPACING BREVE;;;;
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
0308;COMBINING DIAERESIS;Mn;230;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
0309;COMBINING HOOK ABOVE;Mn;230;NSM;;;;;N;NON-SPACING HOOK ABOVE;;;;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
030B;COMBINING DOUBLE ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING DOUBLE ACUTE;;;;
030C;COMBINING CARON;Mn;230;NSM;;;;;N;NON-SPACING HACEK;;;;
030D;COMBINING VERTICAL LINE ABOVE;Mn;230;NSM;;;;;N;NON-SPACING VERTICAL LINE ABOVE;;;;
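
A minimal sketch of that idea. The names my-tex-combining-alist and
my-tex-accent-to-combining are made up for illustration and do not
come from the thread. The code points are taken from the list above,
except U+0327 (COMBINING CEDILLA) and U+0328 (COMBINING OGONEK), which
I added for \c and \k:

```elisp
;; Hypothetical names; not part of the proof of concept quoted above.
(defvar my-tex-combining-alist
  '((?`  . #x0300)   ; COMBINING GRAVE ACCENT
    (?'  . #x0301)   ; COMBINING ACUTE ACCENT
    (?^  . #x0302)   ; COMBINING CIRCUMFLEX ACCENT
    (?~  . #x0303)   ; COMBINING TILDE
    (?=  . #x0304)   ; COMBINING MACRON
    (?u  . #x0306)   ; COMBINING BREVE
    (?.  . #x0307)   ; COMBINING DOT ABOVE
    (?\" . #x0308)   ; COMBINING DIAERESIS
    (?H  . #x030B)   ; COMBINING DOUBLE ACUTE ACCENT
    (?v  . #x030C)   ; COMBINING CARON
    (?c  . #x0327)   ; COMBINING CEDILLA (not in the list above)
    (?k  . #x0328))  ; COMBINING OGONEK (not in the list above)
  "Mapping from TeX accent characters to Unicode combining characters.")

(defun my-tex-accent-to-combining (letter mark)
  "Return a string of LETTER followed by the combining character for MARK.
LETTER is a character or a one-character string.  MARK is a TeX
accent character.  Return nil if MARK is not a known accent."
  (let ((letter (if (stringp letter)
                    (string-to-char letter)
                  letter))
        (combining (cdr (assq mark my-tex-combining-alist))))
    (when combining
      ;; Unicode order: base character first, then the combining mark.
      (string letter combining))))
```

With this, (my-tex-accent-to-combining ?a ?') returns a two-character
string, U+0061 followed by U+0301. No lookup in the Unicode name table
is needed, and any base letter works, even if no precomposed character
exists for it.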

See the Wikipedia article on Unicode equivalence:
https://en.wikipedia.org/wiki/Unicode_equivalence

The difference is that Unicode reverses the order: the base character
comes first, followed by all of its combining characters. For example,
\'a would be translated to either

00E1;LATIN SMALL LETTER A WITH ACUTE

or

0061;LATIN SMALL LETTER A
0301;COMBINING ACUTE ACCENT

I don't know the implications of using Unicode combining characters.
I guess the choice depends on the purpose of the output.
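
If the precomposed form turns out to be needed after all, the two
forms are canonically equivalent and can be converted into each other.
A small sketch, assuming the ucs-normalize library that ships with
Emacs is available:

```elisp
(require 'ucs-normalize)

(let* ((decomposed  (string #x0061 #x0301)) ; a + COMBINING ACUTE ACCENT
       (precomposed (string #x00E1))        ; LATIN SMALL LETTER A WITH ACUTE
       ;; NFC composes base + combining mark into one character when
       ;; a precomposed character exists.
       (composed    (ucs-normalize-NFC-string decomposed))
       ;; NFD goes the other way and splits the precomposed character.
       (split       (ucs-normalize-NFD-string precomposed)))
  (list (length decomposed)
        (length composed)
        (string= composed precomposed)
        (string= split decomposed)))
```

This should evaluate to (2 1 t t): the decomposed string has two
characters, its NFC form has one, and each form normalizes to the
other. So the translation could always emit combining characters and
leave the choice of form to a final normalization step.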
