Re: "Unidecode" functionality in Emacs
From: John Mastro
Subject: Re: "Unidecode" functionality in Emacs
Date: Tue, 20 Mar 2018 10:23:03 -0700

Eli Zaretskii <eliz@gnu.org> wrote:
>> There are "Unidecode" packages for Perl[1], Python[2], and Emacs[3]
>> (derived from one another in that order). They each transliterate
>> Unicode text to ASCII, e.g.:
>>
>> (unidecode "Déjà vu")
>> ;=> "Deja vu"
>> (unidecode "北亰")
>> ;=> "Bei Jing "
>>
>> Does Emacs have equivalent functionality built-in?
>
> It's possible to remove accents (the first example) using the
> functionality in ucs-normalize.el. Some transliteration is possible
> for scripts for which there exists a "transliteration" input method,
> using the code by Michael Welsh Duggan posted here:
>
> http://lists.gnu.org/archive/html/emacs-devel/2018-02/msg00387.html
>
> For example, you can transliterate Cyrillic text using the
> cyrillic-translit input method that comes with Emacs. But there are
> no general-purpose transliteration capabilities in Emacs, AFAIK.

Thanks, I'll take a look at those.

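For the accent-removal case, a minimal sketch of the ucs-normalize route might look like the following. It assumes the idea is to decompose the text with `ucs-normalize-NFD-string` and then drop the nonspacing marks; `my-strip-accents` is a hypothetical name, not something that ships with Emacs or the unidecode package.

```elisp
(require 'ucs-normalize)
(require 'seq)

(defun my-strip-accents (string)
  "Return STRING with combining accents removed.
Hypothetical helper: decompose to NFD, then drop every character
whose Unicode general category is Mn (nonspacing mark)."
  (concat
   (seq-remove (lambda (char)
                 (eq (get-char-code-property char 'general-category) 'Mn))
               (ucs-normalize-NFD-string string))))

(my-strip-accents "Déjà vu")
;=> "Deja vu"
```

Characters with no canonical decomposition, such as the CJK example, pass through this unchanged, so it only covers the first of the two cases.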
> However, it looks like the Perl package is just a huge database of
> precomputed transliterations, in which case doing the same in Emacs
> Lisp should be almost trivial.
Yep, that's how the Emacs package works too. It boils down to about 25
lines of Lisp[1] plus the database[2].
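
To illustrate the table-lookup idea, here is a rough sketch rather than the package's actual code. The four-entry hash table stands in for the real database, and `my-translit-table` and `my-transliterate` are hypothetical names chosen for this example.

```elisp
;; Hypothetical miniature database; the real package ships a far larger
;; table under data/ whose layout is not reproduced here.
(defvar my-translit-table
  (let ((table (make-hash-table :test 'eql)))
    (dolist (entry '((?é . "e") (?à . "a") (?北 . "Bei ") (?亰 . "Jing ")))
      (puthash (car entry) (cdr entry) table))
    table)
  "Map from a non-ASCII character to its ASCII replacement string.")

(defun my-transliterate (string)
  "Return STRING with each non-ASCII character replaced via the table.
ASCII characters pass through; characters missing from the table
are dropped."
  (mapconcat (lambda (char)
               (if (< char 128)
                   (char-to-string char)
                 (gethash char my-translit-table "")))
             string
             ""))

(my-transliterate "北亰")
;=> "Bei Jing "
```

Dropping unknown characters is just one possible choice for this sketch; substituting a placeholder would work equally well.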
Thanks
John
[1]: https://github.com/sindikat/unidecode/blob/5502ada9287b4012eabb879f12f5b0a9df52c5b7/unidecode.el#L56-L82
[2]: https://github.com/sindikat/unidecode/tree/5502ada9287b4012eabb879f12f5b0a9df52c5b7/data