emacs-devel

Re: On language-dependent defaults for character-folding


From: Lars Ingebrigtsen
Subject: Re: On language-dependent defaults for character-folding
Date: Sat, 20 Feb 2016 17:31:48 +1100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

Elias Mårtenson <address@hidden> writes:

> Every example you have given so far discusses the decomposition
> equivalence, i.e. the fact that the two variants of ñ are the
> same. Section 5.16 discusses the _concept_ of allowing n and ñ to match,
> but the mechanism to do so is locale-dependent. This is what
> Unicode says, and that is what I say.

Yes.

Here are my thoughts (I was sitting on a plane today):

It seems to me that we're considering using the Unicode decomposition
rules for "variant detection" because it's what we have.  But this
doesn't allow people to say `C-s l' to find ł or `C-s o' to find ø, and
this would obviously be something that many people would find helpful.
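To make the gap concrete, here is a quick check with the `decomposition' character property, assuming a reasonably recent Emacs with its bundled Unicode tables.  ñ and é break down into a base letter plus a combining mark; ø and ł have no decomposition at all, so nothing derived from these tables ties them to o and l:

```elisp
;; ñ = n + U+0303 COMBINING TILDE, é = e + U+0301 COMBINING ACUTE ACCENT.
(get-char-code-property ?ñ 'decomposition) ; => (110 771)
(get-char-code-property ?é 'decomposition) ; => (101 769)
;; No decomposition: the value is a list holding just the character itself.
(get-char-code-property ?ø 'decomposition) ; => (248)
(get-char-code-property ?ł 'decomposition) ; => (322)
```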

So the Unicode decomposition rules only get us halfway there.  On the
other hand, they go too far for other users, who absolutely do not want
`C-s o' to find ø, but would be really glad if `C-s hermes' would find
"Hermés" (or is it "Hermès"?  I can't even type that in on this
keyboard).

Emacs is awesome.  We should aim to make this extremely useful feature
awesome.

So: How many characters are we really talking about?  Unicode is big and
scary, but this only applies to alphabetical scripts, right?  That is,
all the Latin-like scripts, and...  possibly Greek/Hebrew/Cyrillic?  I
don't know?

But if we only consider the Latin scripts for a moment, there aren't
more than a few hundred Unicode points that we care about.  Basically
all the old iso-8859-foos from around Europe.  And what we want is a way
for people with normal keyboards (they have a-z in Latin alphabet
countries) to search for variants.
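The Unicode tables could at least seed such a list.  A minimal sketch, assuming U+00C0 to U+024F (Latin-1 Supplement through Latin Extended-B) is the range we care about; `my-base-letter' and `my-latin-variants' are made-up names.  It follows canonical decompositions down to a plain a-z letter, so letters that don't decompose (ø, ł, æ, ß) still have to be added by hand:

```elisp
(defun my-base-letter (char)
  "Follow CHAR's canonical decompositions until a base letter remains.
Return CHAR unchanged if it has no canonical decomposition.
Compatibility decompositions start with a tag symbol and are ignored."
  (let ((head (car (get-char-code-property char 'decomposition))))
    (if (and (characterp head) (/= head char))
        (my-base-letter head)           ; e.g. ǖ -> ü -> u
      char)))

(defun my-latin-variants (from to)
  "Return a list of (BASE VARIANT...) for characters FROM..TO.
Only characters whose base letter is a lowercase a-z are kept;
case is left to the usual search case folding."
  (let ((table nil))
    (dolist (char (number-sequence from to))
      (let ((base (my-base-letter char)))
        (when (and (/= base char) (<= ?a base ?z))
          (let ((entry (assq base table)))
            (if entry
                (setcdr entry (append (cdr entry) (list char)))
              (push (list base char) table))))))
    (sort table (lambda (x y) (< (car x) (car y))))))

;; Usage: the generated table has ñ under n but no ø under o.
(my-latin-variants #xC0 #x24F)
```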

So: That sounds like an evening's work.

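(defvar *character-variants*
  '((?a ?á ?å ?ä ...)   ; a base letter, then the letters it should find
    (?o ?ø ?ö ?ó ...)
    ...)
  "Base letters, each followed by the variants a search for it should match.")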

Anything that makes somebody say "that's kinda an a, right?" goes in there.

Then we have something like:

(define-locale-exception :no ?a ?å)

There would be few of these exceptions per locale.  The Scandinavian
countries would have three each, and Denmark's and Norway's would be the
same.

That bit is more than an evening, but is something that people would
enjoy submitting exceptions to, I think.
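A minimal sketch of the exception bookkeeping, treating the proposed `define-locale-exception' as a plain function that records (BASE . VARIANT) pairs per locale.  `my-locale-exceptions' is a made-up name, and the three pairs per country below are an assumption (the letters those alphabets treat as distinct):

```elisp
(defvar my-locale-exceptions nil
  "Alist of (LOCALE (BASE . VARIANT)...).
Each pair names a variant that must NOT match its base letter in LOCALE.")

(defun define-locale-exception (locale base variant)
  "Record that searching for BASE should not find VARIANT in LOCALE.
Hypothetical sketch; registering the same pair twice is a no-op."
  (let ((entry (assq locale my-locale-exceptions))
        (pair (cons base variant)))
    (cond ((null entry)
           (push (list locale pair) my-locale-exceptions))
          ((not (member pair (cdr entry)))
           (setcdr entry (cons pair (cdr entry)))))))

;; Norway and Denmark share the same three exceptions...
(dolist (locale '(:no :da))
  (define-locale-exception locale ?a ?å)
  (define-locale-exception locale ?a ?æ)
  (define-locale-exception locale ?o ?ø))
;; ...and Sweden has its own three.
(define-locale-exception :sv ?a ?å)
(define-locale-exception :sv ?a ?ä)
(define-locale-exception :sv ?o ?ö)
```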

And then we just look up the locale, create the mapping when we type
`C-s', and there we are.  An awesome, very useful feature that would
annoy nobody, and that should be on by default.
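The `C-s' end of it might look roughly like this.  A sketch only: `my-variants', `my-exceptions', `my-locale-table' and `my-search-regexp' are hypothetical, the data is a tiny sample, and it builds a plain regexp rather than hooking into isearch itself:

```elisp
;; Hypothetical sample data in the shape proposed above.
(defvar my-variants
  '((?a ?á ?à ?å ?ä)
    (?e ?é ?è ?ë)
    (?o ?ó ?ø ?ö))
  "Base letters, each followed by its variants.")

(defvar my-exceptions
  '((:no (?a . ?å) (?o . ?ø))
    (:sv (?a . ?å) (?a . ?ä) (?o . ?ö)))
  "Per-locale (BASE . VARIANT) pairs that must not match.")

(defun my-locale-table (locale)
  "Return `my-variants' minus the exceptions registered for LOCALE."
  (let ((excluded (cdr (assq locale my-exceptions))))
    (mapcar (lambda (entry)
              (let ((base (car entry)) (kept nil))
                (dolist (v (cdr entry))
                  (unless (member (cons base v) excluded)
                    (push v kept)))
                (cons base (nreverse kept))))
            my-variants)))

(defun my-search-regexp (string locale)
  "Return a regexp matching STRING with letters folded for LOCALE.
Each base letter becomes a bracket expression of itself plus its
allowed variants; every other character is matched literally."
  (let ((table (my-locale-table locale)))
    (mapconcat (lambda (char)
                 (let ((variants (cdr (assq char table))))
                   (if variants
                       (concat "[" (apply #'string char variants) "]")
                     (regexp-quote (string char)))))
               string "")))
```

With that, (my-search-regexp "hermes" :no) matches "Hermès" (case folding handles the H), while (my-search-regexp "a" :no) does not match å.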

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


