Re: Questions about isearch

From: Per Starbäck
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 21:46:49 +0100

> IMO, it is more important to have language-independent matching in
> Emacs.  Language-specific rules are also needed in some situations,
> but they are secondary for Emacs.
>> It seems to me that we want to introduce a concept of current language

Yes! The language of a buffer is something I have wished for a long
long time, probably using minor modes. It has primarily been to have
the correct ispell dictionary and to have different abbrevs depending
on language.

With the new search folding it is much more needed.

> It's a problematic concept for Emacs, which is a multi-lingual
> environment.  For example, what is the "current language" of the
> buffer showing this message?

It's in English.

>  It cannot be US English, since it
> includes characters not in that language, and can easily include
> Turkish words.  Or consider the etc/HELLO file.

I don't understand at all what you are saying here. Yes, of course
Turkish words (and any character) can be in an English text. That
doesn't make it false that it is in English. Do you just mean that it
can be hard to determine the language of a text automatically?

> We could probably have a text property which will specify the
> language, but we don't have good means to set such a property.  IOW,
> where that information would come from?

I don't envision a text property, but just a value for the buffer,
because it is much easier and good enough for most things. Yes, there
are situations where you might want to differentiate it like that, but
that goes for other things we have in modes as well. (It would
sometimes be nice to get Javascript mode for part of an HTML file.)

So from where do we get it? Normally from the user. Many users mostly
write in a few languages, like Swedish and English to take myself as
an example. What I want is an indication "en" or "sv" somewhere in the
information line and commands to toggle between my favourite languages.

Sometimes it can be determined automatically. For example when opening
a html file Emacs could look at the "lang" attribute, in a LaTeX file
it could see how you use packages like Babel or Polyglossia. And in
any text file various methods (like n-gram frequencies) can be used to
try to identify the language automatically.
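
To make the HTML case concrete, here is a minimal sketch in Python (not
Emacs Lisp, and not anything Emacs provides today). The names
guess_language and _LangFinder are made up for this illustration, and
falling back to a default language when no "lang" attribute is found is
my assumption.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Record the lang attribute of the first <html> start tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def guess_language(html_text, default="en"):
    """Return the primary language subtag ("sv" for "sv-SE"), or default."""
    finder = _LangFinder()
    finder.feed(html_text)
    if not finder.lang:
        # No usable lang attribute: fall back to the user's default (assumption).
        return default
    return finder.lang.split("-")[0].lower()
```

The fallback matters: detection only ever supplies a first guess, and
the user can still override it by hand.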

I think the focus should be on buffers being able to have a (natural)
language, and commands to change that. It would be quite sufficient to have:
 * a setting listing what languages I normally want to use (the first
   one being the default)
 * a cycling command that sets the language to the next in that list
   (that is a toggle when you have a two-item list)
 * a command to explicitly set any valid value

Anything else can be done a lot later, and as experiments outside of
the core. Automatic detection is neat, but not really needed. And
exactly what each language needs to change will be worked out piece
by piece over time in the different language communities. The
important thing is that there is some hook to hang your code on.

* Why it is so important, now with the new search folding *

For Scandinavians it is really important, because (with Swedish as
example) åäö are really totally their own letters in the Swedish
alphabet, regardless of their historic origin. To have a search for
"varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
It would give a strong impression of this being an American program
not meant to be used for Swedish.
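
A rough sketch of what I mean, in Python rather than Emacs Lisp and
much simpler than the real search folding: fold away diacritics, except
on letters that the buffer's language treats as letters of their own.
The table DISTINCT_LETTERS and the functions fold and matches are
invented for this illustration, and only the Swedish entry is filled in.

```python
import unicodedata

# Letters that are separate letters of the alphabet in a given language.
# Illustrative and incomplete; only Swedish is taken from the text above.
DISTINCT_LETTERS = {
    "sv": set("åäöÅÄÖ"),
}


def fold(text, lang=None):
    """Strip combining marks, except on letters that are distinct in lang."""
    keep = DISTINCT_LETTERS.get(lang, set())
    out = []
    # NFC first, so that "a" + combining diaeresis is seen as one "ä".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append(
                "".join(c for c in decomposed if not unicodedata.combining(c))
            )
    return "".join(out).casefold()


def matches(needle, haystack, lang=None):
    """Substring search after folding both strings the same way."""
    return fold(needle, lang) in fold(haystack, lang)
```

With no language set, matches("varpa", "värpa") is true, which is the
language-independent behaviour. With lang="sv" it is false, while
matches("cafe", "café", "sv") is still true, since é is not a separate
letter in Swedish.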

An analogue would be finding "jamb" when looking for "iamb" in
English, where I and J are totally different letters, even though they
originally (in Latin) were the same. Or you start an isearch for
"valid" and after the first four letters you are inside "dualism". (U
and V also were the same letter originally.) Confusing and irritating,
and something to make people turn off this search folding, which would
be sad, because it's a nice thing to have.
