emacs-devel
Re: Questions about isearch


From: Eli Zaretskii
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 18:22:48 +0200

> From: Richard Stallman <address@hidden>
> Date: Thu, 26 Nov 2015 09:46:09 -0500
> Cc: address@hidden
> 
> It seems that perhaps we need these correspondences to depend
> on the language in use.
> 
> That's true for case conversion as well.  For instance the way
> to upcase 'i' is 'I' in most languages, but in Turkish it's a
> character I can't find a way to enter in Emacs.

(That character is İ, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE.)

IMO, it is more important to have language-independent matching in
Emacs.  Language-specific rules are also needed in some situations,
but they are secondary for Emacs.

> It seems to me that we want to introduce a concept of current language

It's a problematic concept for Emacs, which is a multi-lingual
environment.  For example, what is the "current language" of the
buffer showing this message?  It cannot be US English, since it
includes characters not in that language, and can easily include
Turkish words.  Or consider the etc/HELLO file.

We could probably have a text property which will specify the
language, but we don't have good means to set such a property.  IOW,
where would that information come from?

> which would control these things, and also the language for spell checking,
> and maybe some other things.

Actually, modern spell-checkers can support multiple languages in the
same spell-checking job (in a nutshell, they check dictionaries for
each language they were told to use).

In any case, a spell-checker has a simpler job in this respect: it
checks one word at a time, so all it needs is the language for that
one word.  Conceptually, this is much simpler than what Emacs needs.

> In some cases, the current language is determined by which characters
> appear.  That would work fine for scripts that are used for just one
> language.  It would be hard to do that for Latin scripts, though.
> For latin scripts one might always have to specify it explicitly,
> but it could be specified by a file local variable or other such
> per-file customization mechanism.

We already know which script each character belongs to:

  (aref char-script-table ?a) => latin

But, as you say, this only rarely helps to deduce the language.
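
To illustrate the limitation, here is a minimal sketch (not existing
Emacs code) that collects the scripts used in a string by looking up
each character in char-script-table.  The function name
my-scripts-in-string is hypothetical, made up for this example.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my-scripts-in-string (string)
  "Return the distinct scripts of the characters in STRING.
Scripts are listed in order of first appearance.  Characters for
which `char-script-table' has no entry are skipped."
  (let (scripts)
    (dolist (ch (string-to-list string))
      ;; Same lookup as (aref char-script-table ?a) above.
      (let ((script (aref char-script-table ch)))
        (when (and script (not (memq script scripts)))
          (push script scripts))))
    (nreverse scripts)))
```

An English word and a Turkish word written only with Latin letters
would both yield just the script latin here, so the result says
nothing about which of the two languages the text is in.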

> The language environment, which already exists, is something
> different.  It controls how to recognize character codings, and
> therefore has to be global.  The current language should be per-buffer
> and perhaps should vary between parts of a buffer.  So they can't
> be the same thing.

Indeed.  But defining the current language of a buffer isn't
sufficient, either, for Emacs.

For that reason, we generally provide language-agnostic sorting,
searching, etc.



