Re: Questions about isearch

From: Eli Zaretskii
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 23:02:25 +0200

> Date: Thu, 26 Nov 2015 21:46:49 +0100
> From: Per Starbäck <address@hidden>
> Cc: address@hidden, Eli Zaretskii <address@hidden>, address@hidden
> >  It cannot be US English, since it
> > includes characters not in that language, and can easily include
> > Turkish words.  Or consider the etc/HELLO file.
> I don't understand at all what you are saying here. Yes, of course
> Turkish words (and any character) can be in an English text. That
> doesn't make it false that it is in English. Do you just mean that it
> can be hard to determine the language of a text automatically?

So you will sort Turkish words in an otherwise English text according
to English rules?  And spell-check them using an English dictionary?
I don't think so.

A language attribute is something that should control how certain
linguistic operations are tailored.  You cannot use one language's
rules with words from another language.
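
To make the point concrete, here is a minimal sketch in Python (not
Emacs code) of a single operation, lowercasing, tailored by language.
The helper name lower_tailored and the language codes "en" and "tr"
are hypothetical, chosen only for illustration; the one Turkish rule
shown is that dotless I lowercases to dotless ı and dotted İ to i.

```python
def lower_tailored(word, lang):
    """Lowercase WORD using the casing rules of LANG (hypothetical helper)."""
    if lang == "tr":
        # Turkish tailoring: I -> ı (dotless), İ -> i (dotted).
        # Done before str.lower(), which applies the untailored rules.
        word = word.replace("I", "ı").replace("İ", "i")
    return word.lower()


print(lower_tailored("ISPARTA", "en"))  # isparta
print(lower_tailored("ISPARTA", "tr"))  # ısparta
```

The same input yields two different results, so applying the English
rule to a Turkish word produces a string that no longer matches the
word as a Turkish reader would write it.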

So saying that an email message that is mostly in English, but
includes words and phrases from another language, is in English is not
useful, at least for handling the non-English parts of that message.

And what about etc/HELLO?  What language is it in?  There are more
non-English words there than English words, and no language in
particular can claim the majority of the words, or even enough of
them to count as "many".  How do we treat such buffers?  What rules
of character folding do we apply there?

> > We could probably have a text property which will specify the
> > language, but we don't have good means to set such a property.  IOW,
> > where that information would come from?
> I don't envision a text property, but just a value for the buffer,
> because it is much easier and good enough for most things. Yes, there
> are situations where you might want to differentiate it like that, but
> that goes for other things we have in modes as well. (It would
> sometimes be nice to get Javascript mode for part of an HTML file
> etc.)

Having Javascript in HTML just makes it highlighted wrongly.  That's
aesthetically bad (and there's a todo item to solve that problem), but
that's not fatal.  Trying to treat a word in Japanese according to
Latin rules is much worse.

So I think a per-buffer language attribute is the wrong way to go.  We
need a finer granularity.
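
As a rough illustration of what finer granularity could look like,
here is a sketch in Python (again, not Emacs code and not an existing
API) where language is attached to spans of text rather than to the
whole buffer.  The class TaggedText and its methods are hypothetical,
and the sketch assumes the tagged spans never overlap.  It says
nothing about where the language information would come from, which
remains the open question.

```python
from bisect import bisect_right


class TaggedText:
    """Text with language tags on non-overlapping spans (hypothetical)."""

    def __init__(self, text, default_lang=None):
        self.text = text
        self.default_lang = default_lang  # used where no span applies
        self._starts = []                 # span start offsets, kept sorted
        self._spans = []                  # (start, end, lang), same order

    def tag(self, start, end, lang):
        """Mark text[start:end] as being in LANG; spans must not overlap."""
        i = bisect_right(self._starts, start)
        self._starts.insert(i, start)
        self._spans.insert(i, (start, end, lang))

    def lang_at(self, pos):
        """Return the language at offset POS, or the default if untagged."""
        i = bisect_right(self._starts, pos) - 1
        if i >= 0:
            start, end, lang = self._spans[i]
            if start <= pos < end:
                return lang
        return self.default_lang


msg = TaggedText("Visit Isparta soon", default_lang="en")
msg.tag(6, 13, "tr")       # the word "Isparta"
print(msg.lang_at(0))      # en
print(msg.lang_at(6))      # tr
print(msg.lang_at(13))     # en
```

An operation such as sorting, spell-checking or character folding
would then ask for the language at the position it is working on,
instead of consulting one value for the entire buffer.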
