

Re: Spellcheck against multiple dictionaries?

From: Sergei
Subject: Re: Spellcheck against multiple dictionaries?
Date: Thu, 19 Mar 2009 02:30:35 -0700 (PDT)
User-agent: G2/1.0

----  martin:

>> I've downloaded the speck.el file, but I'm not sure how to use it.

>> I've created a test file containing mixed correct and incorrect
>> words, in Russian and English:

>> Test тест correct очепятка incorect верно

>> Then I ran M-x speck-mode. Emacs said that Speck mode had been
>> activated and was using the ru_RU dictionary, but nothing changed in
>> the test buffer. From your description I expected the incorrect
>> words to be highlighted somehow. Am I missing something?

I do not know about speck-mode, but ispell.el, at least, picks up
only what looks like a word in the currently enabled language; only
such words are re-encoded as the current ispell dictionary requires
and passed to the ispell process.

This means that "Test" is skipped in Russian mode (just like
=%==!!.... and other non-words), and conversely, очепятка and верно
are skipped in a Latin-alphabet context.  This is really convenient.
(Users of Latin-alphabet languages, by contrast, are bound to stumble
over every foreign word.)
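A toy model of the filtering just described, written in Python rather than Emacs Lisp. The regular expressions are assumptions standing in for a dictionary's real character set, and `words_to_check` is a hypothetical helper, not part of ispell.el. It shows which tokens of the example line each mode would hand to the spelling process.

```python
import re

# Hypothetical stand-ins for a dictionary's notion of "what a word looks
# like"; real ispell/aspell dictionaries define their own character sets.
CYRILLIC_WORD = re.compile(r"[А-Яа-яЁё]+")
LATIN_WORD = re.compile(r"[A-Za-z]+")

def words_to_check(text, word_re):
    """Return only the tokens that look like words in the current language.

    Everything else (other alphabets, punctuation, digits) is skipped and
    never reaches the spelling process.
    """
    return word_re.findall(text)

line = "Test тест correct =%==!! очепятка incorect верно"
print(words_to_check(line, CYRILLIC_WORD))  # ['тест', 'очепятка', 'верно']
print(words_to_check(line, LATIN_WORD))     # ['Test', 'correct', 'incorect']
```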

> I don't have a Russian spell-checking engine installed so I can't
> comment on your example directly.  Suppose I have a file with the line

> Test Test correct Duckfehler incorect richtig

> Doing M-x speck-mode here starts an Aspell process checking with my
> default language which is English, flagging the last three words as
> incorrect.  I can now set the region around the word "Duckfehler"
> and type C-2 C-? to set the speck language text property of that
> word to German, which will still flag the word as incorrect but now
> with the appropriate German suggestions how to correct it.
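A rough model of a per-region language setting: every word gets the default language unless it lies entirely inside a region marked with another one. This is a hypothetical Python sketch (the names `word_languages` and `overrides` are invented here), not speck's implementation, which keeps the language in an Emacs text property.

```python
import re

WORD_RE = re.compile(r"\w+")

def word_languages(text, default, overrides):
    """Return (word, language) pairs for every word in text.

    overrides is a list of (start, end, lang) character ranges, end
    exclusive; a word takes the language of the first range containing it.
    """
    result = []
    for match in WORD_RE.finditer(text):
        lang = default
        for start, end, region_lang in overrides:
            if start <= match.start() and match.end() <= end:
                lang = region_lang
                break
        result.append((match.group(), lang))
    return result

line = "Test Test correct Duckfehler incorect richtig"
start = line.index("Duckfehler")
# Mark just "Duckfehler" as German, like setting the region and typing C-2 C-?.
pairs = word_languages(line, "en", [(start, start + len("Duckfehler"), "de")])
print(pairs)
```

Only the marked word switches to German; the rest, including "richtig", stay with the English default.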

Some formal text formats, such as HTML or XML, allow language
markup, which states the language of each stretch of text explicitly.
Something like
| correct <i lang="de">Duckfehler</i> incorect <i lang="de">richtig</i>
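With such markup, a checker could read the language off the markup instead of guessing it. A minimal sketch using Python's standard `html.parser`; the class name `LangSplitter` and the default language `"en"` are assumptions, and the parser expects properly nested elements.

```python
from html.parser import HTMLParser

class LangSplitter(HTMLParser):
    """Collect (language, text) chunks from markup carrying lang attributes.

    Assumes well-formed, properly nested elements (no unclosed <br> etc.).
    """

    def __init__(self, default_lang):
        super().__init__()
        self.stack = [default_lang]  # language of the innermost open element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # An element inherits the enclosing language unless it sets its own.
        lang = dict(attrs).get("lang") or self.stack[-1]
        self.stack.append(lang)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.chunks.append((self.stack[-1], data.strip()))

parser = LangSplitter("en")
parser.feed('correct <i lang="de">Duckfehler</i> incorect <i lang="de">richtig</i>')
parser.close()
print(parser.chunks)
# [('en', 'correct'), ('de', 'Duckfehler'), ('en', 'incorect'), ('de', 'richtig')]
```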

>> I think that the ispell-ish behavior would indeed be nice. I've
>> looked through the ispell code, and it looks like Emacs raises some
>> kind of exception if the ispell process returns "invalid"
>> status. Do you think it is possible to fall back to another
>> dictionary on such an event?

> With my Aspell engine I can write (and bind) a trivial command like

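> (defun ispell-check-word (arg)
>    "Check the word at point in German with prefix C-2, else in English."
>    (interactive "p")                       ; ARG = numeric prefix, 1 if none
>    (if (= arg 2)
>        (ispell-change-dictionary "de_DE")  ; C-2: German dictionary
>      (ispell-change-dictionary "en_US"))   ; otherwise: US English
>    (ispell-word))                          ; check the word at point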

> and probably get what you want.  Note, however, that each time you
> change the language with this command, Emacs kills the old Aspell
> process and spawns a new one.

Yes, because everything has to be changed: the character-filtering
rules, the affix grammar, and the word list.

> Changing `ispell-word' as you say seems hardly possible because in
> general there's no way to distinguish a word written incorrectly in
> language A from a word written correctly in language B.  For the
> special English/Russian case you could probably investigate the
> character properties at `point' and start the appropriate
> word-checking process.
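A minimal sketch of that idea: look at the letter at (or just before) point, determine its script, and map the script to a dictionary. Python's standard library has no direct script lookup, so this uses the first word of the Unicode character name as a rough proxy; the dictionary names in the hypothetical `DICTIONARY_FOR_SCRIPT` table are assumptions. Inside Emacs, `char-script-table` should provide the script directly.

```python
import unicodedata

# Hypothetical mapping from script to dictionary name.
DICTIONARY_FOR_SCRIPT = {"LATIN": "en_US", "CYRILLIC": "ru_RU"}

def script_of(ch):
    """Rough script of a letter: first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def dictionary_at_point(text, point):
    """Pick a dictionary from the letter at point, or just before it.

    Checking point - 1 as well handles point sitting right after a word.
    Returns None if neither position holds a letter of a known script.
    """
    for i in (point, point - 1):
        if 0 <= i < len(text) and text[i].isalpha():
            return DICTIONARY_FOR_SCRIPT.get(script_of(text[i]))
    return None

line = "Test тест correct очепятка"
print(dictionary_at_point(line, 0))                   # en_US
print(dictionary_at_point(line, line.index("тест")))  # ru_RU
```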

In principle one could create a combined grammar for Russian and
English; in fact it would be a "direct sum" of the two grammars,
since the alphabets, and hence the word spaces, are completely
disjoint.  TeX has such a combined processor for English-Russian
hyphenation.  A combined grammar would also be more efficient,
because there would be no need to spawn a new process at every
switch between Russian and English.
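The "direct sum" can be sketched with toy word lists: one checker accepts both languages, and because the alphabets are disjoint, a word's alphabet alone tells which word list to consult, so nothing is ever switched or restarted. The word sets and `combined_check` are hypothetical; a real combined grammar would also have to merge the affix rules.

```python
import re

# Toy word lists standing in for real dictionaries (hypothetical data).
ENGLISH = {"test", "correct"}
RUSSIAN = {"тест", "верно"}

LATIN = re.compile(r"[A-Za-z]+\Z")
CYRILLIC = re.compile(r"[А-Яа-яЁё]+\Z")

def combined_check(word):
    """One checker for both languages; the alphabet routes each word."""
    w = word.lower()
    if LATIN.match(w):
        return w in ENGLISH
    if CYRILLIC.match(w):
        return w in RUSSIAN
    return True  # not a word in either alphabet: skipped, never flagged

words = "Test тест correct очепятка incorect верно".split()
print([w for w in words if not combined_check(w)])  # ['очепятка', 'incorect']
```

For any Latin or Cyrillic word, `w in ENGLISH | RUSSIAN` would give the same answer, precisely because the two word spaces cannot overlap.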

For now, though, a two-pass approach would be easier:

1. check the Russian spelling (ignoring all Latin characters);
2. check the English spelling (ignoring all Cyrillic characters).
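The two passes can be sketched as follows, again with hypothetical toy word lists in place of real ispell/aspell processes: each pass extracts only the words written in its own alphabet and reports those missing from its list. `DICTIONARIES` and `two_pass_check` are names invented for this sketch.

```python
import re

# Hypothetical per-language word lists and word patterns.
DICTIONARIES = {
    "ru": ({"тест", "верно"}, re.compile(r"[А-Яа-яЁё]+")),
    "en": ({"test", "correct"}, re.compile(r"[A-Za-z]+")),
}

def two_pass_check(text, passes=("ru", "en")):
    """Run one pass per language; each pass sees only its own alphabet."""
    errors = {}
    for lang in passes:
        words, word_re = DICTIONARIES[lang]
        errors[lang] = [w for w in word_re.findall(text) if w.lower() not in words]
    return errors

print(two_pass_check("Test тест correct очепятка incorect верно"))
# {'ru': ['очепятка'], 'en': ['incorect']}
```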

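Together, the two passes are faster than checking in a switching
mode, since no process has to be restarted, and no extra work is
required.  Besides, you could spellcheck Russian+French or
Russian+German the same way (but not Russian+French+English, of
course, since French and English share the Latin alphabet;
Russian+German+Armenian, with three different alphabets, is still
possible).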

