emacs-devel

Re: Changing dictionary while flyspell-buffer is running


From: Titus von der Malsburg
Subject: Re: Changing dictionary while flyspell-buffer is running
Date: Fri, 22 Feb 2019 10:57:29 +0100
User-agent: mu4e 1.1.0; emacs 26.1.91

On 2019-02-22 Fri 08:10, Eli Zaretskii wrote:
>> From: Titus von der Malsburg <address@hidden>
>> Cc: Joost Kremers <address@hidden>, address@hidden, address@hidden
>> Date: Thu, 21 Feb 2019 22:19:53 +0100
>>
>> > It needs at least 30 letters to guess right, which is quite a few.
>>
>> The number of letters depends on the configured languages.  It could be
>> less than 30 when the scripts are different, but for English, Dutch,
>> and German 30 works well in my experience, and languages don’t get much
>> more similar than that (except if you want to distinguish between US
>> English and UK English).
>
> The minimum number also depends on the expected reliability of
> language detection, of course.

Of course.  I should say that I didn’t come up with the algorithm.  It’s
a standard approach to language detection used in many contexts.  Its
selling points are high accuracy, low computational complexity, and that
only a small amount of language data is required.  For most languages,
we need only 1.2Kb of data.
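
As a rough illustration of how little data such a detector can need, here
is a minimal sketch in Python.  It assumes the approach is a ranked
character-trigram profile per language compared with an "out-of-place"
distance; the message does not name the algorithm, so that choice, the
function names, the top_n value, and the toy training sentences are all
assumptions made for this example, not the package's actual code.

```python
from collections import Counter


def trigram_profile(text, top_n=300):
    """Return the top_n most frequent character trigrams, most frequent first."""
    # Lowercase, collapse whitespace, and pad so word boundaries form trigrams.
    cleaned = " " + " ".join(text.lower().split()) + " "
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_n]]


def out_of_place(sample, reference):
    """Sum of rank differences; trigrams absent from reference cost the maximum."""
    rank = {gram: position for position, gram in enumerate(reference)}
    penalty = len(reference)
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(sample))


def guess_language(text, profiles):
    """Return the key of the profile closest to the text's own profile."""
    sample = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(sample, profiles[lang]))


# Toy training texts (made up for this example; real profiles need far more text).
ENGLISH = "the quick brown fox jumps over the lazy dog and then runs through the woods"
GERMAN = "der schnelle braune fuchs springt über den faulen hund und rennt durch den wald"

PROFILES = {"en": trigram_profile(ENGLISH), "de": trigram_profile(GERMAN)}

print(guess_language("the dog runs over the fox", PROFILES))
```

Only the ranked trigram list has to be stored per language, which is why
the data footprint of this kind of detector stays small.  Short inputs
yield few trigrams, so the guess becomes unreliable below some minimum
number of letters.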

[More below.]

>> I just tried it and noticed one downside: Flyspell offers possible
>> corrections for unknown words and when multiple languages are
>> configured, these suggestions come from all configured dictionaries.
>
> Of course, but what would you expect?

I would expect to get only suggestions from the language that I’m
currently typing in.

> And how is that a downside?

If I have to pick the correct word from a list that contains many
irrelevant words, it will take more time.  The suggestions are just less
relevant on average.

> Hunspell doesn't try to guess the language at all, it just looks in
> all loaded dictionaries one by one.

That’s the problem. :)

>> Many of them are of course not relevant because they are not in the
>> language of the paragraph.
>
> There's no "language of the paragraph" in this method, you can freely
> mix words from different languages in the same paragraph.  There are
> important use cases for that, like editing a message translation
> catalog or text that explains in-line the meaning of words in
> another language.

If the use case is working with paragraphs that mix languages, the user
is free to use Hunspell.  However, there is the other use case, the one
that I’m interested in, where the document contains whole paragraphs
each in its own language.  Plus the use case where the document is in
just one language and I don’t want the spell-checker to suggest words in
some other language.

Note that automatic language detection has other applications beyond
changing dictionaries for spell checkers.  For instance, it can
automatically switch the voice used by the Festival speech synthesizer,
which is useful for blind people working with text in multiple
languages.  It can also switch the typographical conventions used by
type-mode (e.g., use quote symbols that are appropriate for the current
language).  It could also switch the language of dictionaries, thesauri,
and text completion packages such as company-ngrams.

>> Flyspell also has an autocorrection feature (which I’m not using)
>> and this feature would also largely stop being useful with multiple
>> dictionaries.
>
> It will only become less useful if the first correction is off in a
> significant number of cases.  Which is not at all expected, certainly
> not when each language uses a different script.
>
>> I think that this makes the Hunspell solution less appealing.
>
> I think you are slightly biased ;-).  As am I, most probably.  Both
> solutions have their advantages and disadvantages, and the user should
> choose which one better suits his/her needs in each case.

Exactly, and that’s why I never said that people should be prevented
from using Hunspell with multiple dictionaries if that’s the best
solution for them.  :)

> I mentioned Hunspell because I think few people even know about this
> feature, which is quite unique among spellers supported by Emacs.

That is true.  I certainly didn’t know about that feature.  Hunspell is
fairly impressive, especially for languages like German that can freely
compose new words.  Following this conversation, I might actually switch.

In sum, I don’t want to push my package on anyone.  I said I would be
happy to contribute it to Emacs/ELPA /if/ there is interest.  But I’m
perfectly happy with keeping it in MELPA where it currently lives.

Regarding my initial question: I had a closer look at how
flyspell-buffer works internally and I’m afraid there is no easy way to
make it switch languages half-way through the document.  The hook for
incorrect words is called only when the spell-checker has already
finished its work.  It will be necessary to write a new function that
processes the document paragraph by paragraph.
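
The paragraph-by-paragraph idea can be sketched outside Emacs as
follows.  This is a language-agnostic illustration in Python, not
flyspell code: check_by_paragraph, toy_detect, and the tiny word sets
are hypothetical stand-ins, and the 30-letter minimum is taken from the
figure quoted earlier in this thread.  Paragraphs too short for a
reliable guess keep the language of the preceding paragraph.

```python
import re


def check_by_paragraph(text, detect, dictionaries, default_lang, min_letters=30):
    """Return (paragraph_index, language, unknown_words) for each paragraph."""
    results = []
    lang = default_lang
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        letters = sum(ch.isalpha() for ch in paragraph)
        # Re-detect only when there is enough text; otherwise keep the last language.
        if letters >= min_letters:
            lang = detect(paragraph)
        words = re.findall(r"[^\W\d_]+", paragraph.lower())
        known = dictionaries[lang]
        results.append((index, lang, [w for w in words if w not in known]))
    return results


def toy_detect(paragraph):
    """Hypothetical detector: counts a few function words (illustration only)."""
    words = set(paragraph.lower().split())
    german_hits = len(words & {"der", "die", "das", "und", "ist"})
    english_hits = len(words & {"the", "and", "is", "of"})
    return "de" if german_hits > english_hits else "en"


# Toy dictionaries (made up for this example).
DICTIONARIES = {
    "en": {"the", "quick", "fox", "and", "lazy", "dog", "is", "a", "tale",
           "of", "friendship"},
    "de": {"der", "hund", "und", "die", "katze", "sind", "ein", "paar", "im",
           "grossen", "haus"},
}

TEXT = (
    "The quick fox and the lazy dog is a tale of frendship.\n\n"
    "Der Hund und die Katze sind ein Paar im grossen Hauss.\n\n"
    "Und Haus."
)

for row in check_by_paragraph(TEXT, toy_detect, DICTIONARIES, "en"):
    print(row)
```

Because the language is chosen before each paragraph is checked, the
unknown words (and any suggestions derived from them) come from a
single dictionary per paragraph.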

Thanks for all the suggestions.

  Titus




--
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io


