
Re: Bug 130397

From: Geoff Kuenning
Subject: Re: Bug 130397
Date: 08 Jan 2005 13:31:11 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Ken writes:

> Geoff has a much better understanding of the underlying spell search
> engine.  Perhaps he can shed additional light on this topic.

I just looked at the code to be sure my memory is correct.  Here's the
short rundown: in the '-a' interface, ispell communicates with the
outside world purely in a byte-indexed mode.  It is perfectly capable
of handling UTF-8 and similar multi-byte encodings, but when it
reports the offsets of misspelled words, it reports them as byte
offsets, not character offsets.
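
To illustrate the difference, here is a minimal Python sketch (not
ispell or ispell.el code; the function name and the sample line are
assumptions made up for this example) of how a caller holding the
original line could map a reported byte offset back to a character
offset:

```python
def byte_to_char_offset(line: str, byte_offset: int,
                        encoding: str = "utf-8") -> int:
    """Map a byte offset into the encoded form of `line` to a character offset.

    Hypothetical helper for illustration only. It assumes the offset falls
    on a character boundary; otherwise decoding raises UnicodeDecodeError.
    """
    raw = line.encode(encoding)
    # Decode only the bytes before the offset; the number of characters
    # obtained is the character offset.
    return len(raw[:byte_offset].decode(encoding))


# "ï" takes two bytes in UTF-8, so "wrod" starts at byte 7 but character 6.
line = "naïve wrod"
char_offset = byte_to_char_offset(line, 7)
print(char_offset, line[char_offset:])
```

For pure ASCII text the two offsets coincide, which is why the
mismatch only shows up once multi-byte characters precede the
misspelled word on the line.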

Does Emacs provide an underlying byte-indexed interface to the buffer?
If so, life should be easy: just have ispell.el use that interface.
If not, I think life is going to be very, very difficult.  It's
possible that I could modify ispell to provide a display-width index
rather than a byte index, but it's not trivial and there may be
pitfalls.  There's also the problem that--even if I get off my butt
and produce a new release reasonably soon--there are lots of old
copies of ispell out there that wouldn't support the new interface.

Juri writes:

> And while on this topic, I want to remind that many Emacs users suffer
> from the inability of ispell.el to simultaneously check mixed multi-language
> texts.  So, whoever fixes ispell.el, please take that into account.
> Such combining is quite easily doable for any disjoint alphabets, as well
> as for alphabets where one alphabet is a superset of another, like e.g.
> English and some other Latin-based alphabets.  Even for overlapping
> alphabets it would be possible with using the `w' syntax to get a word
> and to feed it to different ispell instances for each dictionary.

I'm not entirely sure what you mean here.  For disjoint alphabets,
it's certainly relatively easy to figure out which word should go to
which ispell instance.  For identical, superset, or overlapping
alphabets, the problem is basically insoluble.  For example, "fra" is
a misspelling in English but legal in Italian.  If it appears in a
mixed passage, which dictionary should it be fed to?  The only
solution would seem to be to require the user to mark passages in some
way, as is done in HTML.
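
The distinction can be sketched in Python.  This is only an
illustration under stated assumptions, not how ispell or ispell.el
works: the function names and the dictionary table are hypothetical,
and the script of a word is guessed from the first word of each
letter's Unicode character name.

```python
import unicodedata


def script_of(word):
    """Return the single script used by the letters of `word`, or None.

    Heuristic for illustration: the script is taken to be the first word
    of the Unicode character name, e.g. "LATIN" or "CYRILLIC".
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts.pop() if len(scripts) == 1 else None


def candidate_dictionaries(word, dictionaries):
    """List the dictionaries that could plausibly check `word`.

    `dictionaries` maps a script name to dictionary names (hypothetical).
    """
    return dictionaries.get(script_of(word), [])


# Hypothetical setup: two dictionaries share the Latin alphabet.
dictionaries = {"LATIN": ["english", "italian"], "CYRILLIC": ["russian"]}

for word in ["fra", "привет"]:
    print(word, candidate_dictionaries(word, dictionaries))
```

With disjoint alphabets the lookup yields exactly one candidate, so
routing is mechanical.  For "fra" it yields two, and nothing in the
word itself says which one is right; that choice has to come from
outside, such as explicit markup of the passage.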

    Geoff Kuenning   address@hidden   http://www.cs.hmc.edu/~geoff/

One could not be a successful scientist without realizing that, in contrast to
the popular conception supported by newspapers and mothers of scientists, a
goodly number of scientists are not only narrow-minded and dull, but also just
stupid. -- James Watson
