bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file

From: Agustin Martin
Subject: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 7 Jan 2011 14:14:03 +0100

2011/1/4 Reuben Thomas <address@hidden>:
> With the following text, and using emacs -Q, I get the errors you can
> see in the messages log below when using hunspell to spell-check a UTF-8
> buffer with some extended characters in it.
> I did test this with emacs -Q, but the current session, in which I
> reproduced the problem and am now composing this bug report, was not
> started with -Q (this is so submitting the bug report works properly!).
> I am running a freshly bzr-pulled build of the emacs-23 branch.

Hi, Reuben,

I can also reproduce this with emacs23.2. After splitting the original
lines, I could narrow the problems down to two lines,

-- Cut here -- 8< ----- minimal.txt: utf-8
of out-of-copyright works. The Kindle may be a loss leader, but at £109
it’s still not cheap. Feedbooks, rather than integrating easily into
-- Cut here -- 8< ----- End of minimal.txt

In the first line, the currency symbol seems to cause conversion errors
when iso-8859-1 is used, although hunspell should have ignored it. I get
tons of

UTF-8 encoding error. Missing continuation byte in 0. character position:

for that line when using

$ cat minimal.txt | hunspell -d en_US -a -i iso-8859-1
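For reference, the byte-level picture behind that first line can be
sketched in Python. This only illustrates how the pound sign differs
between the two encodings. It is an assumption, not something verified
here, that hunspell's message comes from a similar mismatch inside its
own conversion code; the helper name is_valid_utf8 is made up for this
illustration.

```python
# Illustration only: how the pound sign looks under UTF-8 and ISO-8859-1.
pound = "\u00a3"  # the currency symbol from the first line of minimal.txt

utf8_bytes = pound.encode("utf-8")         # two bytes: 0xC2 0xA3
latin1_bytes = pound.encode("iso-8859-1")  # one byte: 0xA3

# Reading the UTF-8 bytes as ISO-8859-1 yields two characters instead of one.
misread = utf8_bytes.decode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex(), latin1_bytes.hex(), repr(misread))
# The single ISO-8859-1 byte is not a valid UTF-8 sequence on its own.
print(is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))
```

So a tool that mixes up the two encodings at any stage can end up
handing bytes to a UTF-8 decoder that the decoder has to reject.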

In the second line, the unusual apostrophe seems to confuse hunspell
when utf8 is used. Comparing what aspell and hunspell give for the
same text, I get

$ cat minimal.txt | aspell --encoding=utf-8 -d en_US -a
& Feedbooks 6 22: Feed books, Feed-books, Feedback's, Feedbags, ...

$ cat minimal.txt | hunspell -d en_US -i utf-8 -a
& Feedbooks 8 24: Feed books, Feed-books, Feedback, Feedbags, ...

Do not worry about the first number; it is the number of suggestions.
However, the second number, the position of the word in the line,
differs. It seems that hunspell is not counting that apostrophe as a
single (multibyte) char, but as its three UTF-8 bytes, which would
account for the difference of two.
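The two positions can be reproduced with a short Python sketch that
measures where "Feedbooks" starts in the second line of minimal.txt,
once in characters and once in UTF-8 bytes. It assumes the apostrophe
in the file is U+2019 (RIGHT SINGLE QUOTATION MARK), and it only shows
that the numbers fit a character-versus-byte explanation; it says
nothing about how either spell checker counts internally.

```python
# Second line of minimal.txt, with the apostrophe written as U+2019 (assumed).
line = "it\u2019s still not cheap. Feedbooks, rather than integrating easily into"
word = "Feedbooks"

# Offset counted in characters (code points).
char_offset = line.index(word)

# Offset counted in bytes of the UTF-8 encoding.
byte_offset = line.encode("utf-8").index(word.encode("utf-8"))

# Number of bytes the apostrophe occupies in UTF-8.
apostrophe_bytes = len("\u2019".encode("utf-8"))

print(char_offset, byte_offset, apostrophe_bytes)
```

The character offset matches the 22 reported by aspell and the byte
offset matches the 24 reported by hunspell.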

This looks to me like a hunspell bug. I found no reference to this
problem on the hunspell SourceForge site, but noticed that Hunspell
1.2.14 was released yesterday. I need to check whether it has any
related news.

