bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file

From: Reuben Thomas
Subject: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Mon, 03 Jan 2011 23:14:41 +0000

Using emacs -Q and hunspell to spell-check a UTF-8 buffer containing the
text below, which includes some non-ASCII characters, I get the error
shown in the messages log further down.

I tested this with emacs -Q. However, the current session, in which I
reproduced the problem and am now composing this bug report, was not
started with -Q (this is so submitting the bug report works properly!).

I am running a freshly bzr-pulled build of the emacs-23 branch.
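
One possible explanation, which is an assumption on my part and not
something confirmed in this report, is that the offset reported for
`Feedbooks' counts UTF-8 bytes while Emacs counts characters. The text
below has several curly quotes before the first `Feedbooks', and each of
those is one character but three bytes. The following Python sketch only
illustrates how the two counts drift apart; the helper name and the
sample string are made up for illustration, and it says nothing about
what hunspell or ispell.el actually do.

```python
def char_and_byte_offsets(text: str, word: str) -> tuple[int, int]:
    """Return (character offset, UTF-8 byte offset) of the first
    occurrence of `word` in `text`. Hypothetical helper for illustration."""
    char_offset = text.index(word)
    # Encode only the prefix to count how many bytes precede the word.
    byte_offset = len(text[:char_offset].encode("utf-8"))
    return char_offset, byte_offset


# Illustrative sample, not the exact buffer contents: two curly quotes
# (U+201C and U+201D, three bytes each in UTF-8) precede the word.
sample = "the stick being “I will start giving away your books” and then Feedbooks"

chars, bytes_ = char_and_byte_offsets(sample, "Feedbooks")
print(chars, bytes_, bytes_ - chars)
```

If the two numbers differ by exactly two for every curly quote before the
reported word, that would be consistent with a byte/character mismatch,
though it would not prove it.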

Text follows

----cut here----
title: Kindle 3 is a good first attempt
tags: computing, books
format: markdown
date: Mon, 03 Jan 2011 20:53:13 +0000
post-id: 2585181001

Giving my girlfriend a Kindle for Christmas was the carrot in a multi-pronged 
strategy to avoid needing more bookshelves (the stick being “I will start 
giving away your books” and my contribution being to archive books I’ve read 
(or return the many that aren’t even mine). This therefore required that I 
stocked it with books before she got her hands on it, which in turn was all the 
excuse I needed to play with the thing.

My lazy solution was simply to download all of 
[Feedbooks](http://www.feedbooks.com); I [wrote some 
scripts](http://rrt.sc3d.org/Software/Kindle/) to make this actually lazy, 
rather than brain-numbingly dull. In the process I found that while the Kindle 
is nice to hold and great to read, it struggles to cope with a large collection 
of books (even though the nearly 3,000 volumes of Feedbooks only half-filled 
its 4Gb memory), and is woeful as a research tool. And, of course, Amazon’s 
first-mover-evil surfaced early.

Here are the problems I had:

1. Amazon’s own store doesn’t seem to contain free books. I think it’s poor 
form not to give people a straightforward choice of free editions of 
out-of-copyright works. The Kindle may be a loss leader, but at £109 it’s still 
not cheap. Feedbooks, rather than integrating easily into the Kindle, like, 
say, a 3rd-party software provider into Ubuntu’s Software Center, provide a 
catalogue which itself is in the form of a book, doesn’t automatically update, 
and offers a list ordered only by title. In other words, it’s useless; one is 
better off using the built-in web browser to search the online catalogue…

2. …or better, another browser, since the Kindle’s is woefully slow (and I 
don’t just mean the screen update). It’s just about usable, and hence useful in 
an emergency, but is no good as, for example, an online research tool to use in 
parallel with the books you have downloaded, although…

3. …offline search is awful too. With just the few ebooks that come loaded on 
the device, it was slow; with the thousands of books I loaded, it simply locked 
up the device, even when trying to search in the manual, presumably already 
indexed. The Kindle seems to index its contents in the background, but even 
now, over a week later, search doesn’t work. The only effective navigation is 
by a book’s table of contents, and, to choose which books to read, the 
user-definable collections, though…

4. …collections are a pain to set up for many books, as you have to select each 
book manually; there is no way I have found to select a range. (Fortunately, I 
was able to define collections programmatically, but this will be beyond most 

In summary, it’s a lovely device, but the software is rather toytown. Amazon 
could improve it (and indeed, the 3.0.3 firmware update, at the experimental 
stage when I checked, claims, vaguely, “performance improvements”), but given 
that their main interest is in selling books and Kindles, I’m not hopeful that 
it will happen before the next hardware iteration; whether it happens at all 
depends on competition, and there should be plenty of that, to go by the number 
of other ebook readers.

----cut here----

In GNU Emacs (i686-pc-linux-gnu, GTK+ Version 2.22.0)
 of 2011-01-03 on mord
Windowing system distributor `The X.Org Foundation', version 11.0.10900000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_GB.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Text

Minor modes in effect:
  longlines-mode: t
  buffer-face-mode: t
  flyspell-mode: t
  show-paren-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  iswitchb-mode: t
  icomplete-mode: t
  global-auto-revert-mode: t
  desktop-save-mode: t
  smart-quotes-mode: t
  mouse-wheel-mode: t
  use-hard-newlines: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r t - e m <tab> <return> h u n s p e l 
l SPC <M-backspace> i s p e l l SPC w i t h SPC h u 
n s l e <backspace> <backspace> s p e <backspace> <backspace> 
p e <backspace> <backspace> <backspace> p e l l SPC 
f a i l s C-g <down> <down> <down> <down> <down> <down> 
<down> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> M-x i s p e l l 
<return> SPC SPC SPC M-x i s p e <backspace> <backspace> 
<backspace> <backspace> <up> <up> <return>

Recent messages:
Scanning for "hard" Perl constructions... done
Applying style hooks... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Lazy desktop load complete
Spell-checking Kindle 3 is a good first attempt using hunspell with 
british+accs dictionary...
Spell-checking region using hunspell with british+accs dictionary...done
ispell-process-line: Ispell misalignment: word `Feedbooks' point 1363; probably 
incompatible versions

Load-path shadows:
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-style hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-buf hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/bib-cite hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-fold hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-jp hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context-nl hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/toolbar-x hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-mik hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context-en hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/texmathp hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-info hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-fptex hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-font hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/latex hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/font-latex hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-bar hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/multi-prompt hides 
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex hides 

(shadow sort mail-extr message sendmail ecomplete rfc822 mml mml-sec
password-cache mm-decode mm-bodies mm-encode mailcap mail-parse rfc2231
rfc2047 rfc2045 qp ietf-drums mailabbrev nnheader gnus-util netrc
time-date mm-util mail-prsvr gmm-utils wid-edit mailheader canlock sha1
hex-util hashcash mail-utils emacsbug preview prv-emacs byte-opt
warnings tex-buf noutline outline font-latex bytecomp byte-compile latex
tex-style tex nxml-uchnm rng-xsd xsd-regexp rng-cmpct rng-nxml rng-valid
rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn
nxml-ns nxml-mode nxml-outln nxml-rap nxml-util nxml-glyph nxml-enc
xmltok sgml-mode conf-mode newcomment make-mode vc-git cperl-mode
longlines face-remap filladapt flyspell auto-dictionary-autoloads
dictionary-autoloads js2-mode-autoloads package reporter completing-help
ff-paths uniquify paren savehist minibuf-eldef iswitchb icomplete
autorevert time cus-start cus-load desktop server change-mode advice
help-fns advice-preload php-mode derived etags cc-langs cl cl-19 cc-mode
cc-fonts cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs
speedbar sb-image ezimage dframe easymenu assoc lua-mode regexp-opt
comint ring whitespace etags-update smart-quotes edmacro kmacro ispell
ffap muse-autoloads emacs-goodies-el emacs-goodies-custom
emacs-goodies-loaddefs easy-mmode devhelp preview-latex tex-site
auto-loads tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win
x-dnd font-setting tool-bar dnd fontset image fringe lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mldrag mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev loaddefs button
minibuffer faces cus-face files text-properties overlay md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process dbusbind system-font-setting
font-render-setting gtk x-toolkit x multi-tty emacs)


