Re: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...

From: Kevin Atkinson
Subject: Re: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...
Date: Sat, 3 Mar 2007 06:30:59 -0700 (MST)

On Sat, 3 Mar 2007, Martin Swift wrote:

On Sat, Mar 03, 2007 at 04:29:15AM -0700, Kevin Atkinson wrote:
The word list is likely in iso-8859-1 but Aspell expects it in utf-8.
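
As an illustration of why that mismatch produces an "Invalid UTF-8 sequence at position..." style of error, here is a minimal Python sketch. It is not Aspell's code; the sample word is a hypothetical stand-in for an entry in the word list.

```python
# Hypothetical sample word (G, r, u-umlaut, sharp s, e), written with escapes
# so the example does not depend on this file's own encoding.
word = "Gr\u00fc\u00dfe"

# In ISO-8859-1 every character is a single byte.
latin1_bytes = word.encode("iso-8859-1")  # b'Gr\xfc\xdfe'

try:
    # Reading those bytes as UTF-8 fails: 0xFC can never start a UTF-8 sequence.
    latin1_bytes.decode("utf-8")
    error_position = None
except UnicodeDecodeError as exc:
    # Byte offset of the first invalid sequence.
    error_position = exc.start

print(error_position)  # 2
```

Plain ASCII words decode identically under both encodings, so only entries with accented or special letters trigger the failure.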

Does this mean that aspell expects the word lists to have the same
charset as the machine? Isn't that a little odd?

I don't understand the question.

de.dat sets 'charset' as iso-8859-1:

 # cat de.dat
 # Generated with Aspell Dicts "proc" script version 0.50.1
 name de
 charset iso-8859-1
 soundslike de
 affix      de

Does aspell not use this to determine the charset? If not, /shouldn't/

Yes it should.

I just tried

 /usr/bin/prezip-bin -d < de-common.cwl | \
   /usr/bin/aspell --lang=de create --encoding=iso8859-1 master ./de-common.rws

Something is wrong. The "--encoding=iso8859-1" should not be necessary. It should be using the value from "charset" in "de.dat". Try setting your locale to "C" and see if it makes a difference.

A couple of questions:

 Is this going to conflict with my machine's character encoding, or
has aspell created an rws file for a utf-8 system?

No.  Aspell will convert between encodings as necessary.
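
To make "convert between encodings" concrete, here is a rough Python sketch of the general idea. It only shows generic transcoding of the same text between two byte encodings; it makes no claim about how Aspell implements this internally, and the sample word is hypothetical.

```python
# Hypothetical dictionary entry stored as ISO-8859-1 bytes.
latin1_bytes = "Gr\u00fc\u00dfe".encode("iso-8859-1")

# Decode with the encoding the data is actually in, then re-encode
# in the encoding the consumer (for example a UTF-8 terminal) expects.
text = latin1_bytes.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# The text is the same; only the byte representation differs.
print(latin1_bytes)  # b'Gr\xfc\xdfe'
print(utf8_bytes)    # b'Gr\xc3\xbc\xc3\x9fe'
```

The conversion is lossless here because every ISO-8859-1 character has a Unicode code point, so the stored encoding and the machine's encoding do not need to match.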

 Is the machine character encoding check a feature? It really seems
that, since one might attempt to install the same word list on machines
with different character encodings, this is prone to failure.


