Re: Ispell and unibyte characters

From: Agustin Martin
Subject: Re: Ispell and unibyte characters
Date: Fri, 13 Apr 2012 18:38:23 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Apr 13, 2012 at 06:53:57PM +0300, Eli Zaretskii wrote:
> > Date: Fri, 13 Apr 2012 17:25:25 +0200
> > From: Agustin Martin <address@hidden>
> > 
> > > I don't understand what you are trying to accomplish by encoding
> > > OTHERCHARS in UTF-8.  What exactly is the problem with them being
> > > encoded in some 8-bit encoding?  Please explain.
> > 
> > Imagine a fake entry in the general list, either in ispell.el or provided
> > through `ispell-base-dicts-override-alist' (no accented chars for
> > simplicity)
> > 
> > ("catala8"
> >      "[A-Za-z]" "[^A-Za-z]" "['\267-]" nil ("-B" "-d" "catalan") nil
> >      iso-8859-1)
> > 
> > Unless Emacs knows the encoding for \267 (middle dot "·") it cannot decode it
> > properly. I prefer not to use UTF-8 here, because I want the entry to also
> > be useful for ispell (and also be XEmacs compatible). The best approach here
> > seems to be to decode the otherchars regexp according to the provided
> > coding-system.
> > 
> > I have noticed that there seems to be no need to encode the resulting string
> > in UTF-8; Emacs will know what to do with the decoded string.
> > 
> > I tested something like
> > 
> >  (dolist (adict ispell-dictionary-alist)
> >    (add-to-list 'tmp-dicts-alist
> >                 (list
> >                  (nth 0 adict)   ; dict name
> >                  "[[:alpha:]]"   ; casechars
> >                  "[^[:alpha:]]"  ; not-casechars
> >                  (if ispell-encoding8-command
> >                      ;; Decode 8bit otherchars if needed
> >                      (decode-coding-string (nth 3 adict) (nth 7 adict))
> >                    (nth 3 adict)) ; otherchars
> >                  (nth 4 adict)   ; many-otherchars-p
> >                  (nth 5 adict)   ; ispell-args
> >                  (nth 6 adict)   ; extended-character-mode
> >                  (if ispell-encoding8-command
> >                      'utf-8
> >                    (nth 7 adict)))))
> > 
> > and seems to work well.
>
> So you are taking the Catalan dictionary spec written for Ispell and
> converting it to a spec that could be used to support more characters by
> using UTF-8, is that right?  If so, I find this a bit kludgey.

I see it differently, and I like the above approach because I find it much
more versatile for general definitions. This is not a matter of blindly
reusing ispell entries. In particular, I noticed this problem in Debian with
the Catalan spec written for aspell (automatically created from info provided
by the aspell-ca package). That info is written that way to also be useful
for XEmacs, but with the above post-processing it can work much better for
Emacs.
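
To make the effect of that post-processing concrete, here is a minimal
sketch of the per-entry conversion. The function name
`my-ispell-utf8-entry' and its USE-UTF8 argument are hypothetical (they stand
in for the loop body and for the `ispell-encoding8-command' check in the
snippet quoted above); only `decode-coding-string' is a real Emacs function
here. It also skips decoding when the entry has no coding system, which is
an assumption of this sketch.

```elisp
;; Hypothetical helper for illustration; not part of ispell.el.
(defun my-ispell-utf8-entry (adict use-utf8)
  "Return dictionary entry ADICT adapted for UTF-8 when USE-UTF8 is non-nil."
  (let ((coding (nth 7 adict)))          ; coding system declared by the entry
    (list (nth 0 adict)                  ; dict name
          "[[:alpha:]]"                  ; casechars
          "[^[:alpha:]]"                 ; not-casechars
          (if (and use-utf8 coding)
              ;; Turn the 8-bit otherchars string into decoded characters
              (decode-coding-string (nth 3 adict) coding)
            (nth 3 adict))               ; otherchars
          (nth 4 adict)                  ; many-otherchars-p
          (nth 5 adict)                  ; ispell-args
          (nth 6 adict)                  ; extended-character-mode
          (if use-utf8 'utf-8 coding)))) ; coding system to talk to the speller

;; Example call with the fake "catala8" entry from above.
(my-ispell-utf8-entry
 '("catala8" "[A-Za-z]" "[^A-Za-z]" "['\267-]" nil
   ("-B" "-d" "catalan") nil iso-8859-1)
 t)
```

With USE-UTF8 non-nil, the otherchars element comes back as the decoded
string "['·-]" and the last element becomes `utf-8'; with nil, otherchars and
the coding system are passed through unchanged.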

> How
> about having a completely separate spec instead?  More generally, why
> not separate ispell-dictionary-alist into 2 alists, one to be used
> with Ispell, the other to be used with aspell and hunspell?  I think
> this would be cleaner, don't you agree?

As a matter of fact, that is what we do in Debian, using info provided by the
maintainers of the ispell, aspell and hunspell dictionaries. The difference
is that the provided info is supposed to be valid for both Emacs and XEmacs,
so I find post-processing as above very useful, because it helps to get the
best out of that info for Emacs. The global dicts alist is built from

(dolist (dict (append found-dicts-alist

where the first entry found wins. `found-dicts-alist' holds the result of the
automatic search (currently used only for aspell) and has the highest
priority; `ispell-dictionary-base-alist' is the fallback alist and has the
lowest priority. Depending on the spellchecker,
`ispell-base-dicts-override-alist' is set to an alist corresponding to
ispell, aspell or hunspell dictionaries (they are handled independently).
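
As a rough illustration of the "first found wins" rule, a merge along these
lines would give that behaviour. The function `my-merge-dict-alists' and the
sample data are hypothetical and only sketch the idea; this is not the actual
code in ispell.el.

```elisp
;; Hypothetical helper for illustration; not part of ispell.el.
(defun my-merge-dict-alists (&rest alists)
  "Merge ALISTS, keeping only the first entry found for each dictionary name.
Earlier alists therefore take priority over later ones."
  (let (merged)
    (dolist (dict (apply #'append alists))
      ;; Skip the entry if a dictionary with the same name was already kept
      (unless (assoc (car dict) merged)
        (push dict merged)))
    (nreverse merged)))

;; Highest priority first: automatic search, then override, then fallback.
(my-merge-dict-alists
 '(("catalan" found))
 '(("catalan" override) ("english" override))
 '(("english" base) ("castellano" base)))
```

In this example the "catalan" entry comes from the first alist and the
"english" entry from the second, while "castellano" is only available in the
fallback alist.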

I do not think that maintaining separate hardcoded dict lists in ispell.el
for ispell, aspell and hunspell is worth the effort.

For hunspell, in the future I'd go for some sort of parsing mechanism like
the current one for aspell.

