aspell-announce

Aspell Status Update as of February 12, 2004


From: Kevin Atkinson
Subject: Aspell Status Update as of February 12, 2004
Date: Fri, 13 Feb 2004 02:43:46 -0500 (EST)

I have recently been putting a lot of work towards Aspell 0.51.  I have
made a good deal of changes.  Most of the items in my initial To Do list
are done.  Pre-release snapshots are now available from the Aspell home
page, http://aspell.net/.

The biggest change in Aspell 0.51 is support for Affix Compression.  
Affix compression is the act of combining several words with a common base
word into one word which consists of the base word and a list of affixes
to apply.  (Affix is the generic term for prefix, suffix or infix).  For
example "alarm alarms alarmed alarming" will become "alarm/SDG" where SDG
stands for the suffixes of alarm.  This can make a huge difference in
space for languages which have extensive affixation, such as German.
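
To make the idea concrete, here is a minimal sketch in Python.  It is not
Aspell's or Ispell's actual code.  The SUFFIXES table and the expand and
compress helpers are hypothetical, and the rules are assumed to be plain
"append this text" suffixes; real affix files also carry conditions and
stripping rules.

```python
# Hypothetical, simplified suffix table: flag -> text appended to the base.
# Real Ispell/MySpell affix files also carry conditions and stripping rules.
SUFFIXES = {"S": "s", "D": "ed", "G": "ing"}


def expand(entry):
    """Expand an entry such as 'alarm/SDG' into the full list of words."""
    base, _, flags = entry.partition("/")
    return [base] + [base + SUFFIXES[flag] for flag in flags]


def compress(words, base):
    """Collapse the words derived from `base` into one 'base/FLAGS' entry."""
    flags = "".join(
        flag for flag, suffix in SUFFIXES.items() if base + suffix in words
    )
    return f"{base}/{flags}" if flags else base


print(expand("alarm/SDG"))
print(compress(["alarm", "alarms", "alarmed", "alarming"], "alarm"))
```

Under these assumptions the four words are stored as the single entry
"alarm/SDG", and the full forms are regenerated only when needed.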

Ever since I started Aspell this has been one of the most requested
features, as without it dictionaries in some languages are huge.  A while
ago, for some reason Red Hat decided to abandon Ispell in favor of Aspell,
a move that even I found questionable.  The package maintainer complained
loudly about the size of the dictionaries, and the fact that the
dictionaries will take up half a CD, but did not submit any code.  
Later, when Aspell was being considered for OpenOffice the issue of Affix
Compression came up again for the same reason (size), but this time,
something was done about it.  Kevin Hendricks studied the Ispell code and
reimplemented it in simple C++, with the intent of it eventually getting
integrated into Aspell.  During that process he also wrote a simple spell
checker library called MySpell which ended up being used in OpenOffice.  
He left it to me to integrate into Aspell.  Now, nearly two years later,
his code has finally been integrated in Aspell.

However, unlike Ispell, not every word list should be affix compressed.  
That is because soundslike lookup cannot currently be used with Affix
Compression.  Aspell's soundslike lookup is what makes Aspell such a
powerful spell checker when coming up with suggestions, and without it
suggestion quality will suffer for non-phonetic languages such as English.  
Nevertheless, even with Affix Compression, Aspell does a better job than
Ispell since it is able to look for words in which two mistakes were made
instead of one (e.g. sychlogist for psychologist), is able to find a
correctly accented form of a word when accents are left off, and is able
to use MySpell's replacement tables, which allow a group of letters to be
replaced by another, such as replacing "uff" with "ough".
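
As a rough illustration of how a replacement table can produce
suggestions, here is a minimal sketch in Python.  It is not the MySpell or
Aspell implementation.  The REPLACEMENTS table, the tiny DICTIONARY and the
suggest function are all hypothetical, and only one replacement is applied
per candidate.

```python
# Hypothetical replacement table and word list, for illustration only.
REPLACEMENTS = [("uff", "ough"), ("f", "ph")]
DICTIONARY = {"tough", "enough", "phone", "rough"}


def suggest(word):
    """Return dictionary words reachable by applying one rule at one spot."""
    found = set()
    for wrong, right in REPLACEMENTS:
        start = word.find(wrong)
        while start != -1:
            # Swap this single occurrence and keep the rest of the word.
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in DICTIONARY:
                found.add(candidate)
            start = word.find(wrong, start + 1)
    return sorted(found)


print(suggest("tuff"))
print(suggest("enuff"))
```

With this table, "tuff" leads to "tough" and "enuff" leads to "enough",
without needing any phonetic information about the language.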

In fact, with Affix Compression finally implemented, the only thing Ispell
has that Aspell does not is an nroff filter mode, which should
change by the time Aspell 0.51 is released.  It is my sincere hope that
when Aspell 0.51 is released Aspell can become the de facto spell checker
for all free operating systems and possibly all Unix-like systems.  
However, I do not expect this to happen overnight, as Ispell is a very old
program which can be found on almost all Unix-like systems and has in fact
become the de facto spell checker of the Unix world.

I also hope that major programs, such as OpenOffice and Mozilla, start
offering the option to use Aspell if it is installed on the system rather
than only relying on their built-in spell checker.  I hope to improve the
interface of Aspell and the inner workings to make Aspell more suitable
for the task.

Affix compression is not the only change in Aspell 0.51.  Other changes 
include:

   * Added support for loadable filters thanks to Christoph Hintermüller

   * Enhanced TeX filter to support recognizing accent commands, such as
     the German umlauts, and to treat words with hyphenation characters
     in them as one word, also thanks to Christoph Hintermüller

   * Added gettext support thanks to Sergey Poznyakoff

   * Reworked how the dictionary is stored to take up less space (around
     80% for the English language) and be faster in some cases.

   * Reworked the build system so that a single Makefile is used for
     most of the code.

   * Added support for MySpell Replacement Tables for better suggestions
     when phonet information is not available.

   * Manual has been converted to texinfo format thanks to the work
     of Chris Martin.

Additional things which will be done by Aspell 0.51 include:

   * Make Aspell thread safe

   * Enhance ispell.el so that it will work better with the new
     Aspell.

However, there is still more to be done, and I can certainly use some
additional help.  In particular the following items need to be done before
I consider Aspell finished. If you are interested in helping me with one
of these tasks please email me. Good C++ skills are needed for most of
these tasks involving coding. I hope to have these all done by Aspell
0.51.

   * Clean up copyright notices and bring the Aspell package up to *GNU
     Standards*.

   * Allow Aspell to *check documents which are in UTF-8*. I don't know
     the proper way to use Unicode characters with the curses library,
     and I can't seem to find any concrete documentation on how to do
     it. If you have experience in this area I would really appreciate
     it if you could enlighten me.

   * Come up with a *nroff mode* for spell checking. I know nothing
     about nroff. I would gladly write the filter if someone would be
     willing to work with me in developing one. All I really need to
     know is what to skip.

I would also like to get the following things done, although I may still
consider Aspell finished without them.  They will probably get implemented
eventually, but I could still use help with them.

   * Use Lawrence Philips' new *Double Metaphone algorithm*. See
     `http://aspell.sourceforge.net/metaphone/'. The main task involved
     here is converting the algorithm into table form. This will take
     some time, but no real programming experience is required.
     If you want to help with Aspell but don't have any real programming
     experience, this would be a great place to start.

   * Create a *C++ interface* for Aspell, possibly on top of the C one.

   * Write a *GUI* for the aspell utility. Ideally it should be able to
     do everything the Aspell utility can do and not just be able to spell
     check a document.

   * Better *support for compound words*. If you speak a language which
     has a lot of compound or run-together words I would appreciate
     hearing back from you. The current support for _conditional_
     compound words will disappear in Aspell 0.51 since no one seems to
     be using it. Support for _unconditional_ compound words will still
     be available. However, several people have informed me that they
     need more. I attempted to provide that, but it wasn't powerful
     enough, and hence unused. Thus, I am going to start from scratch,
     but I need to know exactly what is involved in correct compound
     formation.

   * Rank suggestions based on *frequency information*.  Both global
     frequency and document specific frequency can be used.  The latter
     will require that the whole document be made available to the
     spell checker.  Also use frequency information to flag words which
     are found in the dictionary but not in common usage, and thus
     might not be what was intended.

   * Be able to accept *words with spaces in them* as many languages
     have words, such as a word in a foreign phrase, which only make
     sense when followed by other words.

   * Support *soundslike lookup with affix compression*.  I think it is
     possible, although I don't know how effective it will be.  The
     basic idea is to affix compress the soundslike codes and then
     match the codes up with affix compressed words.  If you are
     interested, email <address@hidden>, and I will explain it in
     more detail.
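
The frequency ranking item in the list above could be sketched roughly as
follows.  This is a Python illustration only, not a design for Aspell.
The GLOBAL_FREQ counts, the doc_weight parameter and the rank function are
hypothetical, and the scoring formula (global count plus a weighted
document count) is just one assumed way to combine the two sources.

```python
from collections import Counter

# Hypothetical global frequency counts; real ones would come from a corpus.
GLOBAL_FREQ = Counter({"there": 1000, "their": 900, "three": 400})


def rank(suggestions, document_words, doc_weight=10):
    """Order suggestions by global plus weighted document frequency."""
    doc_freq = Counter(document_words)

    def score(word):
        # A Counter returns 0 for words it has never seen.
        return GLOBAL_FREQ[word] + doc_weight * doc_freq[word]

    return sorted(suggestions, key=score, reverse=True)


# With no document context the globally most common word comes first.
print(rank(["three", "their", "there"], []))
# A document that uses "their" heavily moves it to the top.
print(rank(["three", "their", "there"], ["their"] * 20))
```

Document specific counts require the whole text up front, which is why
that variant needs the document to be made available to the spell checker.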

Finally, I would really appreciate it if people could start trying out the
pre-release snapshots.  I would especially appreciate it if dictionary
authors could try out the affix compression in Aspell and report back any
problems.  The latest snapshot is
http://aspell.net/devel/aspell-0.51-20040212.tar.gz.  Future versions will
be uploaded to http://aspell.net/devel/.


On a different note, Aspell 0.50.5 was just released.  It contains mostly
bug fixes with a few minor enhancements.  You should eventually be able to
get it at ftp://ftp.gnu.org/gnu/aspell but it is not there yet.  In the
meantime you can get it at http://aspell.net/aspell-0.50.5.tar.gz.

---
http://kevin.atkinson.dhs.org
