aspell-user

The Deal on Affix Compression


From: Kevin Atkinson
Subject: The Deal on Affix Compression
Date: Fri, 12 Mar 1999 19:39:29 -0500

I realize that affix compression is important for languages with a
lot of affixes; however, it is not vital.  The reason is that without
affix compression all you have to do is list all of the possible
combinations.  I realize that this wastes space; however, it is
doable.
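
To make "list all of the possible combinations" concrete, here is a
minimal sketch of expanding a compressed word list into a full one.
The flag letters and suffix rules are made up for illustration and are
not the real Ispell or Aspell affix format.

```python
# Toy illustration only: these flags and rules are hypothetical and are
# NOT the Ispell/Aspell affix file format.
SUFFIX_RULES = {
    "S": lambda w: w + "s",
    "D": lambda w: w + "ed",
    "G": lambda w: w + "ing",
}


def expand(entry):
    """Expand a 'word/FLAGS' entry into every word it stands for."""
    base, _, flags = entry.partition("/")
    return [base] + [SUFFIX_RULES[f](base) for f in flags]


# A compressed list stores each base word once, with its flags.
compressed = ["walk/SDG", "talk/SDG", "cat/S", "the"]

# Without affix compression, the dictionary simply stores all of these.
expanded = [word for entry in compressed for word in expand(entry)]

print(len(compressed), "entries expand to", len(expanded), "words")
```

The expanded list is bigger, but it is just a longer word list and
needs no affix handling at all.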

For example the word list that comes with Aspell has
  70,598 words
After running it through the munchlist script it has 
  30,953 words
Which leads to a ratio of
  2.3

Now, a Polish word list has these numbers:
 1,041,430 words
   146,626 words after munchlist
       7.1 ratio

Which means that for Polish, affix compression saves about 3.1 times
as much space as it does for the English dictionary.  Not that big of
a deal.
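
The arithmetic behind those ratios can be checked with a short sketch.
The word counts are the ones quoted above; the function and variable
names are only illustrative.

```python
def compression_ratio(expanded_words, base_words):
    """Expanded word count divided by the munched (base) word count."""
    return expanded_words / base_words


english = compression_ratio(70_598, 30_953)
polish = compression_ratio(1_041_430, 146_626)

# How much more Polish gains from affix compression than English does.
relative_gain = polish / english

print(f"English ratio: {english:.1f}")
print(f"Polish ratio:  {polish:.1f}")
print(f"Polish gain relative to English: {relative_gain:.1f}")
```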

Also, notice that this dictionary is mighty large, especially
considering that the largest English word list I have has these
numbers:

  120,361 words
   73,358 words after munchlist
      1.6 ratio

So my question is: do you really need that large of a dictionary?  The
original poster who sent me these figures agrees that a lot of those
146,626 base words are not needed.  So, let's say that we reduce the
base list down to 35,000; then the numbers are...

  248,000 words
   35,000 base words
      7.1 ratio

Which is about 3.5 times as many words as the Aspell English
dictionary.  True, this is large and slightly wasteful; however, it is
certainly manageable.
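
As a rough check of that projection, the sketch below assumes the 7.1
ratio still holds for a trimmed base list of 35,000 words.  That is an
assumption, since the ratio could shift once base words are removed.

```python
POLISH_RATIO = 7.1        # expanded words per base word, from above
TRIMMED_BASE = 35_000     # hypothetical reduced Polish base list
ASPELL_ENGLISH = 70_598   # expanded Aspell English word list

# Estimated size of the fully expanded, trimmed Polish list.
projected = TRIMMED_BASE * POLISH_RATIO

# Size relative to the expanded Aspell English list.
versus_english = projected / ASPELL_ENGLISH

print(f"Projected expanded Polish list: about {projected:,.0f} words")
print(f"Relative to Aspell English: about {versus_english:.1f}x")
```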

So once again, affix compression WILL save space; however, the
expanded word lists are manageable without affix compression.
Nevertheless, I do plan on implementing affix compression.  It is just
being put on hold until the rest of the international code is done.

If you really care about affix compression, then implement it
yourself.  But first talk to me so I can make sure you are doing it in
a manner that is consistent with the rest of the aspell library.


-- 
Kevin Atkinson
address@hidden
http://metalab.unc.edu/kevina/


