Re: [aspell] Affix or something else
From: Trond Eivind Glomsrød
Subject: Re: [aspell] Affix or something else
Date: 31 Jan 2001 11:07:30 -0500
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.0.96

Kevin Atkinson <address@hidden> writes:
> Subject: [aspell] The Deal on Affix Compression
> Date: Fri, 12 Mar 1999 19:39:29 -0500
>
> I realize that affix compression is important for languages with a
> lot of affixes; however, it is not vital. The reason is that
> without affix compression all you have to do is list all of the
> possible combinations. I realize that this wastes space; however, it
> is doable.
>
> For example the word list that comes with Aspell has
> 70,598 words
> After running it through the munchlist script it has
> 30,953 words
> Which leads to a ratio of
> 2.3
>
> Now a Polish word list has the numbers:
> 1,041,430
> 146,626
> 7.1
>
> Which means that for the Polish language affix compression saves about 3.1
> times more space than it would for the English dictionary. Not that
> big of a deal.
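
The arithmetic in the quoted figures can be checked with a short Python sketch. The word counts are the ones quoted above; the names WORD_COUNTS and compression_ratio are made up for illustration and are not part of Aspell or munchlist.

```python
# Word counts quoted above: (full word list, after munchlist).
WORD_COUNTS = {
    "english": (70_598, 30_953),
    "polish": (1_041_430, 146_626),
}


def compression_ratio(expanded: int, munched: int) -> float:
    """Return how many expanded words each affix-compressed entry stands for."""
    return expanded / munched


ratios = {lang: compression_ratio(*counts) for lang, counts in WORD_COUNTS.items()}

# How much more Polish gains from affix compression than English does.
relative = ratios["polish"] / ratios["english"]

for lang, ratio in ratios.items():
    print(f"{lang}: {ratio:.1f}")
print(f"polish relative to english: {relative:.1f}")
```

The ratios only describe word counts; they say nothing yet about the size of the compiled dictionaries on disk.
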
You're downplaying the significance of it:
address@hidden i386]# ls -l /usr/lib/aspell/english*
-rw-r--r-- 1 root root 2424832 nov 30 19:09 /usr/lib/aspell/english-lrg-only
-rw-r--r-- 1 root root 2355200 nov 30 19:09 /usr/lib/aspell/english-med-only
lrwxrwxrwx 1 root root 14 jan 26 11:39 /usr/lib/aspell/english.multi -> american.multi
address@hidden i386]# ls -l /usr/lib/aspell/polish
-rw-r--r-- 1 root root 35622912 aug 20 11:52 /usr/lib/aspell/polish
address@hidden i386]# ls -l /usr/lib/aspell/czech
-rw-r--r-- 1 root root 64434176 aug 20 11:26 /usr/lib/aspell/czech
address@hidden i386]#
These sizes have made us leave several languages out for the time
being: Polish, Czech, and Esperanto.
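
To put rough numbers on the difference, here is a minimal Python sketch using the byte counts from the listings above. It assumes, for illustration only, that the two English files together form a fair baseline; the name SIZES is hypothetical.

```python
# File sizes in bytes, copied from the ls -l output above.
SIZES = {
    "english-lrg-only": 2_424_832,
    "english-med-only": 2_355_200,
    "polish": 35_622_912,
    "czech": 64_434_176,
}

# Assumption: the English baseline is the sum of both English files.
english_total = SIZES["english-lrg-only"] + SIZES["english-med-only"]

# Size of each large dictionary as a multiple of the English baseline.
factors = {lang: SIZES[lang] / english_total for lang in ("polish", "czech")}

for lang, factor in factors.items():
    print(f"{lang}: {SIZES[lang]} bytes, {factor:.1f}x the English files")
```
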
Here are the compressed sizes:
address@hidden i386]# ls -l aspell-*
-rw-r--r-- 3 root root 3271749 aug 30 18:13 aspell-0.32.5-1.i386.rpm
-rw-r--r-- 92 root root 3597773 aug 30 18:13 aspell-ca-0.1-6.i386.rpm
-rw-r--r-- 3 root root 30218438 aug 30 18:13 aspell-cs-0.2-3.i386.rpm
-rw-r--r-- 92 root root 5911215 aug 30 18:13 aspell-da-0.2-3.i386.rpm
-rw-r--r-- 92 root root 4535531 aug 30 18:13 aspell-de-0.1.1-7.i386.rpm
-rw-r--r-- 3 root root 600163 aug 30 18:13 aspell-devel-0.32.5-1.i386.rpm
-rw-r--r-- 3 root root 52426 aug 30 18:13 aspell-en-ca-0.32.5-1.i386.rpm
-rw-r--r-- 3 root root 52486 aug 30 18:13 aspell-en-gb-0.32.5-1.i386.rpm
-rw-r--r-- 3 root root 10192261 aug 30 18:14 aspell-eo-0.1-6.i386.rpm
-rw-r--r-- 3 root root 7324582 aug 30 18:14 aspell-es-0.1-8.i386.rpm
-rw-r--r-- 92 root root 3059498 aug 30 18:14 aspell-fr-0.3-6.i386.rpm
-rw-r--r-- 92 root root 728573 aug 30 18:14 aspell-it-0.1-6.i386.rpm
-rw-r--r-- 92 root root 3346648 aug 30 18:14 aspell-nl-0.1-6.i386.rpm
-rw-r--r-- 92 root root 5261253 aug 30 18:14 aspell-no-0.1-8.i386.rpm
-rw-r--r-- 3 root root 16603359 aug 30 18:14 aspell-pl-0.1-6.i386.rpm
-rw-r--r-- 92 root root 1903164 aug 30 18:14 aspell-sv-0.1-8.i386.rpm
address@hidden i386]#
--
Trond Eivind Glomsrød
Red Hat, Inc.