Re: [aspell-devel] Big wordlist and affix lexicons
From: Børre Gaup
Subject: Re: [aspell-devel] Big wordlist and affix lexicons
Date: Mon, 27 Nov 2006 14:52:23 +0100
User-agent: KMail/1.9.5
On Sat, 25 November 2006 13:52, Kevin Atkinson wrote:
> On Sat, 25 Nov 2006, Børre Gaup wrote:
> > The problem is that hunspell is not as ubiquitous as aspell. As far as I
> > have seen hunspell is not commonly used, but aspell is used both in Linux
> > and in Mac OS X (through Cocoaspell). Hunspell is _intended_ to replace
> > myspell in openoffice.org (according to its homepage).
> >
> > What features in hunspell would you specifically like to have in aspell?
>
> Possibly:
>
> - Max. 65535 affix classes and twofold affix stripping
>
I had a brief look at the hunspell documentation for their dictionaries and
affix files. As far as I understand, twofold affix stripping means that you
can "stack" two different affixes after one another, or in other words, make
one affix point to another, just the way a word points to an affix in an
aspell dictionary.
An example from Sami: the verb muitalit (to tell) has, among others, these
forms in the present tense:

  muital -an
         -at
         -a
         -edne
         -eahppi
         -eaba
         -ehpet

After each of these forms it is legal to add one of the clitics -ge, -ba,
-bat, -go, -son, -han, and a few more.
So in current aspell we would need both the -an form and every -an+clitic
form in the affix file, but with twofold affix stripping we could just make
the verb suffixes point to the clitics. Is that correct?
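
To make the question concrete, here is a small Python sketch of the idea. It
is only an illustration: it does not use the aspell or hunspell affix-file
format, the function names are made up for this example, and the clitic list
is limited to the six named above.

```python
# Illustrative sketch only; not aspell's or hunspell's affix-file format.
PERSON_SUFFIXES = ["an", "at", "a", "edne", "eahppi", "eaba", "ehpet"]
CLITICS = ["ge", "ba", "bat", "go", "son", "han"]  # "and a few more" omitted


def one_level_rules(suffixes, clitics):
    """Suffix rules needed if every suffix+clitic combination is spelled out."""
    return list(suffixes) + [s + c for s in suffixes for c in clitics]


def twofold_strip(word, stems, suffixes, clitics):
    """Return (stem, suffix, clitic) if the word can be analysed, else None.

    An optional clitic is stripped first, then a required person suffix;
    what remains must be a known stem.
    """
    for clitic in [""] + list(clitics):
        if clitic and not word.endswith(clitic):
            continue
        rest = word[: len(word) - len(clitic)] if clitic else word
        for suffix in suffixes:
            stem = rest[: -len(suffix)]
            if rest.endswith(suffix) and stem in stems:
                return stem, suffix, clitic
    return None


if __name__ == "__main__":
    stems = {"muital"}
    print(len(one_level_rules(PERSON_SUFFIXES, CLITICS)))  # rules when flattened
    print(len(PERSON_SUFFIXES) + len(CLITICS))             # rules when stacked
    print(twofold_strip("muitalange", stems, PERSON_SUFFIXES, CLITICS))
```

With two levels the rule count grows as suffixes plus clitics rather than
suffixes times clitics, which is the saving I am asking about.
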
We also have verbs where the stem changes. Diehtit (to know) is an example
(same tense and form as above):

  dieđ  -án
        -át
  dieht -á
  diht  -e
  dieht -ibeahtti
        -iba
        -ibehtet

Is there a way to tell aspell that the three stem forms are in fact the same
word, so that we can handle them as one entry instead of three? Or would
some of the features mentioned below be of any help with this phenomenon?
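
As a sketch of what "one entry instead of three" could mean, the following
Python groups the three stems under a single lemma. This is a hypothetical
data layout for illustration only, not an existing aspell or hunspell
feature; the names PARADIGMS and expand are invented here.

```python
# Hypothetical layout: one lemma, several stems, each with its own suffixes.
PARADIGMS = {
    "diehtit": [
        ("dieđ", ["án", "át"]),
        ("dieht", ["á", "ibeahtti", "iba", "ibehtet"]),
        ("diht", ["e"]),
    ],
}


def expand(paradigms):
    """Map every generated surface form back to its lemma."""
    forms = {}
    for lemma, stem_groups in paradigms.items():
        for stem, suffixes in stem_groups:
            for suffix in suffixes:
                forms[stem + suffix] = lemma
    return forms


if __name__ == "__main__":
    for form, lemma in expand(PARADIGMS).items():
        print(form, "->", lemma)
```
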
> - Handling conditional affixes, circumfixes, fogemorphemes, forbidden
> words, pseudoroots and homonyms.
>
> - Support complex compoundings
>
> I believe some of these will benefit you.
>
> However I only want to implement them if there is a clear benefit to it.
> For example, based on what several people have told me, complex compounding
> rules are not worth it.
>
> Aspell is far more complex than Myspell and each feature needs to be
> implemented carefully so that it will behave sensibly with the
> suggestion code. Also it is important that the addition of the
> feature won't degrade performance, especially when the feature isn't used.
Perhaps some of these features could be plugins, where different languages
load different plugins, according to their needs?
regards,
--
Børre Gaup