[Aspell-user] Affixes leftover from expanded wordlist dumps
From: Isaac Colley
Subject: [Aspell-user] Affixes leftover from expanded wordlist dumps
Date: Fri, 06 Jun 2008 15:48:54 -0600
User-agent: Thunderbird 1.5.0.14ubu (X11/20080505)
Hello,
I am building a dictionary-based language detection program using
dumps of aspell dictionaries.
I need to expand the wordlists completely. However, for some languages,
such as Russian, the expanded output still contains what look like
affixes (I think) after a '?'. For example:
aspell dump master ru | aspell -l ru expand
will produce lines like:
умаслит? умаслит?ла умаслит?ли умаслит?ло
'умаслит' appears to be the stem, but what about the characters after
the '?'? Are they affixes? If so, how do I fully expand them? Any
insight into how to correctly expand wordlists for every language would
be greatly appreciated.
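To gauge how many entries are affected, the expanded output can be split
into individual words and the ones containing a '?' set aside. This is
only a minimal Python sketch: it assumes the expand output is
whitespace-separated text that has already been decoded, the function
name split_expanded is made up for illustration, and it does not explain
what the '?' actually means.

```python
def split_expanded(lines):
    """Split expanded wordlist lines into two sets of words.

    Returns (clean, suspect): words without a '?' and words containing one.
    Assumption: each line holds whitespace-separated word forms, as in the
    sample output quoted above.
    """
    clean, suspect = set(), set()
    for line in lines:
        for word in line.split():
            if "?" in word:
                suspect.add(word)   # form still carries the unexplained '?'
            else:
                clean.add(word)     # form looks fully expanded
    return clean, suspect


# Sample line taken from the output quoted above.
sample = ["умаслит? умаслит?ла умаслит?ли умаслит?ло"]
clean, suspect = split_expanded(sample)
print(len(clean), "clean forms,", len(suspect), "forms containing '?'")
```

The same function could be fed the lines of the real pipeline output
instead of the sample list, so that only the clean set goes into the
language detector until the '?' forms are understood.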
Thanks,
Isaac Colley