[Aspell-user] Affixes leftover from expanded wordlist dumps

From: Isaac Colley
Subject: [Aspell-user] Affixes leftover from expanded wordlist dumps
Date: Fri, 06 Jun 2008 15:48:54 -0600
User-agent: Thunderbird (X11/20080505)


I am building a dictionary-based language detection program using the
dumps of aspell dictionaries.

I need to expand the wordlists completely. However, for some languages,
such as Russian, the expanded output still contains what I think are
affixes following a '?'.  For example:

aspell dump master ru | aspell -l ru expand

will produce lines like:
умаслит? умаслит?ла умаслит?ли умаслит?ло

'умаслит' appears to be the stem, but what about the characters after
the '?'?  Are they affixes?  If so, how do I fully expand them?  Any
insight on how to correctly expand wordlists for every language would be
greatly appreciated.
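
In the meantime, the expanded output can at least be flattened to one
word per line, with the tokens that still contain a '?' set aside for
inspection.  Below is a minimal Python sketch, assuming the output is
whitespace-separated as in the example above.  flatten_expanded is a
hypothetical helper name, not part of aspell, and it does not resolve
what the '?' means.

```python
from typing import Iterable, Set, Tuple


def flatten_expanded(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split lines of expanded output into two sets of unique tokens.

    Assumption: each line is a whitespace-separated list of word forms,
    as in the Russian example. Tokens containing '?' are collected
    separately instead of being treated as finished words.
    """
    clean: Set[str] = set()
    suspect: Set[str] = set()
    for line in lines:
        for token in line.split():
            if "?" in token:
                suspect.add(token)  # still carries the unexplained '?'
            else:
                clean.add(token)    # looks like a fully expanded word
    return clean, suspect


# Possible use on piped input (hypothetical script, reading stdin):
#   import sys
#   clean, suspect = flatten_expanded(sys.stdin)
```

Counting the suspect tokens per language would show which dictionaries
are affected and how badly.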

Isaac Colley

