
[aspell-devel] Two questions about run-together words, and affixes


From: Gora Mohanty
Subject: [aspell-devel] Two questions about run-together words, and affixes
Date: Mon, 11 Dec 2006 15:11:25 +0530

Hello,
  I am in the middle of preparing a write-up on using aspell for
Hindi, and, eventually other Indian languages. The long and short
of it is that the original problems I was facing were caused by
my misunderstanding of the format of the soundslike file.
  As measured by the performance on a test list of some 500 words,
it now works reasonably well with the plain Hindi dictionary, i.e.,
without any support for advanced aspell features. Adding soundslike
support makes the performance comparable to the best modes for
English. This is not surprising, as Indian languages are spelt
phonetically.
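For readers unfamiliar with the idea, a toy Python sketch may show why
sound-alike matching suits a phonetically spelt script. Each word is
mapped to a "sound key", and the dictionary words that share the
misspelling's key are suggested. The folding table below (long vowel
signs folded onto short ones, candrabindu onto anusvara) is a
hypothetical choice for illustration only. It is not aspell's
soundslike file format or its actual algorithm.

```python
# Toy sound-alike lookup illustrating the soundslike idea.
# NOT aspell's soundslike format: SOUND_MAP is a hypothetical folding table.
SOUND_MAP = {
    "\u0940": "\u093F",  # vowel sign II (ी) -> vowel sign I (ि)
    "\u0942": "\u0941",  # vowel sign UU (ू) -> vowel sign U (ु)
    "\u0901": "\u0902",  # candrabindu (ँ) -> anusvara (ं)
}


def sound_key(word: str) -> str:
    """Map a word to its sound key by folding similar-sounding signs together."""
    return "".join(SOUND_MAP.get(ch, ch) for ch in word)


def build_index(words):
    """Group dictionary words by their sound key."""
    index = {}
    for w in words:
        index.setdefault(sound_key(w), []).append(w)
    return index


def soundslike_suggestions(misspelled, index):
    """Return the dictionary words whose sound key matches the input's key."""
    return index.get(sound_key(misspelled), [])


if __name__ == "__main__":
    idx = build_index(["नीला", "दिन"])
    print(soundslike_suggestions("निला", idx))
```

Here "निला" (short i) and "नीला" (long ii) share a key, so the
dictionary spelling is offered even though the two strings differ.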
  I still have two questions about issues that would improve
performance:
(a) Run-together words: It seems that for a long misspelled word that
    is close to two smaller words, aspell first suggests combinations
    of shorter words. For example, in English, "ratdog" turns up
    "rat dog" and "rat-dog" as the first two suggestions. I had
    thought that this was because of run-together words, but using
    "run-together false" in the .dat file does not seem to make a
    difference. I understand why one would want to have run-together
    words among the suggestions, but is there any way I could eliminate
    them (for example, one does not hyphenate words in Hindi), or
    use a weighting to reduce their importance, so that they appear
    later in the list of suggestions?
(b) Affix rules: Though affix rules seem to be working properly for
    Hindi, is there any way that I could have aspell accept, e.g.,
    "word + suffix" as correct, when only "word" is in the dictionary,
    but there is an affix rule for "word + suffix"? Alternatively,
    would it be possible for "word + suffix" to appear as the first
    suggestion in such a case? The reason that this would be useful
    is that Hindi makes heavy use of suffixes, and without these
    being marked correct, an auto-spellchecked document gets cluttered
    with spurious underlinings.
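Until aspell itself offers such controls, both requests could be
approximated by post-processing outside aspell. The Python sketch
below is hypothetical throughout: none of its names are aspell APIs.
It assumes the suggestion list and a plain suffix list are available
to the calling program. The first function moves split suggestions
(containing a space or hyphen) to the end of the list, or drops them.
The second accepts a word when stripping any listed suffix leaves a
dictionary word.

```python
# Hypothetical post-processing around a spell checker; not aspell's API.


def rerank_split_suggestions(suggestions, drop=False):
    """Demote (or drop) run-together splits such as "rat dog" / "rat-dog".

    Relative order within each group is preserved.
    """
    def is_split(s):
        return " " in s or "-" in s

    single = [s for s in suggestions if not is_split(s)]
    split = [s for s in suggestions if is_split(s)]
    return single if drop else single + split


def accepts_with_suffix(word, dictionary, suffixes):
    """Accept `word` if it is in `dictionary`, or if removing one of
    `suffixes` from its end leaves a non-empty dictionary word."""
    if word in dictionary:
        return True
    for suf in suffixes:
        stem = word[: -len(suf)] if suf and word.endswith(suf) else ""
        if stem and stem in dictionary:
            return True
    return False


if __name__ == "__main__":
    print(rerank_split_suggestions(["rat dog", "rat-dog", "ratted", "redo"]))
    print(accepts_with_suffix("घरों", {"घर"}, ["ों"]))
```

Demoting on hyphens suits Hindi, but would also push down legitimate
hyphenated words in languages that use them. The suffix check tries
every suffix on every word and ignores affix flags, so it is more
permissive than flag-based affix rules and may accept some incorrect
forms.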

Regards,
Gora
