[aspell-devel] A few dictionary-related issues


From: Pauliuc George
Subject: [aspell-devel] A few dictionary-related issues
Date: Mon, 17 Mar 2003 16:47:26 -0500

Hello!

I'm the current maintainer of the aspell-ro package, and I
want to expand it.  Please excuse my English, and please
excuse my ignorance: if I ask a trivial question, just point
me to a link where I can get more details.

First, I recently had to spell check a few huge documents,
and I was quite impressed by the results of aspell.  If
aspell knew the misspelled word, then that word was in most
cases the first suggestion.  I have never seen such results
with other spell checkers.  That is probably because most
spell checkers were initially made for English, and Romanian
(a Latin language) is quite different.

Anyway, here's one issue: in Romanian we use the dash (-)
in about the same way English uses the apostrophe (').  For
things like "it is" -> "it's" it is simpler to just make the
program learn the short version as well.  But there are also
cases (mostly in poetic works) where some letters are
dropped and several words are connected with dashes.  For
this case I think it would be a good idea if aspell could
recognize not the "composed" word but its component parts
(and recognize the short versions).  Is this possible?
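
To illustrate what I mean, here is a rough Python sketch of
checking the parts instead of the whole.  The word list and
the function name are made up for the example; this is only
the behaviour I would like, not how aspell works inside.

```python
# Hypothetical toy word list; a real check would ask the
# aspell dictionary instead.
KNOWN_WORDS = {"într", "o", "zi", "s", "a", "dus"}


def unknown_parts(token, known=KNOWN_WORDS):
    """Split a dash-joined token and return the parts that
    are not in the word list (empty list means all known)."""
    parts = [p for p in token.lower().split("-") if p]
    return [p for p in parts if p not in known]


print(unknown_parts("într-o"))   # every part is known
print(unknown_parts("într-x"))   # "x" is reported
```

The short forms ("într", "s") still have to be in the word
list, but the composed words themselves do not.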

Also, when a word is only partially reproduced (to mimic
live speech) we put an apostrophe at that point, as in the
word "ma'am" in American English.  When the apostrophe is in
the middle it's easy - make aspell learn the new "word".
But if the apostrophe is at the beginning or the end of the
word, things get more complicated: if the apostrophe is
missing, the word is a misspelling; if it is present, it is
the case described above.  Can I make aspell recognize the
apostrophe?  From what I see, it interprets it as a
punctuation sign (if it's at the beginning or the end of the
word) and leaves it outside.
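
A rough Python sketch of the tokenizing I have in mind: an
apostrophe at the edge stays attached only when the result
is a known clipped form, otherwise it is treated as a quote
mark.  The list of clipped forms and the function name are
invented for the example, and I am not claiming aspell has
such an option.

```python
import re

# Hypothetical clipped forms that really end or start with
# an apostrophe.
CLIPPED_FORMS = {"da'", "'nainte"}

# A word, optionally with inner apostrophes or dashes, and
# optionally one apostrophe on each edge.
TOKEN_RE = re.compile(r"'?\w+(?:['-]\w+)*'?")


def tokenize(text, clipped=CLIPPED_FORMS):
    """Return the words of text, keeping an edge apostrophe
    only for tokens listed in clipped."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        if token.lower() not in clipped:
            # Not a known clipped form: the apostrophe is
            # just punctuation around the word.
            token = token.strip("'")
        tokens.append(token)
    return tokens
```

With this, tokenize("da' 'casa'") gives "da'" and "casa":
the clipped word keeps its apostrophe and the quoted word
loses its quotes.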

How does aspell learn words?  By word root plus possible
prefixes and suffixes?  Or by learning all word forms one by
one?  I'm no linguist :( so I couldn't compile a complete
list for each word.  But I am wondering whether I can just
add the possible extensions to the root.  It seems cleaner
somehow.
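
As an example of "root plus extensions", here is a small
Python sketch that expands such entries into the full word
list.  The "root/CLASS" notation and the suffix class "F"
are made up by me; whether aspell can read something like
this directly is exactly what I am asking.

```python
# Hypothetical suffix class: endings for one noun pattern.
SUFFIX_CLASSES = {"F": ["ă", "a", "e", "ele"]}


def expand(entries, classes=SUFFIX_CLASSES):
    """Expand 'root/CLASS' entries into all word forms.
    Entries without a class are kept unchanged."""
    words = []
    for entry in entries:
        root, _, cls = entry.partition("/")
        suffixes = classes.get(cls, [""])
        words.extend(root + suffix for suffix in suffixes)
    return words


print(expand(["cas/F", "zi"]))
```

So one line "cas/F" stands for casă, casa, case and casele,
and the full list could be generated before compiling the
dictionary.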

The final issue (even more twisted than the ones above ;-):
because of badly implemented Romanian character support,
many documents are written without diacritics.  So instead
of î we have i, and so on.  This is a very particular job
for a spell checker (I don't know any other language with
such an issue) - to add the diacritics.  That would be easy
if there were only a short list of words.  If I add a
complete list of words, things get more complicated.  In
that case a word ending in 'a' might carry the 'the' article
(as in English), and the exact same word, spelled the same
except that it ends in 'ă' instead of 'a', carries the 'a'
article (as in English).  Both words are correct, so in
ASCII text a lot of corrections would be missed.  One hack
is to keep only the form ending in 'ă' in the dictionary and
just ignore the cases where nothing should change.  Is there
a nicer way to do this?  Or at least a way to keep the full
dictionary for the cases where someone has to check a text
with diacritics?
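
To show the ambiguity, here is a Python sketch that maps a
word without diacritics to every dictionary word it could
stand for.  The tiny word list and the function names are
only for illustration; a real version would use the full
aspell-ro list.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word):
    """Remove combining marks, so 'casă' becomes 'casa'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )


def build_index(words):
    """Map each stripped form to all words that produce it."""
    index = defaultdict(set)
    for word in words:
        index[strip_diacritics(word)].add(word)
    return index


def candidates(ascii_word, index):
    """Return the possible spellings for an ASCII word."""
    return sorted(index.get(ascii_word, ()))


# Hypothetical toy dictionary.
INDEX = build_index(["casa", "casă", "în", "masă"])
print(candidates("in", INDEX))    # one candidate: safe to fix
print(candidates("casa", INDEX))  # two candidates: ambiguous
```

When there is a single candidate the diacritics can be added
safely; when there are several (like "casa" and "casă") the
program cannot decide alone, which is the problem above.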

Also, I didn't find a way to dump the whole word list.  I
seem to have lost the original word list (I now have only
the 'compiled' version).

And one bug report (I will probably add it later today to
the SourceForge bug tracker if there isn't a similar report
already): if I mistakenly type an 'm' after a word instead
of a comma ',', then spell check the text, choose replace,
and type 'word,' in place of 'wordm', aspell doesn't leave
the punctuation mark outside the word and tries to spell
check it as well.
