Re: [Aspell-user] Hyphens and apostrophes in words
From: Carlo Traverso
Date: Sun, 19 May 2013 12:19:14 +0200 (CEST)
>>>>> "ciaran" == Ciarán Ó Duibhín writes:
ciaran> I'd like to know which, if any, spellcheckers can be
ciaran> configured to act like this. (The examples are from
ciaran> English but the real need comes from other languages.)
ciaran> Asking here about aspell particularly, of course.
ciaran> First, if necessary, allow the dictionary to contain words
ciaran> with apostrophe "'" and hyphen "-" in any position. (I am
ciaran> aware of the side-effects of this and am not worried by
ciaran> them.)
ciaran> Now, when checking text:
ciaran> 1. Accept a word containing a hyphen if EITHER the
ciaran> dictionary contains the whole word including the hyphen
ciaran> ("hotch-potch") OR if the dictionary contains both parts
ciaran> separately ("half-moon").
ciaran> 2. With a dictionary containing "'twas" but not "twas",
ciaran> accept "'twas".
ciaran> 3. With a dictionary containing "well" but not "'well",
ciaran> not accept "'well".
Aspell can do 2 and 3, but you have to change the handling of ' in
the .dat file and then recompile the English dictionary, and of
course add the acceptable words; this is the Aspell way to do your
"First" point.
For 1, you should modify the .dat file again to allow - in the middle
of a word, add the compound words, and run the spell checker twice:
once with the modified dictionary (to accept the words with -), and
once with the original one (or rather the one modified in the first
step) to accept the two components. The first pass will reject the
hyphenated words that are not in the dictionary; the second pass will
split them into their components and check those again.
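The two-pass procedure can be sketched in Python as a hypothetical model, not Aspell itself: two regular expressions stand in for the tokenizer with and without - allowed in the middle of a word, and Python sets stand in for the dictionaries. The compound dictionary is assumed to be the original word list plus the hyphenated compounds; `two_pass_check` is an illustrative name.

```python
import re

# Hypothetical stand-ins for the tokenizer under the two .dat settings.
# [^\W\d_] matches one letter, including accented letters.
HYPHENATED_WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")  # '-' allowed mid-word
PLAIN_WORD = re.compile(r"[^\W\d_]+")                      # '-' splits words


def two_pass_check(text: str, base: set[str], compounds: set[str]) -> list[str]:
    """Return the words that fail both passes (case-insensitive lookup)."""
    # Pass 1: modified tokenizer; dictionary = base words plus compounds,
    # so listed compounds such as "hotch-potch" are accepted whole.
    pass1_dictionary = base | compounds
    rejected = [w for w in HYPHENATED_WORD.findall(text)
                if w.lower() not in pass1_dictionary]

    # Pass 2: only the rejected words are re-checked with the original
    # tokenizer and dictionary, which split them at the hyphen, so an
    # unlisted compound is accepted if every component is a known word.
    misspelled = []
    for word in rejected:
        misspelled += [part for part in PLAIN_WORD.findall(word)
                       if part.lower() not in base]
    return misspelled


if __name__ == "__main__":
    base = {"it", "was", "a", "half", "moon"}
    compounds = {"hotch-potch"}
    text = "It was a hotch-potch, a half-moon, a half-mooon"
    print(two_pass_check(text, base, compounds))  # ['mooon']
```

Running the second pass only on what the first pass rejected keeps a listed compound like hotch-potch from being re-checked as hotch and potch.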
I don't think it is possible to do this in one pass by combining the
two dictionaries in one .multi file, since the .dat files have to be
different (and hence the word tokens will be different).
Carlo Traverso