Re: [Aspell-user] Hyphens and apostrophes in words
From: Ciarán Ó Duibhín
Subject: Re: [Aspell-user] Hyphens and apostrophes in words
Date: Fri, 24 May 2013 18:00:17 +0100
Kevin Atkinson said:
To spellchecker with these special words you need to keep Aspell from trying
to tokenize a string into words (i.e, "hello world!" gets split into "hello"
and "world"). I think you need to use the C API for this.
Thanks. I understand now that my questions are to do with the tokenization,
rather than with the spellchecking itself.
In applications like the Aspell demo for Windows in Delphi, aspell can leave
the tokenization to the calling application, which can pass the words, one at a
time, to aspell_speller_check - is that correct? So if I was writing my own
application to use aspell, I would not have a problem - I would just do the
tokenization myself.
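
A minimal sketch of tokenizing in the calling application, written in Python
purely for illustration. The names tokenize, misspelled and is_known are
hypothetical; is_known stands in for whatever call the application makes to
aspell_speller_check, and the rule for what counts as a word is the
application's own, not Aspell's.

```python
import re

# A token is any run of letters, apostrophes and hyphens. Keeping the
# apostrophe and hyphen inside the token is this sketch's own choice.
TOKEN_RE = re.compile(r"(?:[^\W\d_]|['’-])+")


def tokenize(line):
    """Split a line into words, keeping apostrophes and hyphens in any position."""
    # Discard runs made only of punctuation, such as a free-standing dash.
    return [t for t in TOKEN_RE.findall(line) if any(c.isalpha() for c in t)]


def misspelled(line, is_known):
    """Return the tokens rejected by is_known, a stand-in for the speller call."""
    return [word for word in tokenize(line) if not is_known(word)]


if __name__ == "__main__":
    # Toy word list in place of a real dictionary (assumption for the demo).
    known = {"d'oscail", "sé", "an", "t-úll"}
    print(misspelled("d'oscail sé an t-úll 'un", known.__contains__))
```

Because the application hands over one word at a time, Aspell never gets the
chance to split on the apostrophe or the hyphen.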
But I see that there are procedures in aspell (in the C API?), for instance,
aspell_document_checker_next_misspelling, which seems to accept a LINE of text
and tokenize it before testing the words. I suppose this may be how the
"aspell" command-line program does its tokenization, and probably also
applications like UltraEdit (which I use a lot) will avail themselves of it.
I'm not a C programmer, but if I knew where to look in the aspell source, I
could try and see how difficult it would be to modify the tokenization there to
treat apostrophe and hyphen as I want to, either in response to a command-line
option, or even automatically, by looking at the special status of these
characters in the relevant dictionary. For the apostrophe, there must already
be code to keep a word-internal apostrophe, while removing a word-marginal one.
The modification would be to keep the apostrophe in any position. For the
hyphen, the modification would be to check the whole word first; and, if not
found, then the present default of checking the parts would be applied.
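
The hyphen rule proposed above can be sketched as follows, again in Python for
illustration only. This models the desired behaviour, not how Aspell's
tokenizer is actually written; check_token and is_known are hypothetical
names, with is_known standing in for the dictionary lookup.

```python
def check_token(token, is_known):
    """Accept a token if the whole word is known, else if every hyphen part is."""
    # Step 1: try the whole word, hyphens and apostrophes included.
    if is_known(token):
        return True
    # Step 2: fall back to the present default of checking the parts.
    if "-" in token:
        parts = [p for p in token.split("-") if p]
        return bool(parts) and all(is_known(p) for p in parts)
    return False


if __name__ == "__main__":
    # Toy word list in place of a real dictionary (assumption for the demo).
    known = {"t-úll", "well", "known"}
    for word in ("t-úll", "well-known", "t-known"):
        print(word, check_token(word, known.__contains__))
```

Here "t-úll" is accepted as a whole word, "well-known" is accepted through its
parts, and "t-known" is rejected because "t" is not in the list on its own.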
Does it sound feasible? Any hints?
Ciarán Ó Duibhín.