aspell-user

Re: [Aspell-user] Re: Aspell-user Digest, Vol 57, Issue 1


From: Aaron Miller
Subject: Re: [Aspell-user] Re: Aspell-user Digest, Vol 57, Issue 1
Date: Wed, 15 Aug 2007 23:05:05 -0700

I agree, that would be the ideal solution. The problem I'm facing with
that, however, is determining the exact parsing algorithm aspell uses.
So far I've just been trying to "reverse engineer" it by trial and
error. I thought I had it pretty much figured out until this popped up.

Is there any information out there that gives the exact algorithm (the
name of the source file would do) or, even better, a regular
expression to use? I tried looking through the docs but didn't find
anything except some info on 8-bit characters. It said that aspell will
convert UTF-8 characters to 8-bit characters. Maybe this is why the
dash was counted as a word?
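To make that 8-bit hypothesis concrete, here is a rough Python sketch. It is NOT aspell's tokenizer. It assumes (1) the sample file stores the dash as the 3-byte UTF-8 em dash and (2) aspell 0.50 reads the file as 8-bit ISO-8859-1 (Latin-1) text; both are assumptions, not something the docs confirm. `letter_runs` is a hypothetical stand-in for "a word is a run of letters".

```python
import re

# Assumption: the sample line, with the dash stored as the UTF-8 em dash (U+2014).
line = "scope \u2014 geniral usage"
raw = line.encode("utf-8")
print([hex(b) for b in "\u2014".encode("utf-8")])  # ['0xe2', '0x80', '0x94']

# Assumption: an 8-bit checker reads those bytes as ISO-8859-1 (Latin-1),
# so the single dash turns into three characters: 'â', U+0080, U+0094.
as_latin1 = raw.decode("latin-1")


def letter_runs(text: str) -> list:
    """Hypothetical tokenizer: a word is a run of word characters
    other than digits and underscore (roughly, a run of letters)."""
    return re.findall(r"[^\W\d_]+", text)


print(letter_runs(line))       # ['scope', 'geniral', 'usage']
print(letter_runs(as_latin1))  # ['scope', 'â', 'geniral', 'usage']
```

Read as Latin-1, the first byte of the dash becomes the letter â while the other two bytes are control characters, so a letters-only tokenizer finds four tokens instead of three. That is consistent with the four result lines (three `*`, one `&`) in the aspell output quoted below, though it does not prove this is what aspell does internally.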

Well, I will keep plugging away at it. Any suggestions will be greatly
appreciated.

Thanks!
...aaron
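A different route, sketched here under assumptions: rather than counting words, read the offset field aspell prints on each miss line. In the ispell-compatible pipe format a miss is reported as `& <word> <count> <offset>: <suggestions>` (or `# <word> <offset>` when there are no suggestions), so `& geniral 5 10: ...` in the output quoted below says "geniral" starts at offset 10. The code treats that number as a 0-based byte offset into the input line. That is my assumption: it fits the sample only if the dash is the 3-byte UTF-8 em dash, so check it against your aspell version and encoding settings before relying on it. `Miss`, `parse_pipe_line`, and `char_span` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Miss:
    word: str
    offset: int  # as printed by aspell; ASSUMED to be a 0-based byte offset
    suggestions: List[str] = field(default_factory=list)


def parse_pipe_line(out_line: str) -> Optional[Miss]:
    """Parse one line of `aspell -a` output.

    Only the two "misspelled" forms are handled:
      & <word> <count> <offset>: <sugg>, <sugg>, ...
      # <word> <offset>
    Anything else ("*", "+ ...", "-", blank, the "@(#)" banner) gives None.
    """
    out_line = out_line.rstrip("\n")
    if out_line.startswith("& "):
        head, _, tail = out_line.partition(": ")
        _, word, _count, offset = head.split()
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return Miss(word, int(offset), suggestions)
    if out_line.startswith("# "):
        _, word, offset = out_line.split()
        return Miss(word, int(offset))
    return None


def char_span(text_line: str, miss: Miss) -> Tuple[int, int]:
    """Map the (assumed) byte offset onto (start, end) character indexes."""
    prefix = text_line.encode("utf-8")[: miss.offset]
    start = len(prefix.decode("utf-8"))
    return start, start + len(miss.word)


text = "scope \u2014 geniral usage"
miss = parse_pipe_line("& geniral 5 10: general, genital, genial, generally, generals")
start, end = char_span(text, miss)
print(start, end, text[start:end])  # 8 15 geniral
```

Because the position comes from aspell itself, a stray token such as the â above no longer shifts the result, whatever aspell decides counts as a word.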

On 8/14/07, address@hidden <address@hidden> wrote:
> Hello,
>
> I see the following in your sample URL:
> scope â€" geniral usage
>
> Therefore, for me aspell works (almost) perfectly: it considers
> scope, â€" and usage as correct, and geniral as an error.
>
> In my opinion, your algorithm should treat â€" as a word;
> that would fix the problem.
>
> -eleonora
>
>
> > Hello,
> >
> > I am playing around with aspell as a server-side spell checker for a
> > Flash application. It works beautifully (and fast as hell too!), but I
> > did notice one little oddity that I haven't been able to find an
> > explanation for in the docs.
> >
> > The problem happens when there is a special character in the text. I
> > am not sure which special characters cause my word-counting algorithm
> > to fail, but here is an example of one that caused breakage (one of
> > those long dashes, in some text copied from a wiki).
> >
> > http://labs.splashlabs.com/spellcheck/1186978249
> >
> > When I pipe the above file through aspell (en_US), I get back the result:
> >
> > aspell -a < 1186978249
> > @(#) International Ispell Version 3.1.20 (but really Aspell 0.50.5)
> > *
> > *
> > & geniral 5 10: general, genital, genial, generally, generals
> > *
> >
> > So it appears to count the lone character as a word. In my own
> > program, I have to count words to find the start and end char indexes
> > of the incorrect word. Since my algorithm does not count it as a word,
> > my word count ends up off by one.
> >
> > Are there any options I can pass to prevent it from being counted? Or
> > is there a way to figure out exactly what counts as a word so I can
> > match my own regex to it?
> >
> > Thanks for any advice!
> > ...aaron
>


-- 
Aaron Miller
Chief Technology Officer
Splash Labs, LLC.
address@hidden  |  206-328-5485
http://www.splashlabs.com



