Re: paragraphs.el: do forward-sentence and friends not work?

From: Stefan Monnier
Subject: Re: paragraphs.el: do forward-sentence and friends not work?
Date: Thu, 14 Feb 2008 09:43:58 -0500
>> Using two spaces after end of sentence enables Emacs to distinguish
>> between periods that end sentences and periods for abbreviations.
>> That is why it should be the default.

> We can improve this to make it work without depending on the double-
> space.

> Sentence tokenization is a known problem. You can throw machine learning
> algorithms at it, but that's not a viable option in our case.  However,
> Grefenstette & Tapanainen (1994) examined this in detail for English, using
> the Brown corpus. They basically say that using a small lexicon of common
> abbreviations, they can classify 99.1% of all periods correctly. Even
> without the lexicon, you can achieve 97.7% accuracy (on English) using the
> right regular expressions, and I think this will be similar for other
> languages as well. I think that's good enough for M-e and M-a.

But the period-single-space vs period-double-space distinction allows us
to get it right 100% of the time in many more languages than just English.
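
To illustrate the distinction, here is a minimal sketch in Python (not
Emacs's actual implementation; the sample text, function names and regular
expressions are made up for this example).  It assumes the writer
consistently puts two spaces after a sentence and one space after an
abbreviation, which is the convention the Emacs variable
sentence-end-double-space relies on.

```python
import re

# Hypothetical sample: sentences are separated by two spaces, while the
# abbreviation "Dr." is followed by a single space.
TEXT = "I met Dr. Smith today.  He uses Emacs.  It works."


def split_double_space(text):
    """Split only where end punctuation is followed by two or more spaces."""
    return re.split(r"(?<=[.?!])\s{2,}", text)


def split_single_space(text):
    """Split wherever end punctuation is followed by any whitespace."""
    return re.split(r"(?<=[.?!])\s+", text)


if __name__ == "__main__":
    print(split_double_space(TEXT))
    print(split_single_space(TEXT))
```

On this sample the double-space rule leaves "Dr. Smith" intact, whereas the
single-space rule breaks the first sentence after "Dr." unless it is given
an abbreviation lexicon or further heuristics.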

        Stefan "Who switched to non-French spacing even when writing French"

