emacs-devel

Re: Beginingless paragraphs: second stab at a patch.


From: Alan Mackenzie
Subject: Re: Beginingless paragraphs: second stab at a patch.
Date: Wed, 7 Sep 2005 19:17:35 +0000 (GMT)

Hi, Richard!

Here is mark II of my patch to searching.texi, incorporating most of the
changes you suggested.  It isn't yet finished - I haven't made any
amendments to the "sentence" bits - so I haven't included a ChangeLog
entry.  I'd appreciate further criticism on it.

On Sun, 4 Sep 2005, Richard M. Stallman wrote:

[ .... ]

>Using @defvar inside of @table is a peculiar thing to do.  It may look
>bad in TeX or in Makeinfo.

I really wanted @subheading, which I've since found in the Texinfo manual
and have now put into the text.

>    ! This is the regular expression describing line-beginnings that

>"Describing" is vague; what it does is match them.

I hadn't actually touched the bit about pages.  I have now!

>    !   Buffers divide into @dfn{paragraphs}, 
>
>That is a strange way to put it.  It sounds like you're saying that
>buffers actually split up.  It would be better to make this
>parallel to the info about pages.

I was trying to suggest (i) that _all_ buffers have paragraphs, not just
"special" buffers, for whatever value of special; (ii) that the set of
paragraphs in a buffer, together with the separator lines, COVERS the
buffer: it is not the case that a buffer might have an isolated paragraph
hiding away somewhere inside it.  (iii) Also, I was trying to avoid using
the passive voice.

I solved (i) by saying explicitly at the top that all buffers have pages,
paragraphs, and sentences.

(ii)+(iii) are more difficult.  In the version of the patch I'm
submitting with this email, I've left a passive in.  I can't find a way
of expressing it which reads well and avoids passives.  Suggestions would
be welcome. 

>    ! normally don't overlap@footnote{It is possible for a blank line to be
>    ! both the last line of one paragraph and the first line of the next.}.

>Are you sure?  I don't think so.  A blank line would normally
>be a separator line, not the first or last line of any paragraph.

Try out this file:
------------------------------------------------------
1st Line        [starter]
asdf

1st Line        [starter]
asdf
-
Local Variables:
paragraph-separate: "-"
paragraph-start: "1st Line\\|-"
End:
-----------------------------------------------------

Do M-h on each of the lines "asdf".  The blank line is included in both
paragraphs.  This happens because the blank line isn't a separator here:
it is an ordinary line of the upper paragraph, and it is also the
"heuristic" (sorry about the word) blank line tacked on to the front of
the paragraph below.  Not something to lose too much sleep over, perhaps.
I've toned down the bit about it in the patch.
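
For anyone who would rather evaluate something than create the file, here
is a rough sketch of the same experiment.  It rebuilds the test file in a
temporary buffer, sets the two variables as the Local Variables block does,
and returns what M-h (mark-paragraph) selects around a given line.  The
helper name my-paragraph-around is made up for this illustration, and I
make no claim about what any particular Emacs version returns:

```elisp
;; Sketch only: `my-paragraph-around' is a hypothetical helper, not part
;; of Emacs.  It recreates the test file above in a temporary buffer.
(defun my-paragraph-around (line)
  "Return the text `mark-paragraph' selects with point on LINE (1-based)."
  (with-temp-buffer
    (insert "1st Line        [starter]\nasdf\n\n"
            "1st Line        [starter]\nasdf\n-\n")
    ;; Same values as in the Local Variables block of the test file.
    (set (make-local-variable 'paragraph-separate) "-")
    (set (make-local-variable 'paragraph-start) "1st Line\\|-")
    (goto-char (point-min))
    (forward-line (1- line))
    ;; What M-h does: point ends up at the paragraph start, mark at its end.
    (mark-paragraph)
    (buffer-substring (point) (mark t))))

;; Lines 2 and 5 are the two "asdf" lines.  Compare the two strings to
;; see whether the blank line (line 3) turns up in both.
(list (my-paragraph-around 2) (my-paragraph-around 5))
```

Evaluating the last form in *scratch* with C-j shows the two regions side
by side.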


Here is the patch:


*** searching-1.67.texi Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi     Wed Sep  7 16:49:38 2005
***************
*** 1643,1685 ****
  @end table
  
  @node Standard Regexps
! @section Standard Regular Expressions Used in Editing
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes some variables that hold regular expressions
! used for certain purposes in editing:
  
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages.  The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
  @end defvar
  
!   The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match.  Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous.  When there is a nonzero left margin,
! they accept matches that start after the left margin.  In that case, a
! @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
! where a left margin is never used.
  
  @defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs.  (If you change this, you may have to
! change @code{paragraph-start} also.)  The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs.  The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
  @end defvar
  
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
--- 1643,1750 ----
  @end table
  
  @node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section specifies precisely what pages, paragraphs, and
! sentences are in Emacs and the regular expressions it uses to
! recognize them.  By setting these variables appropriately, the Emacs
! Lisp programmer can control the precise effect of the standard
! commands that move over, kill, fill, mark, narrow to, and otherwise
! operate on these pieces of text.  Note that these variables are
! @emph{not} buffer local by default.
! 
!   Although the notions of pages, paragraphs, and sentences are mostly
! useful in modes for natural language text, the commands which use
! these textual units work in @emph{all} buffers.
! 
! @cindex page
! @subheading Pages
! 
!   A @dfn{page} in an Emacs buffer is an expanse of text extending from
! just after a @dfn{page delimiter} to just after the next one---a page
! delimiter is part of the page it terminates.  A page delimiter is an
! arbitrarily defined sequence of text which starts at column zero and
! may extend over several lines.  By default it is a single formfeed at
! column zero.  The beginning and end of the buffer also count as page
! boundaries.
  
  @defvar page-delimiter
! This is the regular expression that matches a page delimiter.  It
! should be anchored to the beginning of the line (i.e. it should start
! with @samp{^}).  The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}).
  @end defvar
  
! @cindex paragraph
! @subheading Paragraphs
!   Buffers in Emacs can be viewed as consisting of @dfn{Paragraphs},
! certain sequences of whole lines.  The two regular expressions
! @code{paragraph-separate} and @code{paragraph-start} determine where
! they start and end.  Paragraphs don't overlap@footnote{In certain
! obscure circumstances it is possible for a blank line to be both the
! last line of one paragraph and the first line of the next.}.  Between
! two paragraphs there are often one or more @dfn{separator lines},
! which aren't part of any paragraph.  The beginning and end of the
! buffer always count as paragraph boundaries.
! 
! The two ways that paragraphs can be separated are:
! 
! @itemize @bullet
! @item
! With separator lines---one or more separator lines split the old
! paragraph from the new one.  Whether @code{paragraph-start} would also
! recognize the first line of the new paragraph is irrelevant.
! 
! @item
! Without separator lines---any line, apart from a separator line, which
! @code{paragraph-start} recognizes starts a new paragraph.  This might
! be an indented line, for example.
! @end itemize
  
  @defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins}).  (If
! you change this, you may have to change @code{paragraph-start} also.)
! The default value is @w{@code{"[@ \t\f]*$"}}, which matches a line
! that consists entirely of spaces, tabs, and form feeds (after its left
! margin).
  @end defvar
  
  @defvar paragraph-start
! This regular expression recognizes @emph{either} a line which starts a
! paragraph when the previous line is not a separator @emph{or} a
! separator line.  It need only match some portion beginning at the
! line's left margin (@pxref{Margins}), not the whole line.  The default
! value is @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing
! only whitespace or starting with a form feed (after its left margin).
  @end defvar
+ 
+   Additionally, if a line tentatively recognized as the start of a
+ paragraph follows a whitespace line, the whitespace line is included
+ in the paragraph.
+ 
+   The usual values of @code{paragraph-separate} and
+ @code{paragraph-start} contain @samp{\f} (a formfeed) and thus
+ constrain paragraphs (and hence sentences) to end at a page boundary.
+ This works well for the way page separators are mostly used in Emacs.
+ If you want paragraphs to straddle page boundaries, like they do in
+ printed books, set these variables to, say, @w{@code{"[@ \t]*$"}} and
+ @w{@code{"[@ \t]*$"}}.
+ 
+   Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line.  Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous.  When there is a nonzero left margin, they
+ accept matches that start after the left margin.  In that case, a
+ @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+ 
+ @cindex sentence
+ @subheading Sentences
  
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing

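To make the advice about formfeeds and @samp{^} concrete, here is a small
sketch of how a user might apply it.  It assumes the values suggested in
the patch text ("@ " in Texinfo is just an escaped space, so the Lisp
string is "[ \t]*$"), sets them buffer-locally because the variables are
not buffer-local by default, and leaves out any leading ^.  The function
name my-paragraphs-across-pages and the choice of text-mode-hook are only
for illustration:

```elisp
;; Sketch only: `my-paragraphs-across-pages' is a hypothetical name.
(defun my-paragraphs-across-pages ()
  "Let paragraphs straddle page boundaries in the current buffer."
  ;; No leading ^ here: the paragraph commands already match at the
  ;; left margin, and a ^ would be wrong with a nonzero left margin.
  (set (make-local-variable 'paragraph-separate) "[ \t]*$")
  (set (make-local-variable 'paragraph-start) "[ \t]*$"))

;; Example hook; any mode hook would do.
(add-hook 'text-mode-hook 'my-paragraphs-across-pages)
```

With the formfeed gone from both regexps, a line beginning with ^L is
treated as an ordinary line of whichever paragraph it falls in.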

-- 
Alan Mackenzie (Munich, Germany)





