From: Neil Jerram
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-13-18-g96ca59d
Date: Sun, 31 Oct 2010 08:25:06 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=96ca59d839fd87cc021f58f5e864e1e195164292

The branch, master has been updated
       via  96ca59d839fd87cc021f58f5e864e1e195164292 (commit)
      from  01a4f0aae516444baf6855b5f1ab1689311314ba (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 96ca59d839fd87cc021f58f5e864e1e195164292
Author: Neil Jerram <address@hidden>
Date:   Sun Oct 31 08:24:28 2010 +0000

    Promote regex doc out of the `Simple Data Types' section
    
    Because that probably isn't where people will look for it.
    Thanks to Noah Lavine for the idea.
    
    * doc/ref/api-regex.texi (Regular Expressions): New file, containing
      the regex doc (promoted one level) that used to be in api-data.texi.
    
    * doc/ref/guile.texi (API Reference): Include new file, and add menu
      entry for the new section.
    
    * THANKS: Add Noah.

-----------------------------------------------------------------------

Summary of changes:
 THANKS                 |    1 +
 doc/ref/api-data.texi  |  532 -----------------------------------------------
 doc/ref/api-regex.texi |  535 ++++++++++++++++++++++++++++++++++++++++++++++++
 doc/ref/guile.texi     |    2 +
 4 files changed, 538 insertions(+), 532 deletions(-)
 create mode 100644 doc/ref/api-regex.texi

diff --git a/THANKS b/THANKS
index 3ee51e7..c9a46e2 100644
--- a/THANKS
+++ b/THANKS
@@ -72,6 +72,7 @@ For fixes or providing information which led to a fix:
        Matthias Köppe
            Matt Kraai
          Daniel Kraft
+           Noah Lavine
        Miroslav Lichvar
          Daniel Llorens del Río
            Jeff Long
diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index caa5d8e..9f0217f 100755
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -45,7 +45,6 @@ For the documentation of such @dfn{compound} data types, see
 * Character Sets::              Sets of characters.
 * Strings::                     Sequences of characters.
 * Bytevectors::                 Sequences of bytes.
-* Regular Expressions::         Pattern matching and substitution.
 * Symbols::                     Symbols.
 * Keywords::                    Self-quoting, customizable display keywords.
 * Other Types::                 "Functionality-centric" data types.
@@ -4547,537 +4546,6 @@ Bytevectors may also be accessed with the SRFI-4 API.  @xref{SRFI-4 and
 Bytevectors}, for more information.
 
 
-@node Regular Expressions
-@subsection Regular Expressions
-@tpindex Regular expressions
-
-@cindex regular expressions
-@cindex regex
-@cindex emacs regexp
-
-A @dfn{regular expression} (or @dfn{regexp}) is a pattern that
-describes a whole class of strings.  A full description of regular
-expressions and their syntax is beyond the scope of this manual;
-an introduction can be found in the Emacs manual (@pxref{Regexps,
-, Syntax of Regular Expressions, emacs, The GNU Emacs Manual}), or
-in many general Unix reference books.
-
-If your system does not include a POSIX regular expression library,
-and you have not linked Guile with a third-party regexp library such
-as Rx, these functions will not be available.  You can tell whether
-your Guile installation includes regular expression support by
-checking whether @code{(provided? 'regex)} returns true.
-
-The following regexp and string matching features are provided by the
-@code{(ice-9 regex)} module.  Before using the described functions,
-you should load this module by executing @code{(use-modules (ice-9
-regex))}.
-
-@menu
-* Regexp Functions::            Functions that create and match regexps.
-* Match Structures::            Finding what was matched by a regexp.
-* Backslash Escapes::           Removing the special meaning of regexp
-                                meta-characters.
-@end menu
-
-
-@node Regexp Functions
-@subsubsection Regexp Functions
-
-By default, Guile supports POSIX extended regular expressions.
-That means that the characters @samp{(}, @samp{)}, @samp{+} and
-@samp{?} are special, and must be escaped if you wish to match the
-literal characters.
-
-This regular expression interface was modeled after that
-implemented by SCSH, the Scheme Shell.  It is intended to be
-upwardly compatible with SCSH regular expressions.
-
-Zero bytes (@code{#\nul}) cannot be used in regex patterns or input
-strings, since the underlying C functions treat that as the end of
-string.  If there's a zero byte an error is thrown.
-
-Patterns and input strings are treated as being in the locale
-character set if @code{setlocale} has been called (@pxref{Locales}),
-and in a multibyte locale this includes treating multi-byte sequences
-as a single character.  (Guile strings are currently merely bytes,
-though this may change in the future, @xref{Conversion to/from C}.)
-
-@deffn {Scheme Procedure} string-match pattern str [start]
-Compile the string @var{pattern} into a regular expression and compare
-it with @var{str}.  The optional numeric argument @var{start} specifies
-the position of @var{str} at which to begin matching.
-
-@code{string-match} returns a @dfn{match structure} which
-describes what, if anything, was matched by the regular
-expression.  @xref{Match Structures}.  If @var{str} does not match
-@var{pattern} at all, @code{string-match} returns @code{#f}.
-@end deffn
-
-Two examples of a match follow.  In the first example, the pattern
-matches the four digits in the match string.  In the second, the pattern
-matches nothing.
-
-@example
-(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
-@result{} #("blah2002" (4 . 8))
-
-(string-match "[A-Za-z]" "123456")
-@result{} #f
-@end example
-
-Each time @code{string-match} is called, it must compile its
-@var{pattern} argument into a regular expression structure.  This
-operation is expensive, which makes @code{string-match} inefficient if
-the same regular expression is used several times (for example, in a
-loop).  For better performance, you can compile a regular expression in
-advance and then match strings against the compiled regexp.
-
-@deffn {Scheme Procedure} make-regexp pat flag@dots{}
-@deffnx {C Function} scm_make_regexp (pat, flaglst)
-Compile the regular expression described by @var{pat}, and
-return the compiled regexp structure.  If @var{pat} does not
-describe a legal regular expression, @code{make-regexp} throws
-a @code{regular-expression-syntax} error.
-
-The @var{flag} arguments change the behavior of the compiled
-regular expression.  The following values may be supplied:
-
-@defvar regexp/icase
-Consider uppercase and lowercase letters to be the same when
-matching.
-@end defvar
-
-@defvar regexp/newline
-If a newline appears in the target string, then permit the
-@samp{^} and @samp{$} operators to match immediately after or
-immediately before the newline, respectively.  Also, the
-@samp{.} and @samp{[^...]} operators will never match a newline
-character.  The intent of this flag is to treat the target
-string as a buffer containing many lines of text, and the
-regular expression as a pattern that may match a single one of
-those lines.
-@end defvar
-
-@defvar regexp/basic
-Compile a basic (``obsolete'') regexp instead of the extended
-(``modern'') regexps that are the default.  Basic regexps do
-not consider @samp{|}, @samp{+} or @samp{?} to be special
-characters, and require the @samp{@{...@}} and @samp{(...)}
-metacharacters to be backslash-escaped (@pxref{Backslash
-Escapes}).  There are several other differences between basic
-and extended regular expressions, but these are the most
-significant.
-@end defvar
-
-@defvar regexp/extended
-Compile an extended regular expression rather than a basic
-regexp.  This is the default behavior; this flag will not
-usually be needed.  If a call to @code{make-regexp} includes
-both @code{regexp/basic} and @code{regexp/extended} flags, the
-one which comes last will override the earlier one.
-@end defvar
-@end deffn
-
-@deffn {Scheme Procedure} regexp-exec rx str [start [flags]]
-@deffnx {C Function} scm_regexp_exec (rx, str, start, flags)
-Match the compiled regular expression @var{rx} against
-@code{str}.  If the optional integer @var{start} argument is
-provided, begin matching from that position in the string.
-Return a match structure describing the results of the match,
-or @code{#f} if no match could be found.
-
-The @var{flags} argument changes the matching behavior.  The following
-flag values may be supplied, use @code{logior} (@pxref{Bitwise
-Operations}) to combine them,
-
-@defvar regexp/notbol
-Consider that the @var{start} offset into @var{str} is not the
-beginning of a line and should not match operator @samp{^}.
-
-If @var{rx} was created with the @code{regexp/newline} option above,
-@samp{^} will still match after a newline in @var{str}.
-@end defvar
-
-@defvar regexp/noteol
-Consider that the end of @var{str} is not the end of a line and should
-not match operator @samp{$}.
-
-If @var{rx} was created with the @code{regexp/newline} option above,
-@samp{$} will still match before a newline in @var{str}.
-@end defvar
-@end deffn
-
-@lisp
-;; Regexp to match uppercase letters
-(define r (make-regexp "[A-Z]*"))
-
-;; Regexp to match letters, ignoring case
-(define ri (make-regexp "[A-Z]*" regexp/icase))
-
-;; Search for bob using regexp r
-(match:substring (regexp-exec r "bob"))
-@result{} ""                  ; no match
-
-;; Search for bob using regexp ri
-(match:substring (regexp-exec ri "Bob"))
-@result{} "Bob"               ; matched case insensitive
-@end lisp
-
-@deffn {Scheme Procedure} regexp? obj
-@deffnx {C Function} scm_regexp_p (obj)
-Return @code{#t} if @var{obj} is a compiled regular expression,
-or @code{#f} otherwise.
-@end deffn
-
-@sp 1
-@deffn {Scheme Procedure} list-matches regexp str [flags]
-Return a list of match structures which are the non-overlapping
-matches of @var{regexp} in @var{str}.  @var{regexp} can be either a
-pattern string or a compiled regexp.  The @var{flags} argument is as
-per @code{regexp-exec} above.
-
-@example
-(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
-@result{} ("abc" "def")
-@end  example
-@end deffn
-
-@deffn {Scheme Procedure} fold-matches regexp str init proc [flags]
-Apply @var{proc} to the non-overlapping matches of @var{regexp} in
-@var{str}, to build a result.  @var{regexp} can be either a pattern
-string or a compiled regexp.  The @var{flags} argument is as per
-@code{regexp-exec} above.
-
-@var{proc} is called as @code{(@var{proc} match prev)} where
-@var{match} is a match structure and @var{prev} is the previous return
-from @var{proc}.  For the first call @var{prev} is the given
-@var{init} parameter.  @code{fold-matches} returns the final value
-from @var{proc}.
-
-For example to count matches,
-
-@example
-(fold-matches "[a-z][0-9]" "abc x1 def y2" 0
-              (lambda (match count)
-                (1+ count)))
-@result{} 2
-@end example
-@end deffn
-
-@sp 1
-Regular expressions are commonly used to find patterns in one string
-and replace them with the contents of another string.  The following
-functions are convenient ways to do this.
-
-@c begin (scm-doc-string "regex.scm" "regexp-substitute")
-@deffn {Scheme Procedure} regexp-substitute port match [item@dots{}]
-Write to @var{port} selected parts of the match structure @var{match}.
-Or if @var{port} is @code{#f} then form a string from those parts and
-return that.
-
-Each @var{item} specifies a part to be written, and may be one of the
-following,
-
-@itemize @bullet
-@item
-A string.  String arguments are written out verbatim.
-
-@item
-An integer.  The submatch with that number is written
-(@code{match:substring}).  Zero is the entire match.
-
-@item
-The symbol @samp{pre}.  The portion of the matched string preceding
-the regexp match is written (@code{match:prefix}).
-
-@item
-The symbol @samp{post}.  The portion of the matched string following
-the regexp match is written (@code{match:suffix}).
-@end itemize
-
-For example, changing a match and retaining the text before and after,
-
-@example
-(regexp-substitute #f (string-match "[0-9]+" "number 25 is good")
-                   'pre "37" 'post)
-@result{} "number 37 is good"
-@end example
-
-Or matching a @sc{yyyymmdd} format date such as @samp{20020828} and
-re-ordering and hyphenating the fields.
-
-@lisp
-(define date-regex
-   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
-(define s "Date 20020429 12am.")
-(regexp-substitute #f (string-match date-regex s)
-                   'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
-@result{} "Date 04-29-2002 12am. (20020429)"
-@end lisp
-@end deffn
-
-
-@c begin (scm-doc-string "regex.scm" "regexp-substitute")
-@deffn {Scheme Procedure} regexp-substitute/global port regexp target [item@dots{}]
-@cindex search and replace
-Write to @var{port} selected parts of matches of @var{regexp} in
-@var{target}.  If @var{port} is @code{#f} then form a string from
-those parts and return that.  @var{regexp} can be a string or a
-compiled regex.
-
-This is similar to @code{regexp-substitute}, but allows global
-substitutions on @var{target}.  Each @var{item} behaves as per
-@code{regexp-substitute}, with the following differences,
-
-@itemize @bullet
-@item
-A function.  Called as @code{(@var{item} match)} with the match
-structure for the @var{regexp} match, it should return a string to be
-written to @var{port}.
-
-@item
-The symbol @samp{post}.  This doesn't output anything, but instead
-causes @code{regexp-substitute/global} to recurse on the unmatched
-portion of @var{target}.
-
-This @emph{must} be supplied to perform a global search and replace on
-@var{target}; without it @code{regexp-substitute/global} returns after
-a single match and output.
-@end itemize
-
-For example, to collapse runs of tabs and spaces to a single hyphen
-each,
-
-@example
-(regexp-substitute/global #f "[ \t]+"  "this   is   the text"
-                          'pre "-" 'post)
-@result{} "this-is-the-text"
-@end example
-
-Or using a function to reverse the letters in each word,
-
-@example
-(regexp-substitute/global #f "[a-z]+"  "to do and not-do"
-  'pre (lambda (m) (string-reverse (match:substring m))) 'post)
-@result{} "ot od dna ton-od"
-@end example
-
-Without the @code{post} symbol, just one regexp match is made.  For
-example the following is the date example from
-@code{regexp-substitute} above, without the need for the separate
-@code{string-match} call.
-
-@lisp
-(define date-regex 
-   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
-(define s "Date 20020429 12am.")
-(regexp-substitute/global #f date-regex s
-                          'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
-
-@result{} "Date 04-29-2002 12am. (20020429)"
-@end lisp
-@end deffn
-
-
-@node Match Structures
-@subsubsection Match Structures
-
-@cindex match structures
-
-A @dfn{match structure} is the object returned by @code{string-match} and
-@code{regexp-exec}.  It describes which portion of a string, if any,
-matched the given regular expression.  Match structures include: a
-reference to the string that was checked for matches; the starting and
-ending positions of the regexp match; and, if the regexp included any
-parenthesized subexpressions, the starting and ending positions of each
-submatch.
-
-In each of the regexp match functions described below, the @code{match}
-argument must be a match structure returned by a previous call to
-@code{string-match} or @code{regexp-exec}.  Most of these functions
-return some information about the original target string that was
-matched against a regular expression; we will call that string
-@var{target} for easy reference.
-
-@c begin (scm-doc-string "regex.scm" "regexp-match?")
-@deffn {Scheme Procedure} regexp-match? obj
-Return @code{#t} if @var{obj} is a match structure returned by a
-previous call to @code{regexp-exec}, or @code{#f} otherwise.
-@end deffn
-
-@c begin (scm-doc-string "regex.scm" "match:substring")
-@deffn {Scheme Procedure} match:substring match [n]
-Return the portion of @var{target} matched by subexpression number
-@var{n}.  Submatch 0 (the default) represents the entire regexp match.
-If the regular expression as a whole matched, but the subexpression
-number @var{n} did not match, return @code{#f}.
-@end deffn
-
-@lisp
-(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
-(match:substring s)
-@result{} "2002"
-
-;; match starting at offset 6 in the string
-(match:substring
-  (string-match "[0-9][0-9][0-9][0-9]" "blah987654" 6))
-@result{} "7654"
-@end lisp
-
-@c begin (scm-doc-string "regex.scm" "match:start")
-@deffn {Scheme Procedure} match:start match [n]
-Return the starting position of submatch number @var{n}.
-@end deffn
-
-In the following example, the result is 4, since the match starts at
-character index 4:
-
-@lisp
-(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
-(match:start s)
-@result{} 4
-@end lisp
-
-@c begin (scm-doc-string "regex.scm" "match:end")
-@deffn {Scheme Procedure} match:end match [n]
-Return the ending position of submatch number @var{n}.
-@end deffn
-
-In the following example, the result is 8, since the match runs between
-characters 4 and 8 (i.e. the ``2002'').
-
-@lisp
-(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
-(match:end s)
-@result{} 8
-@end lisp
-
-@c begin (scm-doc-string "regex.scm" "match:prefix")
-@deffn {Scheme Procedure} match:prefix match
-Return the unmatched portion of @var{target} preceding the regexp match.
-
-@lisp
-(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
-(match:prefix s)
-@result{} "blah"
-@end lisp
-@end deffn
-
-@c begin (scm-doc-string "regex.scm" "match:suffix")
-@deffn {Scheme Procedure} match:suffix match
-Return the unmatched portion of @var{target} following the regexp match.
-@end deffn
-
-@lisp
-(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
-(match:suffix s)
-@result{} "foo"
-@end lisp
-
-@c begin (scm-doc-string "regex.scm" "match:count")
-@deffn {Scheme Procedure} match:count match
-Return the number of parenthesized subexpressions from @var{match}.
-Note that the entire regular expression match itself counts as a
-subexpression, and failed submatches are included in the count.
-@end deffn
-
-@c begin (scm-doc-string "regex.scm" "match:string")
-@deffn {Scheme Procedure} match:string match
-Return the original @var{target} string.
-@end deffn
-
-@lisp
-(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
-(match:string s)
-@result{} "blah2002foo"
-@end lisp
-
-
-@node Backslash Escapes
-@subsubsection Backslash Escapes
-
-Sometimes you will want a regexp to match characters like @samp{*} or
-@samp{$} exactly.  For example, to check whether a particular string
-represents a menu entry from an Info node, it would be useful to match
-it against a regexp like @samp{^* [^:]*::}.  However, this won't work;
-because the asterisk is a metacharacter, it won't match the @samp{*} at
-the beginning of the string.  In this case, we want to make the first
-asterisk un-magic.
-
-You can do this by preceding the metacharacter with a backslash
-character @samp{\}.  (This is also called @dfn{quoting} the
-metacharacter, and is known as a @dfn{backslash escape}.)  When Guile
-sees a backslash in a regular expression, it considers the following
-glyph to be an ordinary character, no matter what special meaning it
-would ordinarily have.  Therefore, we can make the above example work by
-changing the regexp to @samp{^\* [^:]*::}.  The @samp{\*} sequence tells
-the regular expression engine to match only a single asterisk in the
-target string.
-
-Since the backslash is itself a metacharacter, you may force a regexp to
-match a backslash in the target string by preceding the backslash with
-itself.  For example, to find variable references in a @TeX{} program,
-you might want to find occurrences of the string @samp{\let\} followed
-by any number of alphabetic characters.  The regular expression
-@samp{\\let\\[A-Za-z]*} would do this: the double backslashes in the
-regexp each match a single backslash in the target string.
-
-@c begin (scm-doc-string "regex.scm" "regexp-quote")
-@deffn {Scheme Procedure} regexp-quote str
-Quote each special character found in @var{str} with a backslash, and
-return the resulting string.
-@end deffn
-
-@strong{Very important:} Using backslash escapes in Guile source code
-(as in Emacs Lisp or C) can be tricky, because the backslash character
-has special meaning for the Guile reader.  For example, if Guile
-encounters the character sequence @samp{\n} in the middle of a string
-while processing Scheme code, it replaces those characters with a
-newline character.  Similarly, the character sequence @samp{\t} is
-replaced by a horizontal tab.  Several of these @dfn{escape sequences}
-are processed by the Guile reader before your code is executed.
-Unrecognized escape sequences are ignored: if the characters @samp{\*}
-appear in a string, they will be translated to the single character
-@samp{*}.
-
-This translation is obviously undesirable for regular expressions, since
-we want to be able to include backslashes in a string in order to
-escape regexp metacharacters.  Therefore, to make sure that a backslash
-is preserved in a string in your Guile program, you must use @emph{two}
-consecutive backslashes:
-
-@lisp
-(define Info-menu-entry-pattern (make-regexp "^\\* [^:]*"))
-@end lisp
-
-The string in this example is preprocessed by the Guile reader before
-any code is executed.  The resulting argument to @code{make-regexp} is
-the string @samp{^\* [^:]*}, which is what we really want.
-
-This also means that in order to write a regular expression that matches
-a single backslash character, the regular expression string in the
-source code must include @emph{four} backslashes.  Each consecutive pair
-of backslashes gets translated by the Guile reader to a single
-backslash, and the resulting double-backslash is interpreted by the
-regexp engine as matching a single backslash character.  Hence:
-
-@lisp
-(define tex-variable-pattern (make-regexp "\\\\let\\\\=[A-Za-z]*"))
-@end lisp
-
-The reason for the unwieldiness of this syntax is historical.  Both
-regular expression pattern matchers and Unix string processing systems
-have traditionally used backslashes with the special meanings
-described above.  The POSIX regular expression specification and ANSI C
-standard both require these semantics.  Attempting to abandon either
-convention would cause other kinds of compatibility problems, possibly
-more severe ones.  Therefore, without extending the Scheme reader to
-support strings with different quoting conventions (an ungainly and
-confusing extension when implemented in other languages), we must adhere
-to this cumbersome escape syntax.
-
-
 @node Symbols
 @subsection Symbols
 @tpindex Symbols
diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
new file mode 100644
index 0000000..61410d9
--- /dev/null
+++ b/doc/ref/api-regex.texi
@@ -0,0 +1,535 @@
+@c -*-texinfo-*-
+@c This is part of the GNU Guile Reference Manual.
+@c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010
+@c   Free Software Foundation, Inc.
+@c See the file guile.texi for copying conditions.
+
+@node Regular Expressions
+@section Regular Expressions
+@tpindex Regular expressions
+
+@cindex regular expressions
+@cindex regex
+@cindex emacs regexp
+
+A @dfn{regular expression} (or @dfn{regexp}) is a pattern that
+describes a whole class of strings.  A full description of regular
+expressions and their syntax is beyond the scope of this manual;
+an introduction can be found in the Emacs manual (@pxref{Regexps,
+, Syntax of Regular Expressions, emacs, The GNU Emacs Manual}), or
+in many general Unix reference books.
+
+If your system does not include a POSIX regular expression library,
+and you have not linked Guile with a third-party regexp library such
+as Rx, these functions will not be available.  You can tell whether
+your Guile installation includes regular expression support by
+checking whether @code{(provided? 'regex)} returns true.
+
+The following regexp and string matching features are provided by the
+@code{(ice-9 regex)} module.  Before using the described functions,
+you should load this module by executing @code{(use-modules (ice-9
+regex))}.
+
+@menu
+* Regexp Functions::            Functions that create and match regexps.
+* Match Structures::            Finding what was matched by a regexp.
+* Backslash Escapes::           Removing the special meaning of regexp
+                                meta-characters.
+@end menu
+
+
+@node Regexp Functions
+@subsection Regexp Functions
+
+By default, Guile supports POSIX extended regular expressions.
+That means that the characters @samp{(}, @samp{)}, @samp{+} and
+@samp{?} are special, and must be escaped if you wish to match the
+literal characters.
+
+This regular expression interface was modeled after that
+implemented by SCSH, the Scheme Shell.  It is intended to be
+upwardly compatible with SCSH regular expressions.
+
+Zero bytes (@code{#\nul}) cannot be used in regex patterns or input
+strings, since the underlying C functions treat that as the end of
+string.  If there's a zero byte an error is thrown.
+
+Patterns and input strings are treated as being in the locale
+character set if @code{setlocale} has been called (@pxref{Locales}),
+and in a multibyte locale this includes treating multi-byte sequences
+as a single character.  (Guile strings are currently merely bytes,
+though this may change in the future, @xref{Conversion to/from C}.)
+
+@deffn {Scheme Procedure} string-match pattern str [start]
+Compile the string @var{pattern} into a regular expression and compare
+it with @var{str}.  The optional numeric argument @var{start} specifies
+the position of @var{str} at which to begin matching.
+
+@code{string-match} returns a @dfn{match structure} which
+describes what, if anything, was matched by the regular
+expression.  @xref{Match Structures}.  If @var{str} does not match
+@var{pattern} at all, @code{string-match} returns @code{#f}.
+@end deffn
+
+Two examples of a match follow.  In the first example, the pattern
+matches the four digits in the match string.  In the second, the pattern
+matches nothing.
+
+@example
+(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
+@result{} #("blah2002" (4 . 8))
+
+(string-match "[A-Za-z]" "123456")
+@result{} #f
+@end example
+
+Each time @code{string-match} is called, it must compile its
+@var{pattern} argument into a regular expression structure.  This
+operation is expensive, which makes @code{string-match} inefficient if
+the same regular expression is used several times (for example, in a
+loop).  For better performance, you can compile a regular expression in
+advance and then match strings against the compiled regexp.
+
+@deffn {Scheme Procedure} make-regexp pat flag@dots{}
+@deffnx {C Function} scm_make_regexp (pat, flaglst)
+Compile the regular expression described by @var{pat}, and
+return the compiled regexp structure.  If @var{pat} does not
+describe a legal regular expression, @code{make-regexp} throws
+a @code{regular-expression-syntax} error.
+
+The @var{flag} arguments change the behavior of the compiled
+regular expression.  The following values may be supplied:
+
+@defvar regexp/icase
+Consider uppercase and lowercase letters to be the same when
+matching.
+@end defvar
+
+@defvar regexp/newline
+If a newline appears in the target string, then permit the
+@samp{^} and @samp{$} operators to match immediately after or
+immediately before the newline, respectively.  Also, the
+@samp{.} and @samp{[^...]} operators will never match a newline
+character.  The intent of this flag is to treat the target
+string as a buffer containing many lines of text, and the
+regular expression as a pattern that may match a single one of
+those lines.
+@end defvar
+
+@defvar regexp/basic
+Compile a basic (``obsolete'') regexp instead of the extended
+(``modern'') regexps that are the default.  Basic regexps do
+not consider @samp{|}, @samp{+} or @samp{?} to be special
+characters, and require the @samp{@{...@}} and @samp{(...)}
+metacharacters to be backslash-escaped (@pxref{Backslash
+Escapes}).  There are several other differences between basic
+and extended regular expressions, but these are the most
+significant.
+@end defvar
+
+@defvar regexp/extended
+Compile an extended regular expression rather than a basic
+regexp.  This is the default behavior; this flag will not
+usually be needed.  If a call to @code{make-regexp} includes
+both @code{regexp/basic} and @code{regexp/extended} flags, the
+one which comes last will override the earlier one.
+@end defvar
+@end deffn
+
+@deffn {Scheme Procedure} regexp-exec rx str [start [flags]]
+@deffnx {C Function} scm_regexp_exec (rx, str, start, flags)
+Match the compiled regular expression @var{rx} against
+@code{str}.  If the optional integer @var{start} argument is
+provided, begin matching from that position in the string.
+Return a match structure describing the results of the match,
+or @code{#f} if no match could be found.
+
+The @var{flags} argument changes the matching behavior.  The following
+flag values may be supplied, use @code{logior} (@pxref{Bitwise
+Operations}) to combine them,
+
+@defvar regexp/notbol
+Consider that the @var{start} offset into @var{str} is not the
+beginning of a line and should not match operator @samp{^}.
+
+If @var{rx} was created with the @code{regexp/newline} option above,
+@samp{^} will still match after a newline in @var{str}.
+@end defvar
+
+@defvar regexp/noteol
+Consider that the end of @var{str} is not the end of a line and should
+not match operator @samp{$}.
+
+If @var{rx} was created with the @code{regexp/newline} option above,
+@samp{$} will still match before a newline in @var{str}.
+@end defvar
+@end deffn
+
+@lisp
+;; Regexp to match uppercase letters
+(define r (make-regexp "[A-Z]*"))
+
+;; Regexp to match letters, ignoring case
+(define ri (make-regexp "[A-Z]*" regexp/icase))
+
+;; Search for bob using regexp r
+(match:substring (regexp-exec r "bob"))
+@result{} ""                  ; matches the empty string at index 0
+
+;; Search for bob using regexp ri
+(match:substring (regexp-exec ri "Bob"))
+@result{} "Bob"               ; matched case insensitive
+@end lisp
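+
+As a further illustration (a sketch, not part of the original
+examples), the @var{start} and @var{flags} arguments can be combined.
+It assumes that the @code{(ice-9 regex)} module is loaded and that, as
+described above, @samp{^} matches at the @var{start} offset unless
+@code{regexp/notbol} is given.  The name @code{rb} is only
+illustrative.
+
+@lisp
+(use-modules (ice-9 regex))
+
+;; Anchored pattern: a "b" at the beginning of a line
+;; (rb is an illustrative name)
+(define rb (make-regexp "^b"))
+
+;; Start at offset 1; the offset is treated as a line start
+(match:start (regexp-exec rb "ab" 1))
+@result{} 1
+
+;; With regexp/notbol the offset no longer counts as a line start
+(regexp-exec rb "ab" 1 regexp/notbol)
+@result{} #f
+@end lisp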
+
+@deffn {Scheme Procedure} regexp? obj
+@deffnx {C Function} scm_regexp_p (obj)
+Return @code{#t} if @var{obj} is a compiled regular expression,
+or @code{#f} otherwise.
+@end deffn
+
+@sp 1
+@deffn {Scheme Procedure} list-matches regexp str [flags]
+Return a list of match structures which are the non-overlapping
+matches of @var{regexp} in @var{str}.  @var{regexp} can be either a
+pattern string or a compiled regexp.  The @var{flags} argument is as
+per @code{regexp-exec} above.
+
+@example
+(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
+@result{} ("abc" "def")
+@end example
+@end deffn
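+
+A hedged sketch of a related use: since @code{list-matches} returns
+match structures, the accessors described later (@pxref{Match
+Structures}) can be mapped over the result.  The variable name
+@code{digits} is only illustrative.
+
+@lisp
+(use-modules (ice-9 regex))
+
+;; A compiled regexp may be passed instead of a pattern string
+;; (digits is an illustrative name)
+(define digits (make-regexp "[0-9]+"))
+
+;; Starting index of each run of digits
+(map match:start (list-matches digits "abc 42 def 78"))
+@result{} (4 11)
+@end lisp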
+
+@deffn {Scheme Procedure} fold-matches regexp str init proc [flags]
+Apply @var{proc} to the non-overlapping matches of @var{regexp} in
+@var{str}, to build a result.  @var{regexp} can be either a pattern
+string or a compiled regexp.  The @var{flags} argument is as per
+@code{regexp-exec} above.
+
+@var{proc} is called as @code{(@var{proc} match prev)} where
+@var{match} is a match structure and @var{prev} is the previous return
+from @var{proc}.  For the first call @var{prev} is the given
+@var{init} parameter.  @code{fold-matches} returns the final value
+from @var{proc}.
+
+For example to count matches,
+
+@example
+(fold-matches "[a-z][0-9]" "abc x1 def y2" 0
+              (lambda (match count)
+                (1+ count)))
+@result{} 2
+@end example
+@end deffn
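+
+As a sketch of building something other than a count, the following
+collects the matched substrings.  Because the procedure conses each new
+match onto the front of the accumulator, the list is reversed at the
+end to restore left-to-right order (this assumes the matches are
+visited from left to right).
+
+@lisp
+(use-modules (ice-9 regex))
+
+;; acc is the accumulator, starting from the empty list
+(reverse
+ (fold-matches "[a-z][0-9]" "abc x1 def y2" '()
+               (lambda (match acc)
+                 (cons (match:substring match) acc))))
+@result{} ("x1" "y2")
+@end lisp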
+
+@sp 1
+Regular expressions are commonly used to find patterns in one string
+and replace them with the contents of another string.  The following
+functions are convenient ways to do this.
+
+@c begin (scm-doc-string "regex.scm" "regexp-substitute")
+@deffn {Scheme Procedure} regexp-substitute port match [item@dots{}]
+Write to @var{port} selected parts of the match structure @var{match}.
+Or if @var{port} is @code{#f} then form a string from those parts and
+return that.
+
+Each @var{item} specifies a part to be written, and may be one of the
+following,
+
+@itemize @bullet
+@item
+A string.  String arguments are written out verbatim.
+
+@item
+An integer.  The submatch with that number is written
+(@code{match:substring}).  Zero is the entire match.
+
+@item
+The symbol @samp{pre}.  The portion of the matched string preceding
+the regexp match is written (@code{match:prefix}).
+
+@item
+The symbol @samp{post}.  The portion of the matched string following
+the regexp match is written (@code{match:suffix}).
+@end itemize
+
+For example, changing a match and retaining the text before and after,
+
+@example
+(regexp-substitute #f (string-match "[0-9]+" "number 25 is good")
+                   'pre "37" 'post)
+@result{} "number 37 is good"
+@end example
+
+Or matching a @sc{yyyymmdd} format date such as @samp{20020828} and
+re-ordering and hyphenating the fields.
+
+@lisp
+(define date-regex
+   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
+(define s "Date 20020429 12am.")
+(regexp-substitute #f (string-match date-regex s)
+                   'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
+@result{} "Date 04-29-2002 12am. (20020429)"
+@end lisp
+@end deffn
+
+
+@c begin (scm-doc-string "regex.scm" "regexp-substitute")
+@deffn {Scheme Procedure} regexp-substitute/global port regexp target [item@dots{}]
+@cindex search and replace
+Write to @var{port} selected parts of matches of @var{regexp} in
+@var{target}.  If @var{port} is @code{#f} then form a string from
+those parts and return that.  @var{regexp} can be a string or a
+compiled regex.
+
+This is similar to @code{regexp-substitute}, but allows global
+substitutions on @var{target}.  Each @var{item} behaves as per
+@code{regexp-substitute}, with the following differences,
+
+@itemize @bullet
+@item
+A function.  Called as @code{(@var{item} match)} with the match
+structure for the @var{regexp} match, it should return a string to be
+written to @var{port}.
+
+@item
+The symbol @samp{post}.  This doesn't output anything, but instead
+causes @code{regexp-substitute/global} to recurse on the unmatched
+portion of @var{target}.
+
+This @emph{must} be supplied to perform a global search and replace on
+@var{target}; without it @code{regexp-substitute/global} returns after
+a single match and output.
+@end itemize
+
+For example, to collapse runs of tabs and spaces to a single hyphen
+each,
+
+@example
+(regexp-substitute/global #f "[ \t]+"  "this   is   the text"
+                          'pre "-" 'post)
+@result{} "this-is-the-text"
+@end example
+
+Or using a function to reverse the letters in each word,
+
+@example
+(regexp-substitute/global #f "[a-z]+"  "to do and not-do"
+  'pre (lambda (m) (string-reverse (match:substring m))) 'post)
+@result{} "ot od dna ton-od"
+@end example
+
+Without the @code{post} symbol, just one regexp match is made.  For
+example the following is the date example from
+@code{regexp-substitute} above, without the need for the separate
+@code{string-match} call.
+
+@lisp
+(define date-regex
+   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
+(define s "Date 20020429 12am.")
+(regexp-substitute/global #f date-regex s
+                          'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
+
+@result{} "Date 04-29-2002 12am. (20020429)"
+@end lisp
+@end deffn
+
+
+@node Match Structures
+@subsection Match Structures
+
+@cindex match structures
+
+A @dfn{match structure} is the object returned by @code{string-match} and
+@code{regexp-exec}.  It describes which portion of a string, if any,
+matched the given regular expression.  Match structures include: a
+reference to the string that was checked for matches; the starting and
+ending positions of the regexp match; and, if the regexp included any
+parenthesized subexpressions, the starting and ending positions of each
+submatch.
+
+In each of the regexp match functions described below, the @code{match}
+argument must be a match structure returned by a previous call to
+@code{string-match} or @code{regexp-exec}.  Most of these functions
+return some information about the original target string that was
+matched against a regular expression; we will call that string
+@var{target} for easy reference.
+
+@c begin (scm-doc-string "regex.scm" "regexp-match?")
+@deffn {Scheme Procedure} regexp-match? obj
+Return @code{#t} if @var{obj} is a match structure returned by a
+previous call to @code{regexp-exec}, or @code{#f} otherwise.
+@end deffn
+
+@c begin (scm-doc-string "regex.scm" "match:substring")
+@deffn {Scheme Procedure} match:substring match [n]
+Return the portion of @var{target} matched by subexpression number
+@var{n}.  Submatch 0 (the default) represents the entire regexp match.
+If the regular expression as a whole matched, but the subexpression
+number @var{n} did not match, return @code{#f}.
+@end deffn
+
+@lisp
+(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
+(match:substring s)
+@result{} "2002"
+
+;; match starting at offset 6 in the string
+(match:substring
+  (string-match "[0-9][0-9][0-9][0-9]" "blah987654" 6))
+@result{} "7654"
+@end lisp
+
+@c begin (scm-doc-string "regex.scm" "match:start")
+@deffn {Scheme Procedure} match:start match [n]
+Return the starting position of submatch number @var{n}.
+@end deffn
+
+In the following example, the result is 4, since the match starts at
+character index 4:
+
+@lisp
+(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
+(match:start s)
+@result{} 4
+@end lisp
+
+@c begin (scm-doc-string "regex.scm" "match:end")
+@deffn {Scheme Procedure} match:end match [n]
+Return the ending position of submatch number @var{n}.
+@end deffn
+
+In the following example, the result is 8, since the match occupies
+character indices 4 up to, but not including, 8 (i.e. the ``2002'').
+
+@lisp
+(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
+(match:end s)
+@result{} 8
+@end lisp
+
+@c begin (scm-doc-string "regex.scm" "match:prefix")
+@deffn {Scheme Procedure} match:prefix match
+Return the unmatched portion of @var{target} preceding the regexp match.
+
+@lisp
+(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
+(match:prefix s)
+@result{} "blah"
+@end lisp
+@end deffn
+
+@c begin (scm-doc-string "regex.scm" "match:suffix")
+@deffn {Scheme Procedure} match:suffix match
+Return the unmatched portion of @var{target} following the regexp match.
+@end deffn
+
+@lisp
+(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
+(match:suffix s)
+@result{} "foo"
+@end lisp
+
+@c begin (scm-doc-string "regex.scm" "match:count")
+@deffn {Scheme Procedure} match:count match
+Return the number of parenthesized subexpressions from @var{match}.
+Note that the entire regular expression match itself counts as a
+subexpression, and failed submatches are included in the count.
+@end deffn
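+
+The following sketch illustrates the counting rule together with a
+subexpression that takes no part in the match.  The pattern, the input
+string and the name @code{m} are made up for illustration.
+
+@lisp
+(use-modules (ice-9 regex))
+
+;; Two parenthesized subexpressions; the second is optional
+;; (pattern and input are illustrative)
+(define m (string-match "([0-9]+)-([a-z]+)?" "id 42-"))
+
+(match:count m)
+@result{} 3                   ; entire match plus two subexpressions
+
+(match:substring m 1)
+@result{} "42"
+
+(match:substring m 2)
+@result{} #f                  ; subexpression 2 did not match
+@end lisp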
+
+@c begin (scm-doc-string "regex.scm" "match:string")
+@deffn {Scheme Procedure} match:string match
+Return the original @var{target} string.
+@end deffn
+
+@lisp
+(define s (string-match "[0-9][0-9][0-9][0-9]" "blah2002foo"))
+(match:string s)
+@result{} "blah2002foo"
+@end lisp
+
+
+@node Backslash Escapes
+@subsection Backslash Escapes
+
+Sometimes you will want a regexp to match characters like @samp{*} or
+@samp{$} exactly.  For example, to check whether a particular string
+represents a menu entry from an Info node, it would be useful to match
+it against a regexp like @samp{^* [^:]*::}.  However, this won't work;
+because the asterisk is a metacharacter, it won't match the @samp{*} at
+the beginning of the string.  In this case, we want to make the first
+asterisk un-magic.
+
+You can do this by preceding the metacharacter with a backslash
+character @samp{\}.  (This is also called @dfn{quoting} the
+metacharacter, and is known as a @dfn{backslash escape}.)  When Guile
+sees a backslash in a regular expression, it considers the following
+glyph to be an ordinary character, no matter what special meaning it
+would ordinarily have.  Therefore, we can make the above example work by
+changing the regexp to @samp{^\* [^:]*::}.  The @samp{\*} sequence tells
+the regular expression engine to match only a single asterisk in the
+target string.
+
+Since the backslash is itself a metacharacter, you may force a regexp to
+match a backslash in the target string by preceding the backslash with
+itself.  For example, to find variable references in a @TeX{} program,
+you might want to find occurrences of the string @samp{\let\} followed
+by any number of alphabetic characters.  The regular expression
+@samp{\\let\\[A-Za-z]*} would do this: the double backslashes in the
+regexp each match a single backslash in the target string.
+
+@c begin (scm-doc-string "regex.scm" "regexp-quote")
+@deffn {Scheme Procedure} regexp-quote str
+Quote each special character found in @var{str} with a backslash, and
+return the resulting string.
+@end deffn
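+
+A minimal sketch of a typical use: quoting a literal string before
+searching for it, so that @samp{+} is not treated as a metacharacter.
+Only the effect of the quoting is shown, since the exact quoted string
+may differ between Guile versions; the strings are illustrative.
+
+@lisp
+(use-modules (ice-9 regex))
+
+;; Quoted: the pattern matches the literal text 1+1=2
+(match:substring
+  (string-match (regexp-quote "1+1=2") "so 1+1=2 holds"))
+@result{} "1+1=2"
+
+;; Unquoted: "1+" means one or more 1s, so nothing matches here
+(string-match "1+1=2" "so 1+1=2 holds")
+@result{} #f
+@end lisp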
+
+@strong{Very important:} Using backslash escapes in Guile source code
+(as in Emacs Lisp or C) can be tricky, because the backslash character
+has special meaning for the Guile reader.  For example, if Guile
+encounters the character sequence @samp{\n} in the middle of a string
+while processing Scheme code, it replaces those characters with a
+newline character.  Similarly, the character sequence @samp{\t} is
+replaced by a horizontal tab.  Several of these @dfn{escape sequences}
+are processed by the Guile reader before your code is executed.
+Unrecognized escape sequences are ignored: if the characters @samp{\*}
+appear in a string, they will be translated to the single character
+@samp{*}.
+
+This translation is obviously undesirable for regular expressions, since
+we want to be able to include backslashes in a string in order to
+escape regexp metacharacters.  Therefore, to make sure that a backslash
+is preserved in a string in your Guile program, you must use @emph{two}
+consecutive backslashes:
+
+@lisp
+(define Info-menu-entry-pattern (make-regexp "^\\* [^:]*"))
+@end lisp
+
+The string in this example is preprocessed by the Guile reader before
+any code is executed.  The resulting argument to @code{make-regexp} is
+the string @samp{^\* [^:]*}, which is what we really want.
+
+This also means that in order to write a regular expression that matches
+a single backslash character, the regular expression string in the
+source code must include @emph{four} backslashes.  Each consecutive pair
+of backslashes gets translated by the Guile reader to a single
+backslash, and the resulting double-backslash is interpreted by the
+regexp engine as matching a single backslash character.  Hence:
+
+@lisp
+(define tex-variable-pattern (make-regexp "\\\\let\\\\=[A-Za-z]*"))
+@end lisp
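+
+To check the counting, here is a small sketch: the reader turns four
+backslashes into a two-character string, which the regexp engine then
+treats as one literal backslash.  The target string is illustrative.
+
+@lisp
+(use-modules (ice-9 regex))
+
+;; Four backslashes in source are two characters after reading
+(string-length "\\\\")
+@result{} 2
+
+;; The target "a\\b" is the three characters a, backslash, b
+(match:start (string-match "\\\\" "a\\b"))
+@result{} 1
+@end lisp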
+
+The reason for the unwieldiness of this syntax is historical.  Both
+regular expression pattern matchers and Unix string processing systems
+have traditionally used backslashes with the special meanings
+described above.  The POSIX regular expression specification and ANSI C
+standard both require these semantics.  Attempting to abandon either
+convention would cause other kinds of compatibility problems, possibly
+more severe ones.  Therefore, without extending the Scheme reader to
+support strings with different quoting conventions (an ungainly and
+confusing extension when implemented in other languages), we must adhere
+to this cumbersome escape syntax.
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 31f3014..3fbc1d7 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -300,6 +300,7 @@ available through both Scheme and C interfaces.
 * Binding Constructs::          Definitions and variable bindings.
 * Control Mechanisms::          Controlling the flow of program execution.
 * Input and Output::            Ports, reading and writing.
+* Regular Expressions::         Pattern matching and substitution.
 * LALR(1) Parsing::             Generating LALR(1) parsers.
 * Read/Load/Eval/Compile::      Reading and evaluating Scheme code.
 * Memory Management::           Memory management and garbage collection.
@@ -327,6 +328,7 @@ available through both Scheme and C interfaces.
 @include api-binding.texi
 @include api-control.texi
 @include api-io.texi
+@include api-regex.texi
 @include api-lalr.texi
 @include api-evaluation.texi
 @include api-memory.texi


hooks/post-receive
-- 
GNU Guile


