Re: [PATCH v3] docs/match: pattern matcher example makeover

From: Maxime Devos
Subject: Re: [PATCH v3] docs/match: pattern matcher example makeover
Date: Wed, 1 Feb 2023 17:40:23 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0

On 01-02-2023 14:09, Blake Shaw wrote:
style: clean-up newlines
It appears that while the PDF needs additional newlines
to be presentable, these appear to have a negative effect
on the presentation of the texinfo doc.

I don't know how to fix this, but from looking at the PDF,
it appears that the strategy until now has been to privilege
texinfo at the expense of PDF readability (the PDF is more
or less "squished together")

So in that regard, these edits make my past edits more in sync
with past Guile docs.

IIRC, Texinfo has an @iftex ... @end iftex construct or such. You could use this to define a @pdf-newline macro that inserts newlines only in the PDF (TeX is used for the PDF).
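
A rough, untested sketch of what I mean: the @iftex conditional only takes effect when TeX (and hence the PDF) is the output, so extra vertical space placed inside it should leave the Info and HTML output alone. Whether @sp is the right spacing command for these spots is an assumption on my part.

```texinfo
@c Extra vertical space for the TeX/PDF output only;
@c Info and HTML processing skips this block.
@iftex
@sp 1
@end iftex
```

If that works, these lines could be wrapped in a macro so that the markup is written only once; the @pdf-newline name above is just a placeholder, and the Texinfo manual should be checked for which characters are allowed in macro names.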
examples: replace with didactic ex. that can copied & pasted
The existing example can't be copied and pasted.

This example both fixes the past one and improves on its relation
to the text.

style: switch to "Indiana style", bracketing lets and clauses
After spending much time looking at the examples in black & white
to edit the texinfo document, it occurred to me just how much the
brackets improve legibility. Therefore, I have decided to adopt
the "Indiana" style of using brackets, used by Kent Dybvig, Dan
Friedman and Will Byrd at Indiana University.

Currently the docs use this style in some places but not in others.

Considering some are color blind, and that few will have rigged
their texinfo configuration to use rainbow-delimiters in the while
reading documentation, I think this should be considered a general
accessibility improvement.

IME, (( )) is quite readable (and I don't use rainbow delimiters).
That might largely be 'due to experience', though.  While I would
expect ([ ] [ ]) to be unconventional for many Guilers, it should be readable too, so I suppose it could be good to just change the convention.

You are currently making the manual more inconsistent by using this (for Guile) mostly non-standard notation though; IIRC the manual mostly does (( )) and not ([ ]). Yet, in the review of the v1, you mentioned

No, I'm not, I'm being totally boring and normal in this regard because 
collectively authored documentation is something you should never adopt 
non-standard writing notation in the course of authoring, just to one up 
someone on a mailing list

To be honest, it's this kind of attitude that has resulted in the current docs 
that so many people find utterly incomprehensible. The core point of my talk 
that what makes Info Guile so hard to read is the lack of stylistic 
consistency. Editors and editing exist for a very good reason.

, which is very much against non-standard notation and for consistency. As such, I propose:

  a) Before (or after) this patch, change everything in the manual to
     "Indiana style", for consistency.  If you go for 'after this
     patch', I mean immediately afterwards, because Guile contributors
     tend to come and go, and delaying things tends to become never
     doing things.

  b) or: do it in non-Indiana style (likely not the option you will
     take, but it would be more stylistically consistent than the
     current version of the patch ...)

  c) or: don't adjust everything in the manual to Indiana style yet,
     but also make it a rule that the manual (and Guile code in Guile
     proper, I guess) does Indiana style, and that all current
     deviations from Indiana style are old style to be updated in the

     If this were Guix, you could make this a rule by adding it
     to the "Contributing" section.  Guile does not appear to have
     such a section, but "1.8 Typographical Conventions" might be a good

Additionally, changing the parenthesis convention in Guile is not just a change to the 'match' documentation, but the subject line only mentions 'match'. While Indiana style seems a good thing to me at first sight now that you mention the benefits, it needs a separate e-mail thread so that people interested in ()/[] stuff but not in 'match' stuff have an opportunity to respond.

indentation: make consistent according to rule defined below

If a new paragraph opens onto a new topic, it should naturally
indent (i.e, no indentation markup is required)

If a new paragraph is a continuation of the current subject,
the markup @noident should be applied

markup: replace @var with @code unless @var is a @defn argument

The way that it renders in texinfo means that it renders @vars
in uppercase, the way that is conventionally done for definition

I'm not too familiar with Texinfo PDF output but I'll take your word for it. However, this is not the case at least for HTML output, as you can see at <>, for HTML documentation it remains lowercase.

Therefore I've changed all @vars to @code unless @var is a @defn

I'm missing what you mean with the 'Therefore'. How does this relate to your previous paragraph (I don't get what your point is about 'definition arguments')? Do you mean that uppercase @var is bad and that it should be lowercase instead? If so, it would be better to modify Texinfo itself to let @var not change the case; then every manual in Texinfo would benefit instead of only the Guile manual.

Also, you could ask the Texinfo people if there is a reason for uppercase @var; maybe they determined that it is more readable to more people (I'm just speculating, I don't know the reason)? -- Presumably there's some good reason (or maybe not, I don't know, but you could ask them first).

Otherwise, if you make this Guile-specific change, you would create stylistic inconsistencies between projects using Texinfo. More specifically, you are creating stylistic inconsistencies between GNU projects.

Additionally, you are not merely removing the uppercasing thing, you are also removing the 'slanted' thing -- the result of @var is slanted typewriter, the result of @code is merely typewriter, which makes it slightly harder to distinguish metavariables from other code.

You are also only making this stylistic change in the documentation of 'match'; the remainder of the manual still has the old @var. If you change things, it would be better to change them for the whole manual. I think you can do this by redefining the @var macro to whatever you want in the prelude (at least that can be done in TeX).

remove: paragraph that referred to a since removed example

fix: uncomment @xref{sxml-match}
  doc/ref/match.texi | 252 ++++++++++++++++++++++++++++++---------------
  1 file changed, 167 insertions(+), 85 deletions(-)

diff --git a/doc/ref/match.texi b/doc/ref/match.texi
index f5ea43118..4e657b976 100644
--- a/doc/ref/match.texi
+++ b/doc/ref/match.texi
@@ -23,71 +23,142 @@ The @code{(ice-9 match)} module provides a @dfn{pattern 
  written by Alex Shinn, and compatible with Andrew K. Wright's pattern
  matcher found in many Scheme implementations.
-@cindex pattern variable
-A pattern matcher can match an object against several patterns and
-extract the elements that make it up.  Patterns can represent any Scheme
-object: lists, strings, symbols, records, etc.  They can optionally contain
-@dfn{pattern variables}.  When a matching pattern is found, an
-expression associated with the pattern is evaluated, optionally with all
-pattern variables bound to the corresponding elements of the object:
+@noindent A pattern matcher does precisely what the name implies: it
+matches some arbitrary pattern, and returns some result accordingly.

Again, as I mentioned previously, in the general case it matches arbitrary patterns (plural) and returns results (plural) -- the 'match' construct is not as limited as you are implying it to be here.

-(let ((l '(hello (world))))
-  (match l           ;; <- the input object
-    (('hello (who))  ;; <- the pattern
-     who)))          ;; <- the expression evaluated upon matching
-@result{} world
+(define (english-base-ten->number name)
+  (match name
+    ('zero   0)
+    ('one    1)
+    ('two    2)
+    ('three  3)
+    ('four   4)
+    ('five   5)
+    ('six    6)
+    ('seven  7)
+    ('eight  8)
+    ('nine   9)))
+(english-base-ten->number 'six)
+@result{} 6

My previous comment still applies:

This is a suboptimal example; this would be better done with 'case'.
I propose replacing it with another example, or adding a note that one would normally use 'case' for this.

still applies.  What is the reason for not doing something akin to that?
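
For concreteness, something along these lines is what I have in mind with 'case'. This is only a sketch; the 'else' clause and its error message are my own additions and not part of the patch.

```scheme
;; Sketch: the same symbol-to-number lookup written with `case'.
;; Each clause lists the datums it accepts; `case' compares with eqv?.
(define (english-base-ten->number name)
  (case name
    ((zero)  0)
    ((one)   1)
    ((two)   2)
    ((three) 3)
    ((four)  4)
    ((five)  5)
    ((six)   6)
    ((seven) 7)
    ((eight) 8)
    ((nine)  9)
    ;; Assumption: signal an error for anything that is not a digit name.
    (else (error "not an English digit name:" name))))

(english-base-ten->number 'six)
```

Since every pattern here is a literal symbol, nothing is destructured, which is why 'case' is the more usual tool for this.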

+(apply + (map english-base-ten->number '(one two three four)))
+@result{} 10
  @end example
-In this example, list @var{l} matches the pattern @code{('hello (who))},
-because it is a two-element list whose first element is the symbol
-@code{hello} and whose second element is a one-element list.  Here
-@var{who} is a pattern variable.  @code{match}, the pattern matcher,
-locally binds @var{who} to the value contained in this one-element
-list---i.e., the symbol @code{world}.  An error would be raised if
-@var{l} did not match the pattern.
+@cindex pattern variable
+@noindent Pattern matchers may contain @dfn{pattern variables},
+local bindings to all elements that match a pattern.

'Pattern matchers' -> 'pattern' would be more precise here, as it states more precisely _where_ the pattern variable is. E.g. if you say 'pattern', it's certainly not the 'ns' in (match ns ...). If you say 'pattern matcher' (*), then 'pattern matcher' might mean 'match' itself, or (match ns ...). The former does not contain a pattern variable; the latter likely does, but less is stated about _where_ the pattern variable is. Going purely by your sentence, it might be in 'match' itself, which is incorrect.

(*) While the original text defined 'pattern matcher=match', that part doesn't contain any pattern variables, and in your new text the notion of 'pattern matcher' is not exactly defined but rather described, and not as some kind of precise characterisation.

-The same object can be matched against a simpler pattern:
+(let re ([ns '(one two three four 9)] [total 0])

The Scheme convention would be to write 'loop' instead of 're' when using named-let, and something like 'rest' instead of 'ns'. The exact word for the loop argument varies a lot, but two letters that don't appear to mean anything are to be avoided.

+  (match ns
+    [(e) (+ total (english-base-ten->number e))]
+    [(e . es)
+     (re es (+ total (english-base-ten->number e)))]))

I tried running your example, and it doesn't work:

(define (english-base-ten->number name)
  (match name
    ('zero   0)
    ('one    1)
    ('two    2)
    ('three  3)
    ('four   4)
    ('five   5)
    ('six    6)
    ('seven  7)
    ('eight  8)
    ('nine   9)))
(let re ([ns '(one two three four 9)] [total 0])
  (match ns
    [(e) (+ total (english-base-ten->number e))]
    [(e . es)
     (re es (+ total (english-base-ten->number e)))]))
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" 9)'.

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.

I think you need to replace (one two three four 9) by (one two three four nine). As you mentioned yourself (in other words), examples in the manual should actually work as-is.
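
Putting this together with the naming remark above, a version that runs might look as follows. It is a sketch only: the wrapper procedure 'sum-english-digits' is a hypothetical name I introduce here so the result can be checked, and I use the (( )) bracketing merely because that is what I am used to.

```scheme
(use-modules (ice-9 match))

(define (english-base-ten->number name)
  (match name
    ('zero 0) ('one 1) ('two 2) ('three 3) ('four 4)
    ('five 5) ('six 6) ('seven 7) ('eight 8) ('nine 9)))

;; Hypothetical wrapper around the named let from the patch,
;; using `loop' and `rest' as the conventional names.
(define (sum-english-digits names)
  (let loop ((rest names) (total 0))
    (match rest
      ;; Exactly one element left: add it and return the total.
      ((e) (+ total (english-base-ten->number e)))
      ;; More elements: add the first and tail-call `loop' on the others.
      ((e . es)
       (loop es (+ total (english-base-ten->number e)))))))

;; 1 + 2 + 3 + 4 + 9 = 19, matching the @result{} 19 in the patch.
(sum-english-digits '(one two three four nine))
```

Note that, as in the patch, an empty list matches neither clause and would raise 'match-error'.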

-(let ((l '(hello (world))))
-  (match l
-    ((x y)
-     (values x y))))
-@result{} hello
-@result{} (world)
+@result{} 19
  @end example
-Here pattern @code{(x y)} matches any two-element list, regardless of
-the types of these elements.  Pattern variables @var{x} and @var{y} are
-bound to, respectively, the first and second element of @var{l}.
-Patterns can be composed, and nested.  For instance, @code{...}
+@noindent In this example, the list @code{ns} matches the pattern
+@code{(e . es)}, where the pattern variable @code{e} corresponds
+to the metaphoical "car" of @code{ns} and the pattern variable @code{es}
+corresponds to the "cdr" of @code{ns}.

Typo: metaphoical -> metaphorical.

Also: metaphorical -> literal. -- e is literally the car of ns (or ‘corresponds to the car of ns in a literal way’ if you go for a variable/value distinction); there is nothing figurative here. I would just drop the metaphorical/literal word. Also, "car" -> `car' and "cdr" -> `cdr' -- the manual currently consistently uses the quotation style ‘car’ / ‘pair?’, ‘SCM’, ..., not "car". For example, in 5.4.1 Dynamic Types, there is the paragraph:

In order to implement standard Scheme functions like ‘pair?’ and
‘string?’ and provide garbage collection, the representation of every
value must contain enough information to accurately determine its type
at run time.

'Function' -> 'Procedure'. You are introducing a stylistic inconsistency here. In Guile, the C things are called 'Functions', and the Scheme things are called 'Procedures'. To some degree, this ‘in Scheme it's called a procedure’ also holds for other Schemes IIUC.

Actually, while some GCs do require runtime type information (RTI), RTI is not needed for garbage collection. Guile uses Boehm-GC for garbage collection. Being a conservative garbage collector, it doesn't need any type information. It works a little better if you do give it some type information, and Guile does give it some information in some cases, but it's not required.

This information is therefore incorrect and needs to be removed, but the bits about predicates seem fine to me.

Often, Scheme systems also use this information to
determine whether a program has attempted to apply an operation to an
inappropriately typed value (such as taking the ‘car’ of a string).

IIUC, in Texinfo, we write `stuff' instead of ‘stuff’, and it will get turned into ‘stuff’. I dunno why this is still done in the Guile manual as UTF-8 is an established thing, but I have used ‘’ in Guix stuff in the past and people changed it into `'.

Additionally, doing "git grep -F "car" doc/ref/*.texi", it appears that the manual doesn't actually quote car and cdr -- instead it writes car and cdr unquoted, or writes @code{car} / @code{cdr} which happens to be turned into a quoted ‘car’ / ‘cdr’ in the .info documentation by Texinfo.

I think you can guess what I would be saying about stylistic consistency here.

+@noindent A tail call @code{re} is then initiated

‘A tail call @code{re} is then initiated’ -> ‘A tail call to @code{re} is then initiated’ -- @code{re} is a variable reference, not a tail call. The tail call is @code{(re es (+ total ...))}.

More simply, you could write ‘The procedure @var{re} is then tail-called’.

+and we "cdr" down the
+list by recurring on the tail @code{es}, applying our matcher
+@code{english-base-ten->number} to each element of @code{ns} until
+only a single element @code{(e)} remains, causing the @code{total}
+to be computed.  In modern Scheme programming it is common to use
+@code{match} in place of the more verbose but familiar combination
+of @code{cond}, @code{car} and @code{cdr}, so it's important to
+understand how these idioms translate.
+Patterns can be composed and nested.  For instance, @code{...}
  (ellipsis) means that the previous pattern may be matched zero or more
  times in a list:
-(match lst
-  (((heads tails ...) ...)
-   heads))
+(match '((a.0 b.0 c.0 ((1.0 2.0 3.0) x.0 y.0 z.0))
+         (a.1 b.1 c.1 ((1.1 2.1 3.1) x.1 y.1 z.1)))
+  [((heads ... ((tails ...) . rest)) ...)
+   (begin
+    (format #t "heads: ~a ~%" heads)
+    (format #t "tails: ~a ~%" tails)
+    (format #t "rest:  ~a ~%" rest))])
+heads: ((a.0 b.0 c.0) (a.1 b.1 c.1))
+tails: ((1.0 2.0 3.0) (1.1 2.1 3.1))
+rest:  ((x.0 y.0 z.0) (x.1 y.1 z.1))
  @end example
-This expression returns the first element of each list within @var{lst}.
-For proper lists of proper lists, it is equivalent to @code{(map car
-lst)}.  However, it performs additional checks to make sure that
-@var{lst} and the lists therein are proper lists, as prescribed by the
-pattern, raising an error if they are not.
-Compared to hand-written code, pattern matching noticeably improves
-clarity and conciseness---no need to resort to series of @code{car} and
-@code{cdr} calls when matching lists, for instance.  It also improves
-robustness, by making sure the input @emph{completely} matches the
-pattern---conversely, hand-written code often trades robustness for
-conciseness.  And of course, @code{match} is a macro, and the code it
-expands to is just as efficient as equivalent hand-written code.
-The pattern matcher is defined as follows:
+@noindent A pattern matcher can match an object against several
+patterns and extract the elements that make it up.
+(match '((l1 . r1) (l2 . r2) (l3 . r3))
+  [((left . right) ...)
+   (list left right)])
+@result{} ((l1 l2 l3) (r1 r2 r3))
+@end example
+(match '((1 . (a . b)) (2 . (c . d)) (3 . (e . f)))
+  [((key . (left . right)) ...)
+   (fold-right acons '() key right )])
+@result{} ((1 . b) (2 . d) (3 . f))
+@end example
+(match '(((a b c) e f g) 1 2 3)
+  [(((head ...) . rest) tails ...)
+   (acons tails head rest )])
+@result {} (((1 2 3) a b c) e f g)
+@end example
+Patterns can represent any Scheme object: lists, strings, symbols,
+records, etc.
+@noindent When a matching pattern is found, an expression is evaluated
+with pattern variables bound to the corresponding elements of the object.
+(let re ([m #(a "b" c "d" e "f" g)])
+   (match m
+     [(or (e) #(e)) e]
+     [(or #(e1 e2 es ...)
+          (e1 e2 es ...))
+      (cons (cons e1 e2)
+           (re es))]))
+@result{} ((a . "b") (c . "d") (e . "f") . g)
+@end example
+(let re ([m '(a b c d e f g h i)])
+   (match m
+     [(e) e]
+     [(e1 e2 es ...)
+      (acons e1 e2 (re es))]))
+@result{} ((a . b) (c . d) (e . f) (g . h) . i)
+@end example
+@noindent Compared to hand-written code, pattern matching noticeably
+improves clarity and conciseness---no need to resort to series of
+@code{car} and @code{cdr} calls when matching lists, for instance.
+It also improves robustness, by making sure the input @emph{completely}
+matches the pattern---conversely, hand-written code often trades
+robustness for conciseness.  And of course, @code{match} is a macro,
+and the code it expands to is just as efficient as equivalent
+hand-written code.
+@noindent We define @code{match} as follows: @*

Why did you change this from

     The pattern matcher is defined as follows:

? While the 'we' / 'our' / ... construct is pretty convenient, IMO it is better avoided as long as the avoidance doesn't lead to awkward constructions.

  @deffn {Scheme Syntax} match exp clause1 clause2 @dots{}
  Match object @var{exp} against the patterns in @var{clause1}
@@ -96,9 +167,9 @@ value produced by the first matching clause.  If no clause 
  throw an exception with key @code{match-error}.
Each clause has the form @code{(pattern body1 body2 @dots{})}. Each
-@var{pattern} must follow the syntax described below.  Each body is an
+@code{pattern} must follow the syntax described below.  Each body is an
  arbitrary Scheme expression, possibly referring to pattern variables of
  @end deffn
@c FIXME: Document other forms:
@@ -114,7 +185,7 @@ arbitrary Scheme expression, possibly referring to pattern variables of
  @c clause ::= (pat body) | (pat => exp)
-The syntax and interpretation of patterns is as follows:
+@noindent @* The pattern language is specified as follows: @*

The stuff below still defines the interpretation, not only the language/grammar. The change 'syntax -> language' seems fine to me, but why remove 'interpretation'?

Additionally, I personally would go for interpretation->semantics, but maybe that's too obscure for a general audience.

[...]

  @deffn {Scheme Syntax} match-lambda* clause1 clause2 @dots{}
@@ -264,11 +335,10 @@ and can also be used for recursive functions which match on their
  arguments as in @code{match-lambda*}.
-(match-let (((x y) (list 1 2))
-            ((a b) (list 3 4)))
-  (list a b x y))
-(3 4 1 2)
+(match-let ([(x y ...) (list 1 2 3)]
+            [(a b ...) (list 3 4 5)])
+  (list x a y b))
+@result{} (1 3 (2 3) (4 5))
  @end example
  @end deffn
@@ -287,22 +357,34 @@ Similar to @code{match-let}, but analogously to @code{let*}, match and
  bind the variables in sequence, with preceding match variables in scope.
-(match-let* (((x y) (list 1 2))
-             ((a b) (list x 4)))
-  (list a b x y))
+(match-let* ([(x . y) (list 1 2 3)]
+             [(a . b) (list x 4 y)])
+  (list a b))

The old example was simpler and still fully demonstrated 'match-let*', why the change (besides [])?

+(define wrap '(((((unnest arbitrary nestings))))))
+(let unwrap ([peel wrap])
+  (match-let* ([([core ...]) peel]
+              [(wrapper ...) core])
+    (if (> (length wrapper) 1)
+       wrapper
+       (unwrap wrapper))))
+@result{} (unnest arbitrary nestings)
+@end example

(Not saying anything about this example TBC.)


Attachment: OpenPGP_0x49E3EE22191725EE.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature
