Re: regexp-split for Guile

From: Chris K. Jester-Young
Subject: Re: regexp-split for Guile
Date: Sat, 20 Oct 2012 00:01:26 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Oct 12, 2012 at 05:57:11PM -0400, Mark H Weaver wrote:
> FWIW, I agree with Daniel.  I dislike the complicated semantics of this
> 'limit' argument, which combines into a single number two different
> concepts:

First, I want to thank both Daniel and Mark for their feedback. I'm
sorry I haven't had a chance to reply until now; last weekend I went
to (and presented at) RacketCon, so I didn't have a lot of time for
replying to emails.

(And if you want to see my RacketCon presentation, feel free to visit :-))

> Beyond matters of taste, I don't like this because it makes bugs less
> likely to be caught.  Suppose 'limit' is a computed value, normally
> expected to be positive.  Code that follows may implicitly assume that
> the returned list has no more than 'limit' elements.  Now suppose that
> due to a bug or exceptional circumstance, the computed 'limit' ends up
> being less than 1.  Now 'regexp-split' switches to a qualitatively
> different mode of behavior.

I am sympathetic to this. It would definitely be good for the limit to
mean only that, and not have two other meanings attached to it.

So, in this spirit, below is my proposal for something that I hope would
fit within the character of your feedback, while not making the common
use cases needlessly verbose: we should favour the common use cases by
making them easy to use.

Before I begin, remember that in Perl's split, the default limit is 0,
which strips all trailing blank fields. This is the common use case
when splitting on whitespace, where you simply want to ignore any
end-of-line whitespace. Making the calling code call drop-right-while
manually is counter-productive for this common use case.

Here is my proposal:

    (regexp-split pat str #:key limit (trim? (not limit)))

With no optional arguments specified (so #:limit is #f and #:trim? is
#t), it behaves like limit == 0 in Perl, i.e., it returns all fields
minus blank trailing ones.

With a #:limit specified (which must be a positive integer), return
that number of fields at most (subsequent ones are not split out, and
are returned as part of the last field, with all delimiters intact).

With #:trim? given a false value, return all fields, including blank
trailing ones. #:trim? defaults to false iff #:limit is specified.

Rationale: The common use case is the most succinct version. The next
most common use case has a relatively short formulation (#:trim? #f).
Also, the default for #:trim? is based on common use cases depending on
whether #:limit is specified. (Trim-with-limit is not supported in Perl,
but it seemed to take more work to ban it here than just let it be.)


    (regexp-split " +" "foo  bar  baz  ")
      => ("foo" "bar" "baz")
    (regexp-split " +" "foo  bar  baz  " #:trim? #f)
      => ("foo" "bar" "baz" "")
    (regexp-split " +" "foo  bar  baz  " #:limit 4)
      => ("foo" "bar" "baz" "")
    (regexp-split " +" "foo  bar  baz  " #:limit 4 #:trim? #t)
      => ("foo" "bar" "baz")
    (regexp-split " +" "foo  bar  baz  " #:limit 3)
      => ("foo" "bar" "baz  ")
    (regexp-split " +" "foo  bar  baz  " #:limit 2)
      => ("foo" "bar  baz  ")
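
For concreteness, here is a minimal sketch of how these semantics could
be implemented on top of Guile's existing regexp-exec. It is only an
illustration, not the patch under discussion: the loop structure, the
handling of empty matches (skipped so the loop always makes progress),
and the local variable names are all assumptions of this sketch.

```scheme
(use-modules (ice-9 regex)    ; match:start, match:end
             (srfi srfi-1))   ; drop-while

;; Sketch of the proposed interface (an assumption, not an existing
;; Guile procedure).  PAT may be a string or a compiled regexp.
(define* (regexp-split pat str #:key limit (trim? (not limit)))
  (let ((rx  (if (regexp? pat) pat (make-regexp pat)))
        (len (string-length str)))
    (let loop ((start 0)    ; where the current field begins
               (search 0)   ; where to look for the next delimiter
               (count 1)    ; number of fields if we stopped right now
               (acc '()))   ; fields collected so far, in reverse order
      (let ((m (and (or (not limit) (< count limit))
                    (<= search len)
                    (regexp-exec rx str search))))
        (cond
         ((not m)
          ;; No more splitting: the rest of the string is the last field.
          ;; ACC is reversed, so trailing blank fields sit at its head.
          (let ((fields (cons (substring str start) acc)))
            (reverse (if trim?
                         (drop-while string-null? fields)
                         fields))))
         ((= (match:start m) (match:end m))
          ;; Empty match: do not split here, just move the search on.
          (loop start (+ (match:end m) 1) count acc))
         (else
          (loop (match:end m) (match:end m) (+ count 1)
                (cons (substring str start (match:start m)) acc))))))))
```

Under those assumptions, the sketch reproduces the six examples above.
Note that trimming only ever looks at the tail of the result, so
interior blank fields are always kept.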

Does that sound reasonable?

Comments welcome,
