Re: Rationale for split-string?

From: Luc Teirlinck
Subject: Re: Rationale for split-string?
Date: Mon, 21 Apr 2003 16:11:21 -0500 (CDT)

Stephen Turnbull wrote:

   How about:

   ;; one function, three arguments

   (defun split-string (string &optional separators omit-nulls)

     "Splits STRING into substrings bounded by matches for SEPARATORS.

   The beginning and end of STRING, and each match for SEPARATORS, are
   splitting points.  The substrings between the splitting points are
   collected in a list, which is returned.  (The substrings matching
   SEPARATORS are removed.)

   If SEPARATORS is nil, it defaults to \"[ \f\t\n\r\v]+\".

   If OMIT-NULLs is t, zero-length substrings are omitted from the list
   (so that for the default value of SEPARATORS leading and trailing
   whitespace are trimmed).  If nil, all zero-length substrings are
   retained, which correctly parses CSV format, for example."

     ;; implementation

There are two problems with this.  First of all, it would break a
great deal of existing Emacs code.  Secondly, the defaults for
SEPARATORS and for OMIT-NULLS do not match: the default SEPARATORS is
a whitespace regexp, but the default OMIT-NULLS keeps null substrings.
Thus, the most routine call, (split-string string), would return
spurious empty strings whenever STRING has leading or trailing
whitespace.
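
To make the second objection concrete, here is a small illustration.
It assumes a current GNU Emacs, in which split-string takes
(STRING &optional SEPARATORS OMIT-NULLS TRIM) and the variable
split-string-default-separators holds the default regexp; neither is
part of the proposal quoted above.  Passing the default regexp
explicitly with OMIT-NULLS left at nil gives roughly the result the
quoted proposal would give for every plain call:

```elisp
;; Assumption: a current GNU Emacs, not the 2003 code under discussion.

;; Plain call: SEPARATORS is omitted, so null substrings are dropped.
(split-string "  two words ")
;; => ("two" "words")

;; Default regexp passed explicitly, OMIT-NULLS nil: nulls are kept,
;; so leading and trailing whitespace produce empty strings.
(split-string "  two words " split-string-default-separators)
;; => ("" "two" "words" "")
```

The second result is what I mean by nonsensical for a routine call:
the caller asked for words and got two empty strings as well.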

Something like

(split-string string &optional separators keep-nulls)

that is, the same as your proposal but with the roles of nil and t
reversed, would take care of the second objection and also break less
existing Emacs code (but probably still enough to worry about).  Of
course, the reduction in broken Emacs code would probably come at the
expense of breaking existing XEmacs code.

With your proposal, we would have to replace plenty of occurrences of
(split-string string) in Emacs with (split-string string nil t).  To
do that automatically, we would have to change all of them.  There is
also plenty of Elisp code that is not included in either the Emacs or
XEmacs distributions, but that might still be important to plenty of
people.  We cannot change that code.  Code meant to be compatible
across different Emacs versions would have to become more complex.

The reverse version of your proposal would eliminate this part of the
problem for Emacs, but would probably produce a similar problem for
XEmacs.  With the reverse proposal, we would not have to worry about
Emacs calls to split-string with the default value for SEPARATORS, but
somebody would still have to go through all occurrences of
split-string with non-default values of SEPARATORS, at the very least
in all .el files in the Lisp directory and all its subdirectories,
carefully check which ones the change would break, and fix all those.
(Personally, I do not have the time to do that.)  Even if somebody
finds the time to do all of this, we cannot check and fix Elisp code
not included in the Emacs or XEmacs distributions.
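
As a rough illustration of the extra complexity in portable code:
a package that must give the same answer under any of these variants
could wrap split-string and normalize the result itself.  The sketch
below is only that, a sketch; the name my-split-string-no-nulls is
made up for this example, and it covers only callers that never want
null substrings.

```elisp
;; Hypothetical helper, not part of Emacs or XEmacs.
(defun my-split-string-no-nulls (string &optional separators)
  "Split STRING at SEPARATORS and drop every null substring.
SEPARATORS defaults to the whitespace regexp from the quoted docstring.
Deleting the empty strings afterwards gives the same list whether or
not the underlying `split-string' already omitted some of them."
  (delete "" (split-string string (or separators "[ \f\t\n\r\v]+"))))

(my-split-string-no-nulls "  two words ")
;; => ("two" "words")
(my-split-string-no-nulls ",a,,b," ",")
;; => ("a" "b")
```

Callers that do care about null substrings in the middle or at the
edges cannot use this trick, which is exactly where the versions
would have to be told apart.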

The point of my proposal (possible values "all", "none" and "edges"
for omit-nulls, with nil being equivalent to "edges" in Emacs and to
"none" in XEmacs) was to avoid breaking any existing Emacs or XEmacs
code while still making it trivial to use split-string in a way that
works identically in Emacs and XEmacs.  Again, in that proposal, only
"edges" as an additional value for omit-nulls is necessary to avoid
breaking existing Emacs code.  I only mentioned "beginning" and "end"
as luxury possibilities.  I know of software packages that use the
"end" version, and it actually does make a lot of sense in plenty of
situations, like splitting a file or buffer into lines, where a
leading newline does represent an empty line, but a trailing one does
not represent an additional empty line following it.  The "end" (as
well as the "beginning") behavior is, however, trivial to obtain from
the "none" behavior, which is why it would be a luxury.  ("end" would
be a nice luxury; "beginning" would probably be a "luxury luxury", for
symmetry with "end".)
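
For concreteness, the five behaviors could be sketched as follows.
This is not a patch: my-split-string-nulls is a made-up name, spelling
the values as symbols is my assumption, and the sketch is built on a
current GNU Emacs split-string (which keeps all null substrings when
SEPARATORS is given explicitly), not on the 2003 code.

```elisp
;; Hypothetical sketch of the proposed OMIT-NULLS values.
(defun my-split-string-nulls (string &optional separators omit-nulls)
  "Split STRING at SEPARATORS, dropping null substrings per OMIT-NULLS.
OMIT-NULLS is one of the symbols `all', `none', `edges', `beginning'
or `end'; any other value behaves like `none'."
  ;; Start from the `none' behavior: every null substring is kept.
  (let ((parts (split-string
                string (or separators split-string-default-separators))))
    (if (eq omit-nulls 'all)
        (delete "" parts)
      ;; Drop a null substring at the beginning, if requested.
      (when (and (memq omit-nulls '(edges beginning))
                 (equal (car parts) ""))
        (setq parts (cdr parts)))
      ;; Drop a null substring at the end, if requested.
      (when (and (memq omit-nulls '(edges end))
                 (equal (car (last parts)) ""))
        (setq parts (butlast parts)))
      parts)))

(my-split-string-nulls ",a,,b," "," 'none)   ; => ("" "a" "" "b" "")
(my-split-string-nulls ",a,,b," "," 'all)    ; => ("a" "b")
(my-split-string-nulls ",a,,b," "," 'edges)  ; => ("a" "" "b")

;; The line-splitting case: the leading newline is an empty line,
;; the trailing one is not.
(my-split-string-nulls "\nfoo\nbar\n" "\n" 'end)  ; => ("" "foo" "bar")
```

As the sketch shows, "edges", "beginning" and "end" each take only a
few lines on top of "none", which is why I call the last two luxuries.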


