octave-maintainers

Re: New strsplit function


From: John W. Eaton
Subject: Re: New strsplit function
Date: Thu, 16 May 2013 02:39:03 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.11) Gecko/20121122 Icedove/10.0.11

On 05/16/2013 02:19 AM, Ben Abbott wrote:

hmmm ... I took a look at Matlab 2013a.  It's not clear to me that we'd want to 
copy this.


Well, Matlab users apparently want compatibility here.  That's why I
received the report.

matlab>  strsplit('', 'a')

ans =

     {''}

matlab>  strsplit('a', 'a')

ans =

     ''    ''

matlab>  strsplit('aa', 'a')

ans =

     ''    ''

matlab>  strsplit('aaa', 'a')

ans =

     ''    ''

matlab>  strsplit('aaaa', 'a')

ans =

     ''    ''

matlab>  strsplit ('abc', {'a','b','c'})

ans =

     ''    ''

In case it isn't clear, the output is a cellstring containing two empty strings.

Oh, so collapsedelimiters means that if multiple consecutive delimiters
appear in the string that is being split, they should be treated as
one?

Then I think my guess about what was happening was wrong, and the
behavior above is correct.  If the string is 'aa' and the delimiter is
'a', then it is the same as strsplit ('a', 'a') and the result should
be two empty strings (one for before and one for after the
delimiter).  That's the result we used to get for the simpler case of
strsplit ('a', 'a').  Now we get an empty cell array, which looks
wrong to me.
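
As a cross-check on what the collapsed result should look like,
splitting on a run of one or more delimiters with regexp should give
the same shape.  This is only a way to compare against the output
quoted above, not a proposal for how strsplit should be implemented,
and the variable names are just for illustration:

```octave
## Treat a run of consecutive 'a' delimiters as a single separator.
r1 = regexp ('a', 'a+', 'split');     # pieces before and after the run
r2 = regexp ('aaaa', 'a+', 'split');  # same collapsing, longer input
r3 = regexp ('xaay', 'a+', 'split');  # non-empty pieces on both sides

## Report how many pieces each call produced.
printf ("%d %d %d\n", numel (r1), numel (r2), numel (r3));
```

I would expect two pieces from each call, with both pieces empty in
the first two cases.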

So in this code

    ## Get substring lengths.
    if (isempty (idx))
      strlens = length (str);
    else
      strlens = [idx(1)-1, diff(idx)-1, numel(str)-idx(end)];
    endif
    if (nargout > 1)
      ## Grab the separators
      matches = num2cell (str(idx)(:)).';
      if (args.collapsedelimiters)
        ## Collapse the consecutive delimiters
        ## TODO - is there a vectorized way?
        for m = numel(matches):-1:2
          if (strlens(m) == 0)
            matches{m-1} = [matches{m-1:m}];
            matches(m) = [];
          endif
        end
      endif
    endif
    ## Remove separators.
    str(idx) = [];
    if (args.collapsedelimiters)
      ## Omit zero lengths.
      strlens = strlens(strlens != 0);
    endif

    ## Convert!
    result = mat2cell (str, 1, strlens);

it seems like we should be performing the "omit zero lengths" part on
the output of diff, then tacking on the beginning and ending strings.
But I don't understand what the "if (nargout > 1)" part in between is
doing.
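
To make that suggestion concrete, here is a rough sketch, not a patch
against the code above.  The function name split_collapse_sketch is
made up for illustration.  It assumes collapsedelimiters is always on,
handles only single-character delimiters (passed as a char vector such
as 'abc'), and builds the pieces by indexing instead of mat2cell.  It
also ignores the second output (the matched separators) entirely.

```octave
function result = split_collapse_sketch (str, delims)
  ## Hypothetical helper: split STR at any character found in DELIMS,
  ## collapsing consecutive delimiters but keeping the leading and
  ## trailing pieces even when they are empty.
  idx = find (ismember (str, delims));
  if (isempty (idx))
    ## No delimiter found: the whole string is the only piece.
    result = {str};
    return;
  endif

  ## Start and stop position of every piece, including empty ones.
  starts = [1, idx+1];
  stops = [idx-1, numel(str)];
  lens = stops - starts + 1;

  ## Omit zero lengths for the interior pieces only (the part that
  ## corresponds to diff (idx) - 1), then keep the first and last.
  keep = lens > 0;
  keep([1, end]) = true;
  starts = starts(keep);
  stops = stops(keep);

  result = arrayfun (@(s, e) str(s:e), starts, stops,
                     "UniformOutput", false);
endfunction
```

With this, split_collapse_sketch ('a', 'a') and
split_collapse_sketch ('aaaa', 'a') should both return a cell array of
two empty strings, and split_collapse_sketch ('xaay', 'a') should
return 'x' and 'y'.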

jwe

