[Octave-bug-tracker] [bug #52647] missing function: erase (introduced Ma

octave-bug-tracker
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-bug-tracker] [bug #52647] missing function: erase (introduced Ma

From:	Rik
Subject:	[Octave-bug-tracker] [bug #52647] missing function: erase (introduced Matlab 2016b)
Date:	Mon, 25 Dec 2017 12:08:53 -0500 (EST)
User-agent:	Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
Update of bug #52647 (project octave):

                  Status:               Confirmed => Fixed                  
             Open/Closed:                    Open => Closed                 

    _______________________________________________________

Follow-up Comment #12:

I added erase to Octave in this changeset
(http://hg.savannah.gnu.org/hgweb/octave/rev/892f7f096ffb).

Coding for core Octave is very hard because one simultaneously needs to have
code that implements a specific functionality, that is high performance, that
is well documented, and that follows coding guidelines so it can be
maintained.

In order to help Sahil learn about the process I've annotated how I
transformed his original changeset into the one I checked in.

First, there were some functional issues.  The original code didn't accept a
character array for the second input.  Also, because the code used cellstr
there could be changes in trailing whitespace in the first input.

The solution was to use mat2cell(), rather than cellstr(), to convert to a
cell while preserving all strings intact.

Second, was performance.  Generally, for loops are very slow in an interpreted
language.  Also, rather than re-invent the wheel it is best to re-use existing
functions and wrap them with new code so that they can do whatever is
required.

I wrote a small benchmarking script (tst_erase.m) to test some ideas.


N = 3e3;
bm = zeros (N,1);

#str = "the quick brown fox jumps over the lazy dog";
str = {"the quick brown fox jumps";
       "over the lazy dog"};
#matchstr = "the ";
matchstr = {"the ", "lazy "};

y = "";

for i = 1:N
  tic;
#  y = erase (str, matchstr);
#  y = strrep (str, matchstr, "");
#  y = regexprep (str, matchstr, "");

  bm(i) = toc;
endfor

sum (bm)


And in order to avoid being fooled by a single measurement, I actually ran the
test script multiple times using


for j=1:5; tst_erase; bm2(j) = sum (bm); endfor; mean (bm2)


The results are shown below


################################################################################
Case # 1
str = "the quick brown fox jumps over the lazy dog"};
matchstr = "the ";
N = 3e3
############################################################
erase : 0.91 sec
strrep : 0.048 sec
regexprep : .054 sec
newerase: 0.27 sec

################################################################################
Case # 2
str = {"the quick brown fox jumps";
       "over the lazy dog"};
matchstr = "the ";
N = 3e3
############################################################
erase : 1.10 sec
strrep : 0.044 sec
regexprep : .076 sec
newerase: 0.27 sec

################################################################################
Case # 3
str = {"the quick brown fox jumps";
       "over the lazy dog"};
matchstr = {"the ", "lazy "};
N = 3e3
############################################################
erase : 1.07 sec
newerase : 1.38 sec


At first, newerase did not exist, and I was just looking at whether I could
use existing functions like strrep or regexprep to implement the
functionality.  As you can see, strrep is the fastest, but it doesn't handle
multiple patterns in the way erase required.  That's okay, in my new version
of erase I check the number of patterns and if it is 1, which it often will
be, I use strrep for faster performance.  But if it is more than 1 then I fall
back to regexprep because it is still faster than coding your own solution
with a for loop.

The complete new code is


## Copyright (C) 2017 Sahil Yadav
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn  {} address@hidden =} erase (@var{str}, @var{ptn})
## Delete all occurrences of @var{ptn} within @var{str}.
##
## @var{str} and @var{ptn} can be ordinary strings, cell array of strings, or
## character arrays.
##
## Examples
##
## @example
## @group
## ## string, single pattern
## erase ("Hello World!", " World")
##     @result{} "Hello!"
##
## ## cellstr, single pattern
## erase (@{"Hello", "World!"@}, "World")
##     @result{} @{"Hello", "!"@}
##
## ## string, multiple patterns
## erase ("The Octave interpreter is fabulous", @{"interpreter ", "The "@})
##     @result{} "Octave is fabulous"
##
## ## cellstr, multiple patterns
## erase (@{"The ", "Octave interpreter ", "is fabulous"@}, @{"interpreter ",
"The "@})
##     @result{} @{"", "Octave ", "is fabulous"@}
## @end group
## @end example
##
## Programming Note: @code{erase} deletes the first instance of a pattern in
a
## string when there are overlapping occurrences.  For example,
##
## @example
## erase ("abababa", "aba")
##     @result{} "b"
## @end example
##
## See @code{strrep} for processing overlaps.  
## 
## @seealso{strrep, regexprep}
## @end deftypefn

## Author: Sahil Yadav <address@hidden>


function newstr = erase (str, ptn)

  if (nargin != 2)
    print_usage ();
  endif
    
  ischarmatrix = false;
  if (ischar (str)) 
    if (rows (str) > 1)
      ## Convert to cell.  Can't use cellstr which trims spaces.
      str = mat2cell (str, ones (1, rows (str)));
      ischarmatrix = true;
    endif
  elseif (! iscellstr (str))
    error ("erase: STR must be a string or cell array of strings");
  endif

  if (ischar (ptn)) 
    if (rows (ptn) > 1)
      ## Convert to cell.  Can't use cellstr which trims spaces.
      ptn = mat2cell (ptn, ones (1, rows (ptn)));
    endif
  elseif (! iscellstr (ptn))
    error ("erase: PTN must be a string or cell array of strings");
  endif

  nptn = ifelse (ischar (ptn), 1, numel (ptn));
  if (nptn == 1)
    newstr = strrep (str, ptn, "", "overlaps", false);
  else
    ptn = regexptranslate ("escape", ptn(:).');
    ptn = strjoin (ptn, '|');
    newstr = regexprep (str, ptn, "");
  endif

  if (ischarmatrix)
    newstr = char (newstr);
  endif

endfunction


%!assert (erase ("Hello World!", " World"), "Hello!")
%!assert (erase ({"Hello World!"}, " World"), {"Hello!"})
%!assert (erase (char ("Hello", "World!"), "World"), char ("Hello ", "!"))
%!assert (erase ({"Hello", "World!"}, "World"), {"Hello", "!"})
%!assert (erase ("Hello World!", {"o"; "World"; " "}), "Hell!")

## Test overlaps
## from https://savannah.gnu.org/bugs/?52647#comment5
%!assert (erase ('ababa', 'aba'), 'ba')
%!assert (erase ('abababa', 'aba'), 'b')
%!assert (erase ('ababababa', 'aba'), 'bba')
%!assert (erase ('ababababa', {'aba', 'bba'}), 'bba')
%!assert (erase ('ababababa ', {'aba', 'bba'}), 'bba ')
%!assert (erase ({' ababababa '}, {'aba', 'bba'}), {' bba '})
%!assert (erase (' ababa ', {'aba', 'baba'}), ' ba ')
%!assert (erase (' Hello World t ', {'t';'llo'}), ' He World  ')
%!assert (erase ({' Hello World t '}, [ 'o ']), {' HellWorld t '})
%!assert (erase ( 'Hello World t ', {'ld '; 'o '}), 'HellWort ')
%!assert (erase ('aba', 'aba'), '')
%!assert (erase ({'aba'}, 'aba'), ({""}))
%!assert (erase ('', 'aba'), '')
%!assert (erase ({'abbabbabba'},{'abba'}), {'bb'})
%!assert (erase ({'ababababa';'abbabbabba'}, 'aba'), {'bba';'abbabbabba'})
%!assert (erase ({''}, {''}), {''})
%!assert (erase ({'pppppppp'}, 'p'), {''})
%!assert (erase ('Hello World t ', {'ld '; 'o '}), 'HellWort ')
%!assert (erase ({'Hello World t '}, {'ld '; 'o '}), {'HellWort '})

## Test input validation
%!error erase ()
%!error erase ("a")
%!error erase ("a", "b", "c")
%!error <STR must be a string> erase ([1], "foo") 
%!error <PTN must be a string> erase ("foo", [1]) 



First, the GPL license was copied over correctly.  That is good, but the
Copyright date was 2002-2017.  This is a new file so you can only claim
copyright for 2017 and beyond.  I changed that.

I rewrote the documentation to use NEWSTR, STR, and PTN as function arguments.
 These are the same names that are used in strrep().  It is like naming
variables, in that one wants to be as clear as possible.  If the inputs are
just STR1 and STR2 then a user is forced to read the documentation to figure
out what they are.  But with STR and PTN as inputs you can sort of guess what
the function is supposed to do without reading everything.

It is also important for users to know about related functions so I used the
@seealso macro to point towards strrep and regexprep.

I changed the names of the variables used in the function prototype to match
that in the documentation.  This is helpful because even programmers often
reference the documentation to see what is supposed to happen in the function.
 It is confusing when the input in the documentation is STR1 but called
parentstring in the actual code.

%!demo blocks are most useful for plotting functions.  I removed them since I
don't think they help the user here.  I also added %!error tests to check the
input validation.  Finally, I added some more regular %!test blocks.  The
existing BIST tests were focused on corner cases around overlaps, but we
needed some simple general tests of functionality.




(file #42727)
    _______________________________________________________

Additional Item Attachment:

File name: tst_erase.m                    Size:0 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?52647>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/
[Prev in Thread]
Current Thread
[Next in Thread]
[Octave-bug-tracker] [bug #52647] missing function: erase (introduced Matlab 2016b), (continued)
Prev by Date: [Octave-bug-tracker] [bug #52699] ginput() broken on dev
Next by Date: [Octave-bug-tracker] [bug #46835] Error on modifying properties of a legend in plotyy
Previous by thread: [Octave-bug-tracker] [bug #52647] missing function: erase (introduced Matlab 2016b)
Next by thread: [Octave-bug-tracker] [bug #52647] missing function: erase (introduced Matlab 2016b)
Index(es):
- Date
- Thread