On May 2, 2006, at 7:25 AM, David Bateman wrote:
Paul Kienzle wrote:
David,
Octave now goes quickly through the regular expression portion of
the code.
I haven't yet confirmed that the results are consistent with Matlab.
The next portion involves for loops such as the following:
tag = cell(number_of_tags,4);
for i=1:number_of_tags
tag{i,1} = xml(tag_start(i):tag_end(i));
end
which is slow for 10000 tags.
Are there fast Octave routines for splitting strings into cells and
joining them back?
- Paul
Paul,
Hey, I'm on holidays at the moment, and so have a little time. What
about the attached implementation of mat2cell? With this you should
be able to replace the above code with
tag = cell(number_of_tags,4);
tag{:,1} = mat2cell (xml, 1, tag_end - tag_start);
mat2cell partitions the whole matrix into cells, so the sizes must add
up to the length of xml. The xml2cell code only extracts substrings.
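To make the distinction concrete, here is a small sketch; the string s and
the sizes are made up for illustration and are not from the xml2cell code.
The partition lengths have to cover every column of the input, so the text
between tags must appear in the partition as well:

```octave
s = 'ab<cd>ef';
% The lengths must sum to numel(s) == 8: 'ab', '<cd>', 'ef'.
c = mat2cell (s, 1, [2, 4, 2]);
% The tag itself is then the middle piece.
tag = c{2};
```

Passing only the tag lengths would leave part of the string unaccounted for,
which mat2cell rejects.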
The following does what I expect:
xml='<eh><bee> <see> deed </see> </bee></eh>';
tag_start = find(xml=='<');
tag_end = find(xml=='>');
pieces = [ tag_start; tag_end+1 ];
partition = diff([1;pieces(:);length(xml)+1]);
tag_name = mat2cell (xml, 1, partition) (2:2:end);
tags = cell(length(tag_start),4);
tags(:,1) = tag_name';
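As a sanity check, the mat2cell result can be compared against the loop
from the original question. This is only a sketch on the sample string
above; the names expected and parts are mine, and the indexing of the
mat2cell result is split into two statements so it does not rely on
indexing a function call directly:

```octave
xml = '<eh><bee> <see> deed </see> </bee></eh>';
tag_start = find (xml == '<');
tag_end = find (xml == '>');

% Loop version from the original question.
expected = cell (length (tag_start), 1);
for i = 1:length (tag_start)
  expected{i} = xml(tag_start(i):tag_end(i));
end

% mat2cell version: alternate gap/tag lengths, then keep the even pieces.
pieces = [tag_start; tag_end+1];
partition = diff ([1; pieces(:); length(xml)+1]);
parts = mat2cell (xml, 1, partition);
tag_name = parts(2:2:end);

% Both approaches should give the same tags in the same order.
assert (tag_name(:), expected);
```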
Here are a couple of test cases:
/*
%!test
%! x = reshape(1:20,5,4);
%! c = mat2cell(x,[3,2],[3,1]);
%! assert(c,{[1,6,11;2,7,12;3,8,13],[16;17;18];[4,9,14;5,10,15],[19;20]})
%!test
%! x = 'abcdefghij';
%! c = mat2cell(x,1,[0,4,2,0,4,0]);
%! assert(c,{'','abcd','ef','','ghij',''})