octave-maintainers


Re: Improving strread / textread / textscan


From: Philip Nienhuis
Subject: Re: Improving strread / textread / textscan
Date: Sun, 23 Oct 2011 23:20:53 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Ben Abbott wrote:

On Oct 23, 2011, at 3:59 PM, PhilipNienhuis wrote:

Motivated by this thread
https://mailman.cae.wisc.edu/pipermail/help-octave/2011-October/048038.html
I had another look at strread.m. I have 2 questions about it:

Q1:
After searching around on the web, I now have inferred the following
behavior of strread (& textscan) in ML:
a. "Words" or fields (to be interpreted later) are separated by white-space.
b. The white-space char set can be adapted by the user with the "whitespace"
keyword. It can even be set to empty.
c. A run of consecutive white-space chars is folded into one char that
separates two fields.
d. Delimiters are characters that are added to the white-space set (they
don't replace it), but unlike white-space, runs of delimiters, or of
several delimiters and white-space, are not folded into one char that
separates fields.
e. Yet, vectors of white-space and one delimiter are folded into one
white-space that separates fields.
f. However, if so desired, multiple consecutive delimiters can be folded
into one delimiter if "MultipleDelimsAsOne" parameter is set to 1.
g. EOL char sequences (\n, \r\n, or \r) are also delimiters, but are not
affected by the MultipleDelimsAsOne parameter.
(...what a mess...)

Is there agreement with my interpretation of ML's behaviour?
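
To make rules a-g concrete, here is a small Python model of them. It is only a sketch of the interpretation listed above, not ML's or Octave's actual code: the function name split_fields and its parameters are made up for illustration, and stripping leading/trailing white-space before splitting is an extra assumption.

```python
import re

def split_fields(text, whitespace=" \t", delimiters="",
                 multiple_delims_as_one=False):
    """Hypothetical model of rules a-g; not the real strread/textscan."""
    # Rule g: EOL sequences always delimit and are never folded.
    eol = r"\r\n|\n|\r"
    alternatives = [eol]
    if delimiters:
        # Rule d: each delimiter char separates fields on its own.
        delim = "[" + re.escape(delimiters) + "]"
        if multiple_delims_as_one:
            # Rule f: fold a run of consecutive delimiters into one.
            delim += "+"
        alternatives.insert(0, delim)
    hard = "(?:" + "|".join(alternatives) + ")"
    if whitespace:
        ws = "[" + re.escape(whitespace) + "]"
        # Rule e: white-space around one delimiter folds into it.
        # Rules a, c: otherwise a run of white-space is one separator.
        pattern = f"{ws}*{hard}{ws}*|{ws}+"
        # Assumption: leading/trailing white-space yields no fields.
        text = text.strip(whitespace)
    else:
        # Rule b: white-space set may be empty; only delimiters/EOL split.
        pattern = hard
    return re.split(pattern, text)

if __name__ == "__main__":
    print(split_fields("a , b", delimiters=","))
    print(split_fields("a,,b", delimiters=","))
    print(split_fields("a,,b", delimiters=",", multiple_delims_as_one=True))
```

In this model "a,,b" keeps an empty field between the two commas unless multiple_delims_as_one is set, while "a , b" never produces an empty field. Running the same strings through ML would show whether the interpretation holds.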

Q2:
There's ample room for improvement in various parts I wrote. But I need to
know:
which one is faster, strrep or regexprep?
Both of these are needed in several places, but AFAICS regexprep is more
versatile.
Roughly speaking, as strread.m stands now, for each of the points above a
separate regexprep or strrep run (or series of runs) is needed on the entire
"file". So it is important to know what functions are the fastest.

Thanks,

Philip


We had discussed making some significant changes to these back in 2010.

        
http://octave.1599824.n4.nabble.com/advice-help-needed-for-reading-formatted-text-textscan-strread-amp-textread-tt3009750.html#none

There was another discussion earlier this year.

        
http://octave.1599824.n4.nabble.com/Release-goals-for-3-6-tt3711420.html#none

I know both of these threads, and I participated quite a bit in the second one.

I'm not sure how much has been done at this point, but reviewing the threads, I
see John had asked that some tests be written. Some of that has been done, but my
impression is that many features of the ML version remain untested.

Well, I have already more than doubled the number of tests for strread, textread and textscan in the course of fixing them. Of course, given ML's undocumented behavior, the number of tests might really need to be quadrupled ... :-)

But seriously, I think there are currently adequate tests for most if not all of the functionality built into Octave's text reading functions. It is the odd corner cases that lack tests, but these usually only come up on the help-octave list & in the bug tracker.

There is some ML functionality not yet ported to Octave (double quotes, %f32 %i64, etc.) but that will probably only come if jwe ever finishes his textscan.oct (he started earlier this year with that).
Tests for those are not too urgent right now.

Would you be interested in cooperating on writing more tests that cover the 
questions you ask above (as well as others)?

What really needs to be done now is writing tests for ML, to pinpoint its behavior, rather than adding tests to Octave.

Once again: do you think my assessment of ML's strread/textscan behavior in my original posting would be acceptable?

Philip

