octave-maintainers
Re: Improving strread / textread / textscan


From: Ben Abbott
Subject: Re: Improving strread / textread / textscan
Date: Sun, 23 Oct 2011 16:53:55 -0400

On Oct 23, 2011, at 3:59 PM, PhilipNienhuis wrote:

> Motivated by this thread
> https://mailman.cae.wisc.edu/pipermail/help-octave/2011-October/048038.html
> I had another look at strread.m. I have 2 questions about it:
> 
> Q1:
> After searching around on the web, I now have inferred the following
> behavior of strread (& textscan) in ML:
> a. "Words" or fields (to be interpreted later) are separated by white-space.
> b. The white-space char set can be adapted by the user with the "whitespace"
> keyword. It can even be set to empty.
> c. White-space is understood to possibly be a vector of white-space chars
> that during reading is folded into one char that separates two fields.
> d. Delimiters are characters that are augmented to white-space (they don't
> replace the white-space char set), but other than white-space, vectors of
> delimiters, or of several delimiters and white-space, are not folded into
> one char that separates fields.
> e. Yet, vectors of white-space and one delimiter are folded into one
> white-space that separates fields.
> f. However, if so desired, multiple consecutive delimiters can be folded
> into one delimiter if "MultipleDelimsAsOne" parameter is set to 1.
> g. EOL char sequences (\n, \r\n, or \r) are also delimiters, but are not
> affected by the MultipleDelimsAsOne parameter.
> (...what a mess...)
> 
> Is there agreement with my interpretation of ML's behaviour?
> 
> Q2: 
> There's ample room for improvement in various parts I wrote. But I need to
> know:
> which one is faster,  strrep  or  regexprep ?
> Both of these are needed in several places, but AFAICS regexprep is more
> versatile.
> Roughly speaking, as strread.m stands now, for each of the points above a
> separate regexprep or strrep run (or series of runs) is needed on the entire
> "file". So it is important to know what functions are the fastest.
> 
> Thanks,
> 
> Philip


We had discussed making some significant changes to these back in 2010.

        
http://octave.1599824.n4.nabble.com/advice-help-needed-for-reading-formatted-text-textscan-strread-amp-textread-tt3009750.html#none

There was another discussion earlier this year.

        
http://octave.1599824.n4.nabble.com/Release-goals-for-3-6-tt3711420.html#none

I'm not sure how much has been done at this point, but reviewing the threads, I
see John had asked for some tests to be written. Some of that has been done, but
my impression is that many features of the ML version remain untested.

Would you be interested in cooperating on writing more tests that cover the 
questions you ask above (as well as others)?
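
As a possible starting point for such tests, points a. to f. of the quoted
interpretation can be written down as a small reference model. The sketch
below is in Python rather than Octave and is not the strread.m code; the
function name split_fields and its parameters are hypothetical, and it
encodes only the inferred rules, which still need checking against ML.
EOL handling (point g.) is left out.

```python
import re


def split_fields(text, whitespace=" \t", delimiters="",
                 multiple_delims_as_one=False):
    """Split text into fields following the inferred rules a. to f.

    Hypothetical reference model, not the Octave implementation.
    """
    # Rule b: the white-space set is configurable and may be empty.
    ws = "[" + re.escape(whitespace) + "]" if whitespace else None
    dl = "[" + re.escape(delimiters) + "]" if delimiters else None

    if ws is None and dl is None:
        # Nothing can separate fields, so the whole text is one field.
        return [text]

    if ws is not None:
        # Leading and trailing white-space never creates empty fields.
        text = text.strip(whitespace)

    alternatives = []
    if dl is not None:
        # Rule e: one delimiter plus surrounding white-space is one separator.
        ws_run = ws + "*" if ws is not None else ""
        one = ws_run + dl + ws_run
        # Rule d: consecutive delimiters are kept apart (empty fields),
        # unless rule f (MultipleDelimsAsOne) asks for them to be folded.
        alternatives.append("(?:" + one + ")+" if multiple_delims_as_one else one)
    if ws is not None:
        # Rules a and c: a run of white-space is folded into one separator.
        alternatives.append(ws + "+")

    return re.split("|".join(alternatives), text)
```

Each rule then becomes a one-line check, for example that "a,,b" with ","
as delimiter yields an empty middle field, which should carry over to
Octave test blocks once the expected ML results have been confirmed.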

Ben




