octave-maintainers
Re: New importdata function testing


From: Philip Nienhuis
Subject: Re: New importdata function testing
Date: Sun, 21 Oct 2012 14:56:02 -0700 (PDT)

Rik-4 wrote
> 10/20/12
> 
> Erik,
> 
> I did just a small test with importdata and it doesn't seem to work as
> expected.
> 
> For a file, I used import.tst containing
> 
> 1,2,3
> 4,5,6
> 
> And then in Octave, I used
> importdata ('import.tst', ',')
> warning: unrecognized escape sequence '\S' -- converting to 'S'
> ans =
> 
>    NaN   NaN   NaN
>    NaN   NaN   NaN
> 
> The warning doesn't look good and certainly that is not the correct data.
> 
> I am also concerned that the implementation reads the entire file into a
> string and then uses a number of for loops and regexp which will be slow in
> Octave.  I did a benchmark with the following:
> 
> x = rand (1e4, 10);
> dlmwrite ('tst.csv', x, ',')
> tic; y = dlmread ('tst.csv', ','); toc
> Elapsed time is 0.209933 seconds.
> tic; y = importdata ('tst2.csv', ','); toc
> Elapsed time is 3.2 seconds.
> 
> I believe it would be faster to have importdata check the header lines
> only and then pass off the work to dlmread if possible.  dlmread is written
> in C++ and, per the benchmarking above, is very fast.  This will work
> whenever there are only header lines and column labels.  When there are row
> labels the situation becomes messy, but I think you can still be faster by
> avoiding loops entirely.  One solution would be to split the long string
> returned from fileread into a character array or a cell array and then use
> arrayfun or cellfun with a custom function which removed the row label and
> then used sscanf on the remaining piece of string.
> 
> Overall, I think the function should eventually be in core Octave, but it
> needs more testing and refining.  If that work is going to be done
> immediately then we can keep it where it is.  Otherwise, I would move it to
> Octave-Forge until it can graduate to core Octave.
> 
> --Rik
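
To make the "check the header lines only, then hand off to dlmread" idea
concrete, here is a minimal sketch. It is not what importdata.m currently
does; the demo file, the variable names and the rule "a line is a header if
its first field is not a number" are all assumptions made for illustration.

```octave
% Sketch only: count leading non-numeric lines, then let dlmread do the parsing.
fname = tempname ();                   % hypothetical demo file
fid = fopen (fname, "w");
fputs (fid, "a,b,c\n1,2,3\n4,5,6\n");  % one column-label line, two data lines
fclose (fid);

delim = ",";
nheader = 0;
fid = fopen (fname, "r");
while (true)
  txt = fgetl (fid);
  if (! ischar (txt))                  % fgetl returns -1 at end of file
    break;
  end
  first = strtok (txt, delim);         % first field of the line
  if (! isnan (str2double (first)))    % assumed rule: numeric field => data starts
    break;
  end
  nheader++;
end
fclose (fid);

% Zero-based row/column offsets: skip the header rows, start at column 0.
data = dlmread (fname, delim, nheader, 0);
unlink (fname);
```

Only the header lines are read in m-code here; everything after them goes
through the compiled dlmread.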

I've already patched a few things (.xls support etc., and the warning about
the unrecognized "\S" escape sequence; see the recent thread "Backslashes in
regular expression replacement patterns",
https://mailman.cae.wisc.edu/pipermail/octave-maintainers/2012-October/030439.html).

As to dlmread, I noted that importdata.m filters out numeric data from a
mixed-type data file; can dlmread do that as well?
If not, we might have a look at csv2cell (in the O-F io package), which is
compiled too but currently isn't able to process row and column numbers /
data ranges. Or maybe it can with a wrapper around it, but then, in a
particularly bad case, it might need to read a gigabyte-sized file just to
obtain one float.
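
For the simplest mixed-type case (one text row label followed by numbers), the
loop-free route Rik describes could look roughly like the sketch below. The
sample string stands in for the output of fileread, and parse_row is a
hypothetical helper, not something taken from importdata.m; it assumes every
line has a label, a comma, and then only comma-separated numbers.

```octave
% Sketch only: strip the row label from each line, then sscanf the remainder.
txt = "r1,1,2,3\nr2,4,5,6\n";            % stands in for fileread (fname)
rows = strsplit (strtrim (txt), "\n");   % one cell per line

% Hypothetical helper: keep what follows the first comma and parse it.
parse_row = @(s) sscanf (s(index (s, ",")+1:end), "%f,").';

vals = cellfun (parse_row, rows, "UniformOutput", false);
data = vertcat (vals{:});                % numeric part as a matrix
labels = cellfun (@(s) strtok (s, ","), rows, "UniformOutput", false);
```

This still reads the whole file into memory, so it doesn't help with the
"GB file for one float" case.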

I wouldn't mind if importdata.m were moved to the O-F io package, but
currently I cannot access the O-F repository at all, as Sourceforge has
changed its site setup and my SVN client (TortoiseSVN) cannot
authenticate. So, as far as I'm concerned, the default branch is good
enough for importdata.m until further notice :-)

Philip






reply via email to

[Prev in Thread] Current Thread [Next in Thread]