octave-maintainers

Re: New importdata function testing


From: Erik Kjellson
Subject: Re: New importdata function testing
Date: Tue, 23 Oct 2012 21:10:31 +0200

Hello,

I have now improved the speed of the import; it is now about twice as fast. The changeset is attached; please upload it for me.

I tested this code:
x = rand (1000, 4);
dlmwrite ('/tmp/test-fil.txt', x, ',')
tic; y = importdata ('/tmp/test-fil.txt', ','); toc

The elapsed time went from 3.18 s to 1.49 s; however, dlmread still did it in 0.0083 s.
Anyway, a factor of 2 in speed is a good start...

The key to the improvement was halving the number of times that num2str was called. So a faster way to convert strings into numbers seems to be the critical point.
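To illustrate why the number of string-to-number conversion calls dominates, here is a minimal sketch (not the attached changeset) that converts an entire block of delimited numeric text with a single sscanf call instead of one conversion per field. It assumes a single-character delimiter, the same number of fields on every line, and no empty or non-numeric fields; the variable names and sample text are made up for illustration.

```octave
## Sketch only: one vectorized conversion for the whole block.
## Assumes a single-character delimiter and purely numeric fields.
txt   = "1,2,3\n4,5,6\n";   # example contents of a small CSV file
delim = ",";

first = strtok (txt, "\n");                        # first data line
ncols = sum (first == delim) + 1;                  # fields per line
vals  = sscanf (strrep (txt, delim, " "), "%f");   # all fields, one call
data  = reshape (vals, ncols, []).'                # one row per input line
```

Because sscanf skips whitespace (including newlines) before each "%f", replacing the delimiter with a space lets one call read every value; the per-field cost of calling a conversion function in an interpreted loop disappears.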

regards,
Erik



On 22 October 2012 17:30, Rik <address@hidden> wrote:
On 10/22/2012 05:51 AM, Jordi Gutiérrez Hermoso wrote:
> On 21 October 2012 12:07, Rik <address@hidden> wrote:
>> 10/20/12
>>
>> Erik,
>>
>> I did just a small test with importdata and it doesn't seem to work as
>> expected.
>>
>> For a file, I used import.tst containing
>>
>> 1,2,3
>> 4,5,6
>>
>> And then in Octave, I used
>> importdata ('import.tst', ',')
>> warning: unrecognized escape sequence '\S' -- converting to 'S'
> Oops, my bad:
>
>      http://hg.savannah.gnu.org/hgweb/octave/rev/9a455cf96dbe#l2.365
After Philip's changes, imports of CSV data now work, which is good to see:
the m-file is now functional.  But it is even slower than before.  See below.
>
>> I am also concerned that the implementation reads the entire file into a
>> string and then uses a number of for loops and regexp which will be slow in
>> Octave.  I did a benchmark with the following:
>>
>> x = rand (1e4, 10);
>> dlmwrite ('tst.csv', x, ',')
>> tic; y = dlmread ('tst.csv', ','); toc
>> Elapsed time is 0.209933 seconds.
>> tic; y = importdata ('tst.csv', ','); toc
>> Elapsed time is 3.2 seconds.
>>
>> I believe it would be faster to have importdata check the header lines
>> only and then pass off the work to dlmread if possible.  dlmread is written
>> in C++ and, per the benchmarking above, is very fast.
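A minimal sketch of the "check the header lines, then hand off to dlmread" idea, assuming that a header line is any leading line whose fields do not all parse as numbers. The temporary file, the sample data, and this header-detection rule are assumptions for illustration; this is not how importdata currently works. It relies on dlmread's zero-based row/column offset arguments to skip the detected header.

```octave
## Sketch only: detect header lines in the m-file, parse the rest with dlmread.
fname = strcat (tempname (), ".csv");   # hypothetical scratch file
delim = ",";
x = [1 2 3; 4 5 6];

fid = fopen (fname, "w");
fprintf (fid, "a,b,c\n");               # one header line
fprintf (fid, "%d,%d,%d\n", x.');       # numeric body, row by row
fclose (fid);

## Count leading lines containing any field that is not a number.
fid = fopen (fname, "r");
nheader = 0;
tline = fgetl (fid);
while (ischar (tline) && any (isnan (str2double (strsplit (tline, delim)))))
  nheader++;
  tline = fgetl (fid);
endwhile
fclose (fid);

## Let the compiled dlmread parse the numeric part, skipping nheader rows.
data = dlmread (fname, delim, nheader, 0)
delete (fname);
```

Only the header lines pass through interpreted code; the bulk of the file is parsed by dlmread, which is where the speed difference in the benchmark comes from.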
New benchmarking with functional importdata:
dlmread:    0.240746 seconds
importdata: 110 seconds

Also, could you check some of the examples at this link
(http://www.mathworks.com/help/matlab/import_export/import-numeric-data-and-header-text-from-a-text-file.html).
I found that the Octave function failed on the complex examples that include
row headers as well as header lines.  In particular,

x = importdata ('grades.dat', ' ')

Thanks,
Rik


Attachment: importdata_improved_speed.diff
Description: Binary data

