Thu, 01 Nov 2012 14:07:14 +0100
Excerpts from franklin's message of Thu Nov 01 09:37:21 +0100 2012:
> I am trying to read a csv file with about 25M data points. (250,000
> rows, 100 columns). Using MacBook Pro OSX 10.8 with 8GB RAM, 2.3Ghz
> i7. I used csvread('file'). The process has been running for 1.5 days.
> It's currently using 2.5GB ram. The file is 230MB.
Assuming it's all numbers and you are reading them as doubles,
you need at least 8 bytes per number. With 25M numbers, that's
about 200M bytes. Indeed, that's what I get:
octave:1> A = zeros (250000, 100);
octave:2> whos
Variables in the current scope:

  Attr Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
       A      250000x100               200000000  double

Total is 25000000 elements using 200000000 bytes
I obtain the same if I use a rand matrix instead of zeros.
> This seems too slow. I didn't make a matrix of zeros before running
> the process. Also, now I have about 1GB of RAM left.
> Can someone give me insight into what's happening? If I interrupt the
> process will it keep the information that is already loaded? Or will I
> lose everything? Should I start quitting other processes to free up RAM?
If you have not preallocated the space needed to store the 25M
numbers, then the process is going to take forever: every time the
matrix grows, the whole thing may have to be copied to a new block of
memory.
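You can see the effect with a small timing sketch (sizes reduced so it
finishes quickly; all the variable names are just illustrative, and the
gap widens dramatically at 250000 rows):

```octave
n = 5000;

tic;
B = [];
for i = 1:n
  B = [B; rand(1, 100)];   # grows the matrix, forcing repeated copies
endfor
t_grow = toc;

tic;
C = zeros (n, 100);        # preallocate the final size once
for i = 1:n
  C(i,:) = rand (1, 100);
endfor
t_prealloc = toc;

printf ("grow: %.2fs  prealloc: %.2fs\n", t_grow, t_prealloc);
```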
If the file is regular (and if it is CSV, it should be), you could
try reading it at a low level, line by line, using fscanf, and
assigning every line to a row of a preallocated matrix of zeros.
That way you don't need to keep the whole text file in memory (I don't
know whether csvread does; just guessing).
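A minimal sketch of that approach, assuming a purely numeric file
(here called "file.csv", with 250000 rows of 100 comma-separated
values; adjust the sizes and filename to your data):

```octave
nrows = 250000;
ncols = 100;

A = zeros (nrows, ncols);            # preallocate the full matrix once
fid = fopen ("file.csv", "r");
for i = 1:nrows
  ## "%f," reads up to ncols comma-separated numbers from the stream;
  ## the %f conversion skips the newline before the next row.
  A(i,:) = fscanf (fid, "%f,", ncols);
endfor
fclose (fid);
```

If a row has fewer than ncols values, the assignment will error out,
which at least tells you which line of the file is malformed.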