Import large field-delimited file with strings and numbers

From: João Rodrigues
Subject: Import large field-delimited file with strings and numbers
Date: Sat, 06 Sep 2014 15:19:23 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

I need to import a large CSV file whose columns hold a mix of string and numeric entries, such as:

field1, field2, field3, field4
A,      a,      1,      1.0,
B,      b,      2,      2.0,
C,      c,      3,      3.0,

and I want to turn this into something like

cell1 = {'A'; 'B'; 'C'};
cell2 = {'a'; 'b'; 'c'};
arr3 = [1 2 3]';
arr4 = [1.0 2.0 3.0]';

Furthermore, some columns can be ignored, the total number of entries is known, and the file has a header line.

How can I perform the import in reasonable time and with little memory overhead? Below are a few of my attempts.

Octave offers a wide range of functions to import files (csvread, dlmread, textscan, textread, fscanf, fgetl), but as far as I can tell none of them gets the job done.

csvread and dlmread don't work because they only handle numerical data.

textscan works but eats up all the memory (the file is 200 MB; textscan's memory usage ran into the GBs). It doesn't allow the size of the output to be specified a priori.

fid = fopen(fstr,"r");
% field4 holds decimals such as 1.0, so it is read with %f rather than %d
tmp = textscan(fid,'%s %s %d %f','delimiter', ',', 'headerlines', 1);
fclose(fid);

fgetl allows the size of the output to be defined a priori but requires a loop:

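v = cell(nrow,4);            % preallocate: nrow is known in advance
fid = fopen(fstr,"r");
tmp = fgetl(fid);            % skip the header line
for irow = 1 : nrow
    tmp = strsplit(fgetl(fid),",");
    v(irow,:) = tmp(1:4);    % a trailing comma yields an empty fifth part
end
fclose(fid);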
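
For reference, here is a minimal sketch of the same line-by-line approach that fills typed, preallocated outputs directly and skips an unwanted column. It only illustrates the target layout described above and does not remove the cost of the per-line loop. The sample file, its contents, and the choice to ignore field2 are assumptions made for illustration.

```octave
% Write a tiny sample file so the sketch is self-contained.
% The file name and contents are made up for illustration.
fstr = [tempname() ".csv"];
fid = fopen(fstr, "w");
fputs(fid, "field1, field2, field3, field4\n");
fputs(fid, "A, a, 1, 1.0,\nB, b, 2, 2.0,\nC, c, 3, 3.0,\n");
fclose(fid);

nrow = 3;                  % number of data rows, assumed known in advance

% Preallocate one container per column that is kept.
cell1 = cell(nrow, 1);     % strings from field1
arr3  = zeros(nrow, 1);    % numbers from field3
arr4  = zeros(nrow, 1);    % numbers from field4

fid = fopen(fstr, "r");
fgetl(fid);                % discard the header line
for irow = 1:nrow
  % Keep empty fields instead of merging adjacent delimiters.
  parts = strsplit(fgetl(fid), ",", "CollapseDelimiters", false);
  cell1{irow} = strtrim(parts{1});
  % parts{2} (field2) is ignored in this sketch
  arr3(irow)  = str2double(parts{3});
  arr4(irow)  = str2double(parts{4});
end
fclose(fid);
delete(fstr);              % remove the temporary sample file
```

Storing the numeric columns as plain arrays rather than as cells of strings keeps the result compact, since each value then occupies a fixed number of bytes.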

Any suggestions? (I searched Google and the only suggestion I found was to use fgetl, but this is too slow: it takes 30 s to read 1% of the full dataset.)

