Re: How do you select only specific rows based on the values in a specif

From: Joao Rodrigues
Subject: Re: How do you select only specific rows based on the values in a specific column?
Date: Sun, 26 Oct 2014 09:40:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 10/26/2014 02:29 AM, Thompson, Robert M - (rmt1) wrote:
> I have a huge source file of a million lines, like: (cartographic data)
>
>      0.015625   89.996094    0.018000
>      0.046875   89.996094    0.018000
>      0.078125   89.996094    0.018000
>
> I was using C to pare the source file down into a smaller file based on values
> in first and second column.
>
> The evaluation was like, e.g., keep this row if column 1 is greater than 0.20000
> and column 2 is >= 89.00000.
> But now I want to cut out the C middleman and import the million-line source
> file directly into Octave.
>
> But also select only the rows with first or second columns matching criteria,
> before I consume great amounts of memory on records I will not be using.

I may be mistaken, but I don't think what you want is directly possible. Either you import the whole file and then let Octave filter the content, which is fast but means everything has to be loaded, or you read one row at a time C-style (fopen, fscanf, fclose) and test each row as you go, which keeps memory use low but is very slow.
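
To make the second option concrete, here is a minimal sketch of the row-at-a-time approach. The demo file, its values, and the variable names are made up for illustration; the thresholds are the ones from your example. I have not timed it on a million-line file.

```octave
% Write a tiny demo file (made-up values, three columns per row).
fname = tempname ();
fid = fopen (fname, "w");
demo = [0.015625 89.996094 0.018; 0.25 89.5 0.02; 0.5 88.0 0.03];
fprintf (fid, "%f %f %f\n", demo');
fclose (fid);

% Read one row at a time and keep only the rows that pass the test.
fid = fopen (fname, "r");
kept = zeros (0, 3);
while true
  row = fscanf (fid, "%f", 3);      % next three numbers, as a column vector
  if numel (row) < 3
    break;                          % end of file (or an incomplete last row)
  end
  if row(1) > 0.2 && row(2) >= 89
    kept(end+1, :) = row';          % growing the array is fine for a sketch
  end
end
fclose (fid);
delete (fname);
```

Only the rows that pass the test are ever stored, which is where the memory saving comes from; the price is one fscanf call per row.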

If what you have is a million rows, I would go for the first option. C-style reading is only worth it if the file is small. Octave has many import functions, each suited to a particular context. If your file contains only numerical data in ASCII, then I would first try the following, with XYZ standing in for your file name:

a = dlmread(XYZ);

If that takes a long time, try breaking the original file into chunks and importing them one at a time, or try other import functions (check the io package; I found that csv2cell was amazingly fast).
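
As a rough sketch of the chunked idea, the loop below reads a fixed number of rows per fscanf call and filters each chunk before reading the next. It assumes every row has exactly three numeric columns; the demo data, chunk_rows, and the other names are only for illustration.

```octave
% Write a small demo file (made-up values).
fname = tempname ();
fid = fopen (fname, "w");
demo = [0.1 89.5 1; 0.3 89.2 2; 0.4 88.0 3; 0.6 89.9 4; 0.7 90.0 5];
fprintf (fid, "%f %f %f\n", demo');
fclose (fid);

chunk_rows = 2;    % tiny for the demo; use a much larger value for a real file
fid = fopen (fname, "r");
parts = {};
while true
  % fscanf fills a 3-by-n matrix column by column, so transpose to get rows.
  block = fscanf (fid, "%f", [3, chunk_rows])';
  if isempty (block)
    break;                          % nothing left to read
  end
  keep = block(:,1) > 0.2 & block(:,2) >= 89;
  parts{end+1} = block(keep, :);    % store only the rows that pass
end
fclose (fid);
delete (fname);

a = vertcat (parts{:});             % filtered rows from all chunks
```

With this layout, peak memory is roughly one chunk plus the rows kept so far, rather than the whole file.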

Once a is loaded into Octave, use Doug's suggestion to keep only the desired rows.
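
Doug's message is not quoted here, so the following is just one common way to do that step, using a logical mask. The matrix below is a made-up stand-in for what dlmread would return.

```octave
% Stand-in for the matrix returned by dlmread (made-up values).
a = [0.015625 89.996094 0.018;
     0.25     89.5      0.02;
     0.5      88.0      0.03];

% Logical mask: true for rows where column 1 > 0.2 and column 2 >= 89.
keep = a(:,1) > 0.2 & a(:,2) >= 89;

% Keep only the matching rows, all columns.
a = a(keep, :);
```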