From: Muthiah Annamalai
Subject: Re: regex-matched formatted input, or selecting a few columns in a CSV file?
Date: Thu, 17 Jan 2008 09:55:39 -0600

On Jan 16, 2008 8:31 PM,  <address@hidden> wrote:
> > I am trying to read from a CSV file, where all the numerical
> > data is on the 4th and 5th column (1-3rd columns contain
> > additional information about instruments). Is there a way to
> > read it directly into Octave using something similar to
> > fscanf?
> One approach is to read in the whole thing and let regexp
> sort out the mess.  For example,
> fid = fopen('testinput.txt');
> in = fscanf(fid,'%c');
> fclose(fid);
> lines = regexp(in,'(^|\n)([^\n]+)', 'match')
> The variable lines is now a cell array, the nth element of
> which is a string containing the nth line; newline
> characters (\n) are included at the beginning of each line
> after the first.
> Now you can use cellfun to split these things up further.
> For example,
> lines2 = cellfun(@(x) regexp(x, '(^|,)[^,]*', 'match'), ...
>   lines, 'UniformOutput', false);
> These regular expressions certainly need to be modified
> a bit, but it might get you close.
> Mark McClure
I was wondering if we could / someone should contribute a way to solve
this 'loading-file' problem.

One idea I was thinking of is to use something like a
chain-of-responsibility pattern, where the user invokes load_data() with
a series of function handles. We try each handler in turn on each line of
the CSV file; if one fails we go on to the next, and if one succeeds we
keep the row it returns.

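Something like:

## varargin is expected to be a series of handlers.  Each handler
## takes one line of text and returns [new_row, op_flag].
function data = load_data( file, varargin )
  fp = fopen( file, "r" );
  if ( fp < 0 ), error("could not open %s", file); endif
  data = [];
  while ( ischar( txt = fgetl( fp ) ) )   # fgetl returns -1 at EOF
    op_flag = false;
    for idx = 1:length(varargin)
      func = varargin{idx};
      [new_row,op_flag] = func(txt);
      if ( op_flag ), break; endif
    endfor
    if ( !op_flag )
      fclose( fp );
      error("data could not be read with handlers");
    endif
    data = [data; new_row];
  endwhile
  fclose( fp );
endfunction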
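
To make the idea a bit more concrete, here is a rough sketch of what two
such handlers might look like for the original question (numbers in the
4th and 5th columns). The handler names are made up, and I am assuming
each handler receives one line of text and returns [row, ok]; the lines
are given inline instead of being read from a file so the example stands
on its own.

```octave
1;  # mark this file as a script so functions can be defined inline

## Hypothetical handler: succeeds when the 4th and 5th comma-separated
## fields are both numeric, and returns them as a 1x2 row.
function [row, ok] = numeric_cols_handler (txt)
  fields = regexp (txt, ',', 'split');
  row = [];
  ok = false;
  if (numel (fields) >= 5)
    vals = str2double (fields(4:5));   # NaN for non-numeric text
    if (! any (isnan (vals)))
      row = vals;
      ok = true;
    endif
  endif
endfunction

## Hypothetical handler: accepts blank lines and contributes no data.
function [row, ok] = blank_handler (txt)
  row = [];
  ok = isempty (strtrim (txt));
endfunction

## Sample input (assumed layout: three descriptive columns, two numeric).
lines = {"osc,A1,ch1,1.5,2.5", "", "dmm,B2,ch2,3,4e-1"};
handlers = {@numeric_cols_handler, @blank_handler};

data = [];
for i = 1:numel (lines)
  ok = false;
  for k = 1:numel (handlers)
    h = handlers{k};
    [row, ok] = h (lines{i});
    if (ok), break; endif            # first handler that succeeds wins
  endfor
  if (! ok), error ("no handler accepted line %d", i); endif
  data = [data; row];
endfor
disp (data)
```

The order of the handlers matters: the first one that reports success
decides what the line contributes, so more specific handlers should come
before catch-all ones.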

This could work even better if we also provide some way of describing
CSV files, so that the user only has to pass in the types of rows
expected. It could be a small project. Ideally, we should be able to
describe each row.
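
One possible way to "describe a row", purely as a sketch: treat a row
description as a regular expression whose captured tokens are the numeric
fields, and build a handler from it. The names make_row_handler and
match_row are hypothetical, and the pattern below assumes three free-text
fields followed by two plain decimal numbers.

```octave
1;  # mark this file as a script so functions can be defined inline

## Hypothetical: match one line against a row description (a regexp with
## one capture group per numeric field) and convert the tokens to numbers.
function [row, ok] = match_row (txt, pattern)
  tok = regexp (txt, pattern, 'tokens', 'once');
  ok = ! isempty (tok);
  if (ok)
    row = str2double (tok);
  else
    row = [];
  endif
endfunction

## Hypothetical: turn a row description into a handler taking one line.
function h = make_row_handler (pattern)
  h = @(txt) match_row (txt, pattern);
endfunction

## Row description: three text fields, then two numbers to keep.
num = '([-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)';
data_row = ['^[^,]*,[^,]*,[^,]*,' num ',' num '\s*$'];

h = make_row_handler (data_row);
[row, ok] = h ("osc,A1,ch1,1.5,2.5");
[hdr_row, hdr_ok] = h ("name,unit,channel,volts,amps");
```

With something like this, the caller would pass a list of row
descriptions and never write a handler by hand. Quoted fields containing
commas are not handled by this pattern.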

