Re: dataframe dereferencing

On Mon, Sep 6, 2010 at 9:29 AM, CdeMills <address@hidden> wrote:

could you propose syntax for the following operations:

So I'm going to repeat a lot of what I understand to be the issues in order to make sure that I'm not confused. Fundamentally, @dataframe should behave simultaneously as if it were a matrix and a cell array. Sub-referencing a matrix returns a matrix, and sub-referencing a cell array returns either a cell array or a specific element.

## Matrix-like behavior

df(1:3,1:3) # -> matrix

df(5,4) # content of element 5,4 which happens also to be a matrix

## Cell-like behavior

df{1:3,1:3} # -> cell array

df{5,4} # -> content of cell 5,4 (not cell array)
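
For comparison, here is a small sketch of how the built-in types index today, using a plain matrix and a plain cell array rather than a dataframe (the variable names are just for illustration). Note that on a built-in cell array a multi-element {} index produces a comma-separated list, so having df{1:3,1:3} return a cell array would be a design choice of the class, not inherited behavior.

```octave
m = magic (5);          # plain numeric matrix
c = num2cell (m);       # plain cell array holding the same values

sub_m = m(1:3,1:3);     # () on a matrix gives a matrix
sub_c = c(1:3,1:3);     # () on a cell array gives a cell array
elem  = c{5,4};         # {} with a single index gives the cell's content

# {} with several indices on a built-in cell gives a comma-separated
# list; wrapping it in braces collects the values into a row cell.
flat = {c{1:3,1:3}};
```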

One challenge is how to sub-reference to get a sub-dataframe since the behavior of () and {} is already prescribed.

1) sub-ref, returning a cell array without headers

df{1:3,1:3}

2) sub-ref, returning a cell array with rows and columns header, which can
be converted back into a dataframe

df{1:3,1:3,"headers"}

3) sub-ref, returning a dataframe object (by default, if all columns are
numeric, return a matrix)

df(1:3,1:3) # if all columns are numeric return matrix

df(1:3,1:3,"dataframe")

df{1:3,1:3,"dataframe"}

4) sub-ref, casting inhomogeneous columns to the same type (by default,
downclass numerical outputs and vertcat() them)

df(1:3,1:3,"double")

df(1:3,1:3,"uint32")

df(1:3,1:3,"single")

The constraint is that the row, column and type-conversion arguments may not
be separated; otherwise the chaining rule is broken, that is, being able to
assign the intermediate value to some variable and perform the next
operation on it.

--judd

A different approach to the sub-referencing problem would be to try using multiple output arguments for subsref. Jaroslav has added some pretty nice things to the development sources that would allow this to be pretty efficient, e.g.:

[a,b] = df(1:3,1:3) ;

[c,d] = df{1:3,1:3} ;

where "a" and "c" would contain the usual output of () and {} dereferencing (e.g. matrices, cell arrays or single elements) and "b" and "d" would contain sub-referenced dataframe objects. The development version of Octave allows this approach to be very efficient. Firstly, if the second output arguments are not requested, things would behave as normal:

a = df(1:3,1:3) ;

c = df{1:3,1:3} ;

Secondly, the development version's ignored output parameters mean dummy placeholder variables aren't needed to get at the sub-referenced dataframes:

[~,b] = df(1:3,1:3) ;

[~,d] = df{1:3,1:3} ;

Moreover, Jaroslav added a new isargout() function which allows functions to detect ignored output parameters so there would be no performance penalty.
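
As a rough sketch of that idea, using an ordinary function rather than a real subsref method: "pick" and the struct it returns are hypothetical stand-ins, not the @dataframe code, and this assumes a version of Octave that provides isargout() and the ~ placeholder. Each output is only built when the caller actually asks for it.

```octave
1;  # first statement is not a function, so this file is a script

# Hypothetical stand-in for a dataframe subsref: the first output is
# the plain matrix, the second a struct playing the sub-dataframe.
function [vals, sub] = pick (data, r, c)
  vals = [];
  sub = [];
  if (isargout (1))
    vals = data(r,c);           # skipped when the caller writes [~, b]
  endif
  if (isargout (2))
    sub = struct ("data", data(r,c), "rows", r, "cols", c);
  endif
endfunction

m = magic (5);
a = pick (m, 1:3, 1:3);         # second output is never built
[~, b] = pick (m, 1:3, 1:3);    # first output is never built
```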

The challenge is that I don't think ignored parameters work at all on the stack (and I don't know whether that could even be possible), e.g. I don't know how to rewrite:

[~,b] = df(1:3,1:3) ;

b = myfunc(b) ;

as a single line.

From:	Judd Storrs
Subject:	Re: dataframe dereferencing
Date:	Mon, 6 Sep 2010 13:06:39 -0400