Re: dataframe dereferencing

octave-maintainers

From:	Jaroslav Hajek
Subject:	Re: dataframe dereferencing
Date:	Tue, 7 Sep 2010 08:10:40 +0200

On Mon, Sep 6, 2010 at 3:29 PM, CdeMills <address@hidden> wrote:
>
> I'm at a conference, so less time to check developments.
>
> Judd and Jaroslav, could you propose syntax for the following operations:

I would choose a more minimalistic design. Let (I,J) indexing of a
dataframe always create a sub-dataframe, and provide methods for
converting a dataframe to a matrix (double, single, uint32, etc.) or a
cell (e.g. as_cell(), as_cell_with_headers()).
Both I and J could be numeric indices, strings, or cell arrays
of strings.
Also, to access individual elements, overload {I,J} to give the
corresponding element(s) directly.
This way dataframe would work very much like a cell array while being
compressed as a collection of numeric matrices.
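
A minimal sketch of the design described above, written in Python rather
than Octave so that it fits in one self-contained block. It is only an
analogy: the class name Frame and the elem() method (standing in for {I,J})
are hypothetical, the method names as_cell, as_cell_with_headers and
as_matrix simply follow the names suggested in this message, and only
column headers are modelled, not row headers.

```python
class Frame:
    """Toy frame: a list of named columns (hypothetical illustration)."""

    def __init__(self, names, columns):
        self.names = list(names)
        self.columns = [list(c) for c in columns]

    def _cols(self, J):
        # Accept a column name, a position, or a list mixing both.
        if isinstance(J, (str, int)):
            J = [J]
        return [self.names.index(j) if isinstance(j, str) else j for j in J]

    def __getitem__(self, key):
        # frame[I, J] always yields another Frame, never a bare matrix.
        I, J = key
        rows = [I] if isinstance(I, int) else list(I)
        idx = self._cols(J)
        return Frame([self.names[j] for j in idx],
                     [[self.columns[j][i] for i in rows] for j in idx])

    def elem(self, i, j):
        # Stand-in for {i,j}: return the element itself, unwrapped.
        return self.columns[self._cols(j)[0]][i]

    def as_cell(self):
        # Rows of plain values, without headers.
        return [list(row) for row in zip(*self.columns)]

    def as_cell_with_headers(self):
        # Same rows, preceded by the column names.
        return [list(self.names)] + self.as_cell()

    def as_matrix(self, convert=float):
        # Explicit conversion; raises ValueError on non-numeric data.
        return [[convert(v) for v in row] for row in self.as_cell()]


df = Frame(["id", "name", "score"],
           [[1, 2, 3], ["a", "b", "c"], [0.5, 1.5, 2.5]])
sub = df[[0, 2], ["id", "score"]]   # still a Frame
matrix = sub.as_matrix()            # conversion happens only on request
```

Because indexing never changes the result type, a caller who wants a matrix
has to ask for one, and as_matrix() fails loudly when a column is not
numeric instead of silently returning something else.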

> 1) sub-ref, returning a cell array without headers

as_cell (df(I,J))

> 2) sub-ref, returning a cell array with rows and columns header, which can
> be converted back into a dataframe

as_cell_with_headers (df(I,J)) # or create a better name

> 3) sub-ref, returning a dataframe object (by default, if all columns are
> numeric, return a matrix)

df(I,J). I wouldn't return a matrix as a special case. Make that
as_matrix(df(I,J)). Such surprising special cases are a pain in the
ass, generally. What if the user doesn't expect this and his data just
*happens* to be all numeric?

> 4) sub-ref, casting inhomogenous columns to the same type (by default,
> downclass numerical outputs and vertcat() them)
>

uint32 (df) etc. Just overload them; cast(df, "uint32") will then work
automagically.
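
The idea of overloading the type-named functions so that a generic cast
picks them up can be sketched as follows. This is again a Python analogy,
not Octave code: NumericFrame and this cast() helper are hypothetical, and
the clamping rule used for uint32 is an assumption that may not match
Octave's rounding.

```python
class NumericFrame:
    """Toy frame holding numeric columns as lists (hypothetical)."""

    def __init__(self, columns):
        self.columns = [list(c) for c in columns]

    def _convert(self, fn):
        # Apply fn to every element and return the data row by row.
        return [[fn(v) for v in row] for row in zip(*self.columns)]

    def double(self):
        return self._convert(float)

    def uint32(self):
        # Assumption: truncate toward zero, then clamp to [0, 2**32 - 1].
        return self._convert(lambda v: min(max(int(v), 0), 2**32 - 1))


def cast(obj, type_name):
    # Generic entry point: look up the conversion method by its name.
    method = getattr(obj, type_name, None)
    if not callable(method):
        raise TypeError(f"no conversion to {type_name!r} defined")
    return method()


nf = NumericFrame([[1.7, -2.0], [3, 5e9]])
as_uint32 = cast(nf, "uint32")
as_double = cast(nf, "double")
```

Only the per-type methods need to be written; the generic cast() stays
unchanged when a new conversion is added.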

> The constraint is that the row, column and type conversion data may not be
> separated, otherwise chaining rule is broken, that is assign the
> intermediate value to some variable, and perform the next operation on it.
>

That constraint would be broken, of course. I think, however, that the
dataframe->dataframe indexing can be made quite efficient, so that the
extra penalty would not matter much.

> In the first version, I used df(row_range, column_range, "dataframe") to get
> back a dataframe object. I switched to df(row, column).dataframe, then to
> df.dataframe(rows, columns). So all I have to do is to revive the adequate
> code from the version management system.
>

Of all those, I liked the current approach the best. I even think it
could coexist with the indexing + conversion approach. However,
if you decide to go this route, I suggest you run some benchmarks
afterwards and remove the df.as.x indexing if the performance
advantage turns out to be unconvincing.

regards

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
