Re: dataframe dereferencing
From: Jaroslav Hajek
Subject: Re: dataframe dereferencing
Date: Tue, 7 Sep 2010 08:10:40 +0200
On Mon, Sep 6, 2010 at 3:29 PM, CdeMills <address@hidden> wrote:
>
> I'm at a conference, so less time to check developments.
>
> Judd and Jaroslav, could you propose syntax for the following operations:
I would choose a more minimalistic design. Let (I,J) indexing of a
dataframe always create a sub-dataframe, and provide methods for
converting a dataframe to a matrix (double, single, uint32, etc.) or a
cell array (e.g. as_cell(), as_cell_with_headers()).
Both I and J could be either numeric indices or strings or cell arrays
of strings.
Also, to access individual elements, overload {I,J} to give the
corresponding element(s) directly.
This way a dataframe would work very much like a cell array while being
stored compactly as a collection of numeric matrices.
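A rough sketch of this design, written in Python purely for illustration: the actual dataframe package is an Octave class, and `DataFrame`, `sub` and `elem` are hypothetical stand-ins for `df(I,J)` and `df{I,J}`. Each column is kept as one typed `array.array`; indices may be 0-based integers, header strings, or lists of either. `sub` always yields another dataframe, while `elem` returns the raw values in column-major order, much as `{}` indexing of a cell array would.

```python
from array import array


class DataFrame:
    """Hypothetical column store: one homogeneous typed array per column."""

    def __init__(self, columns, names, row_names):
        self.columns = list(columns)      # list of array.array, one per column
        self.names = list(names)          # column headers
        self.row_names = list(row_names)  # row headers

    @staticmethod
    def _resolve(key, labels):
        # Accept a 0-based int, a header string, or a list of either.
        if isinstance(key, (int, str)):
            key = [key]
        return [labels.index(k) if isinstance(k, str) else k for k in key]

    def sub(self, I, J):
        """Stand-in for df(I,J): always returns a DataFrame."""
        rows = self._resolve(I, self.row_names)
        cols = self._resolve(J, self.names)
        new_columns = [
            array(self.columns[j].typecode, [self.columns[j][i] for i in rows])
            for j in cols
        ]
        return DataFrame(new_columns,
                         [self.names[j] for j in cols],
                         [self.row_names[i] for i in rows])

    def elem(self, I, J):
        """Stand-in for df{I,J}: the elements themselves, column-major."""
        rows = self._resolve(I, self.row_names)
        cols = self._resolve(J, self.names)
        return [self.columns[j][i] for j in cols for i in rows]
```

For example, with a double column `x` = [1.5, 2.5, 3.5], an int column `n` = [10, 20, 30] and rows `a`, `b`, `c`, `df.sub([0, 2], "n")` is a one-column dataframe whose column keeps its integer typecode, while `df.elem("b", "x")` is just `[2.5]`.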
> 1) sub-ref, returning a cell array without headers
as_cell (df(I,J))
> 2) sub-ref, returning a cell array with rows and columns header, which can
> be converted back into a dataframe
as_cell_with_headers (df(I,J)) # or create a better name
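To make the two conversions concrete, here is a small Python sketch; it is not the package's code, and `Frame`, `as_cell`, `as_cell_with_headers` and the inverse `from_cell_with_headers` are hypothetical. A "cell" is modelled as a list of rows. The headered form puts column names in the first row and row names in the first column, which is enough information to rebuild the dataframe.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical minimal dataframe: column-wise lists plus headers."""
    row_names: list
    names: list
    cols: list  # cols[j][i] is the value in row i, column j


def as_cell(df):
    # Plain 2-D "cell": a list of rows, no headers.
    return [[col[i] for col in df.cols] for i in range(len(df.row_names))]


def as_cell_with_headers(df):
    # Top-left corner is empty; the first row holds column names and
    # the first column holds row names.
    body = as_cell(df)
    return [[""] + list(df.names)] + [[r] + row for r, row in zip(df.row_names, body)]


def from_cell_with_headers(cell):
    # Inverse of as_cell_with_headers: rebuild a Frame from the headered cell.
    names = cell[0][1:]
    row_names = [row[0] for row in cell[1:]]
    cols = [[row[j + 1] for row in cell[1:]] for j in range(len(names))]
    return Frame(row_names, names, cols)
```

The round trip `from_cell_with_headers(as_cell_with_headers(df))` gives back an equal `Frame`, which is the property the request asks for.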
> 3) sub-ref, returning a dataframe object (by default, if all columns are
> numeric, return a matrix)
df(I,J). I wouldn't return a matrix as a special case. Make that
as_matrix(df(I,J)). Such surprising special cases are a pain in the
ass, generally. What if the user doesn't expect this and his data just
*happens* to be all numeric?
> 4) sub-ref, casting inhomogeneous columns to the same type (by default,
> downcast numerical outputs and vertcat() them)
>
uint32 (df) etc. Just overload them; cast(df, "uint32") will then work
automagically.
> The constraint is that the row, column and type-conversion arguments must not be
> separated; otherwise the chaining rule is broken, i.e. one has to assign the
> intermediate value to some variable and perform the next operation on it.
>
This would be broken, of course. I think, however, that the
dataframe->dataframe indexing can be made quite efficient, so that the
extra penalty would not matter much.
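One way such indexing could be made cheap, offered as an assumption rather than a description of the package: a sub-dataframe stores only index maps into shared column data, so indexing an intermediate result composes indices instead of copying values. The Python sketch uses hypothetical names (`FrameView`, `sub`, `materialize`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameView:
    """Hypothetical lazy sub-dataframe: shared column data plus index maps."""
    data: tuple     # data[j][i]: shared, never copied by indexing
    names: tuple
    row_idx: tuple  # positions into the shared rows
    col_idx: tuple  # positions into the shared columns

    @classmethod
    def from_columns(cls, names, cols):
        nrows = len(cols[0]) if cols else 0
        return cls(tuple(cols), tuple(names),
                   tuple(range(nrows)), tuple(range(len(cols))))

    def sub(self, rows, cols):
        # Stand-in for df(I,J): compose index maps. The cost grows with
        # len(rows) + len(cols), not with the amount of stored data.
        return FrameView(self.data, self.names,
                         tuple(self.row_idx[i] for i in rows),
                         tuple(self.col_idx[j] for j in cols))

    def column_names(self):
        return [self.names[j] for j in self.col_idx]

    def materialize(self):
        # Copy out only the selected block, column by column.
        return [[self.data[j][i] for i in self.row_idx] for j in self.col_idx]
```

Assigning `v = df.sub(...)` and then indexing `v` again costs no more than a single combined index, so breaking the chain into steps carries little penalty.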
> In the first version, I used df(row_range, column_range, "dataframe") to get
> back a dataframe object. I switched to df(row, column).dataframe, then to
> df.dataframe(rows, columns). So all I have to do is to revive the adequate
> code from the version management system.
>
Of all those, I liked the current approach the best. I even think this
could coexist with the indexing + conversion approach. However,
if you decide to go this route, I suggest you make some benchmarks
afterwards and remove the df.as.x indexing if the performance
advantage turns out to be unconvincing.
regards
--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz