Re: Standard example datasets

From: Carnë Draug
Subject: Re: Standard example datasets
Date: Thu, 2 May 2019 15:38:21 +0100

On Thu, 2 May 2019 at 01:22, Andrew Janke <address@hidden> wrote:
> On 4/28/19 8:27 AM, Carnë Draug wrote:
> > On Sat, 27 Apr 2019 at 21:02, Andrew Janke <address@hidden> wrote:
> >>
> >> Hi, Octave maintainers,
> >>
> >> Some other statistical programs ship with standard example datasets and
> >> methods to load or explore them. Does Octave have something like this?
> >>
> >> For example, R ships with a bunch of example datasets in its "datasets"
> >> package, and you can view a list of them by doing `data()`. And Matlab
> >> ships with a bazillion example datasets that seem to all be just MAT
> >> files in its source code root directories, that you can access with
> >> load, like `load patients`.
> >>
> >> Use case: I'm working on table stuff, and would like to add some example
> >> tabular datasets in my package. Wondering if there's a standard
> >> mechanism I should integrate with.
> >>
> >
> > Matlab also comes with such datasets.  Ideally we would have the same
> > so that examples that use them work in Octave as well.  It would also
> > simplify some test cases which require generation of input data (I
> > would argue that would actually enable them because if generation of
> > such complex datasets is too complicated then there's no tests for
> > them).
> >
> > Anyway, there is already an item on the tracker [1] that lists the
> > ones in Matlab.  The issue is finding who is the copyright holder of
> > such data and contact them.
> >
> > [1] https://savannah.gnu.org/patch/?9544
> >
> Do we have any lawyers or software licensing experts on the list?
> My understanding is that simple databases are not subject to copyright,
> under the "you can't copyright facts" principle. They're just subject to
> whatever licensing terms you signed a contract to get access to the data
> under.

None of us are lawyers.  Some people will argue that datasets are
copyrightable.  Plenty of scientists are struggling with the question
of how to share data, and licenses for data are a real thing.  Also,
some of those datasets are images and photographs, including
photographs of paintings.

I think discussing this is outside the scope of Octave.

> I'm looking through the R source code. R's example datasets are mostly
> little datasets written out in source code like this:
> [...]
> Could we just take the numbers from the R code, either under the "no
> copyright for dbs" rule, or under the same license that R itself is
> distributed under, rewrite it as M-code, and include those?

The whole point I tried to make before was that it would be more
useful to have the same datasets as Matlab, because that makes it
easier to copy-paste examples.  If you copy the datasets from R, then
you can no longer copy-paste such example code into Octave, at which
point you might as well make up your own datasets and sidestep the
whole copyright question.
