Re: Standard example datasets

From: Andrew Janke
Subject: Re: Standard example datasets
Date: Wed, 1 May 2019 20:22:10 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 4/28/19 8:27 AM, Carnë Draug wrote:
> On Sat, 27 Apr 2019 at 21:02, Andrew Janke <address@hidden> wrote:
>> Hi, Octave maintainers,
>> Some other statistical programs ship with standard example datasets and
>> methods to load or explore them. Does Octave have something like this?
>> For example, R ships with a bunch of example datasets in its "datasets"
>> package, and you can view a list of them by doing `data()`. And Matlab
>> ships with a bazillion example datasets that seem to all be just MAT
>> files in its source code root directories, that you can access with
>> load, like `load patients`.
>> Use case: I'm working on table stuff, and would like to add some example
>> tabular datasets in my package. Wondering if there's a standard
>> mechanism I should integrate with.
> Matlab also comes with such datasets.  Ideally we would have the same
> so that examples that use them work in Octave as well.  It would also
> simplify some test cases which require generation of input data (I
> would argue that it would actually enable them, because if generating
> such complex datasets is too complicated then there are no tests for
> them).
> Anyway, there is already an item on the tracker [1] that lists the
> ones in Matlab.  The issue is finding who is the copyright holder of
> such data and contact them.
> [1] https://savannah.gnu.org/patch/?9544

Do we have any lawyers or software licensing experts on the list?

My understanding is that simple databases are not subject to copyright,
under the "you can't copyright facts" principle. They're only subject to
whatever licensing terms you agreed to in order to get access to the data.

I'm looking through the R source code. R's example datasets are mostly
little datasets written out in source code like this:

"VADeaths" <-
structure(c(11.7, 18.1, 26.9, 41, 66, 8.7, 11.7, 20.3, 30.9, 54.3, 15.4,
24.3, 37, 54.6, 71.1, 8.4, 13.6, 19.3, 35.1, 50), .Dim = c(5, 4),
.Dimnames = list(c("50-54", "55-59", "60-64", "65-69", "70-74"),
c("Rural Male", "Rural Female", "Urban Male", "Urban Female")))

Could we just take the numbers from the R code, either under the "no
copyright for databases" rule or under the same license that R itself is
distributed under, rewrite them as M-code, and include those?
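
As a rough sketch of what such a rewrite might look like for the VADeaths
example above: the struct layout and the field names (data, rownames,
colnames) are assumptions made up for illustration, not an existing Octave
convention. R's structure() fills the array column by column, and Octave's
reshape() does the same, so the numbers can be copied over in the same order.

```octave
% Hypothetical M-code transcription of R's VADeaths dataset.
% The struct layout and field names are illustrative assumptions.

% Values in the same column-major order as in the R source.
values = [11.7, 18.1, 26.9, 41, 66, 8.7, 11.7, 20.3, 30.9, 54.3, 15.4, ...
          24.3, 37, 54.6, 71.1, 8.4, 13.6, 19.3, 35.1, 50];

VADeaths = struct ();
% 5 age groups (rows) by 4 population groups (columns), like .Dim = c(5, 4).
VADeaths.data = reshape (values, 5, 4);
% Row and column labels taken from .Dimnames in the R source.
VADeaths.rownames = {"50-54", "55-59", "60-64", "65-69", "70-74"};
VADeaths.colnames = {"Rural Male", "Rural Female", "Urban Male", "Urban Female"};
```

With that layout, VADeaths.data(1, 3) would be the "50-54" / "Urban Male"
entry, 15.4.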

