pspp-dev

Re: [bug #47139] EXAMINE plot histogram does wrong binning


From: Friedrich Beckmann
Subject: Re: [bug #47139] EXAMINE plot histogram does wrong binning
Date: Tue, 16 Feb 2016 10:09:19 +0100

Hi John,

I tested your example on git version 19e56b3221e1008ad4, and the resulting plot seems o.k. to me.

If you add up the counts of all bins in your example, is the sum 10? That was the
first problem: not all cases were counted. It should be fixed with the mentioned commit.
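
The counting check can be sketched in C. This is only an illustration, not PSPP or GSL code: bin_index and count_binned are hypothetical helpers, and the range 0 to 10 with 5 bins is an assumed layout chosen for the example. It relies on bins being half-open, [lower, upper), which is how GSL documents its histogram ranges, so a case equal to the upper limit is not counted.

```c
#include <stddef.h>

/* Hypothetical helper (not PSPP or GSL code): return the index of the
   uniform half-open bin that contains x, or -1 if x lies outside
   [lower, upper). */
static int
bin_index (double x, double lower, double upper, size_t bins)
{
  if (x < lower || x >= upper)
    return -1;

  size_t i = (size_t) ((x - lower) * bins / (upper - lower));
  if (i >= bins)
    i = bins - 1;               /* Guard against rounding up. */
  return (int) i;
}

/* Hypothetical helper: count how many of the n values land in some bin. */
static size_t
count_binned (const double *values, size_t n,
              double lower, double upper, size_t bins)
{
  size_t counted = 0;
  for (size_t k = 0; k < n; k++)
    if (bin_index (values[k], lower, upper, bins) >= 0)
      counted++;
  return counted;
}
```

For the cases 1 to 10 with the range [0, 10) and 5 bins, count_binned returns 9, because the case 10 sits on the upper limit and is dropped. With the range [0, 12) and 6 bins it returns 10.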

In these tests the case values are very close to, or exactly at, the bin limits. So a
case could end up in one bin or in the neighbouring one because of numerical rounding
when the bin limits are computed.
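
A small C sketch of that rounding effect, assuming IEEE 754 double arithmetic. The helper bin_by_limits is hypothetical and the values 0.1 and 0.3 are illustrative, not taken from the bug report. The limits are computed as lower + i * width, and a value that mathematically sits on a limit can compare as below it.

```c
/* Hypothetical helper (not PSPP or GSL code): find the half-open bin
   [lower + i*width, lower + (i+1)*width) that contains x by comparing
   x against the computed limits.  Returns -1 if no bin matches. */
static int
bin_by_limits (double x, double lower, double width, int bins)
{
  for (int i = 0; i < bins; i++)
    {
      double lo = lower + i * width;
      double hi = lower + (i + 1) * width;
      if (x >= lo && x < hi)
        return i;
    }
  return -1;
}
```

With lower 0.0 and width 0.1, the limit 3 * 0.1 is computed as slightly more than the double 0.3, so bin_by_limits (0.3, 0.0, 0.1, 10) yields bin 2 rather than bin 3. The value 0.5, in contrast, lands in bin 5 because 5 * 0.1 rounds to exactly 0.5.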

See my comments on your fix below.


On 15.02.2016 at 21:15, John Darrington <address@hidden> wrote:

On Mon, Feb 15, 2016 at 06:52:03PM +0000, Friedrich Beckmann wrote:

    I fixed this problem with commit

    http://git.savannah.gnu.org/cgit/pspp.git/commit/?id=ca4012bcf0f8790ceb8539b55bbc296d0802d5d7

    Now all cases are considered in the histogram.

I don't think this is the right fix.

There will still be a problem in the case where max == adjusted_max

For example:

data list list /x *.
begin data.
1
2
3
4
5
6
7
8
9
10
end data.

examine x
/plot = histogram.

The last bin has 3 items and thus distorts the histogram.


I was going to suggest a fix like this:

From 8e381363c45e8be168d742bcdf2debf17c690ba4 Mon Sep 17 00:00:00 2001
From: John Darrington <address@hidden>
Date: Mon, 15 Feb 2016 21:05:09 +0100
Subject: [PATCH] Fix for missing bin

---
src/math/histogram.c |    6 ++++++
1 file changed, 6 insertions(+)

diff --git a/src/math/histogram.c b/src/math/histogram.c
index 9158590..c69006b 100644
--- a/src/math/histogram.c
+++ b/src/math/histogram.c
@@ -143,6 +143,12 @@ histogram_create (double bin_width_in, double min, double max)

  h = xmalloc (sizeof *h);

+  if (adjusted_max >= max)
+    {
+      adjusted_max += (adjusted_max - adjusted_min) / bins;
+      bins++;
+    }
+
  h->gsl_hist = gsl_histogram_alloc (bins);

This fix would always add a bin, because adjusted_max should always be greater than or
equal to max. But maybe it is an idea to make sure that adjusted_max is always > max.
Then the gsl_histogram binning would always include the cases whose value is max.


  gsl_histogram_set_ranges_uniform (h->gsl_hist, adjusted_min, adjusted_max);
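
One way to read the suggestion above, sketched in C under stated assumptions: add the extra bin only when adjusted_max is not already strictly greater than max. The struct hist_limits and the function ensure_max_inside are hypothetical names for illustration and are not part of src/math/histogram.c; only the widening step is taken from the proposed patch.

```c
#include <stddef.h>

/* Hypothetical container for the limits handed to the histogram. */
struct hist_limits
  {
    double lower;
    double upper;
    size_t bins;
  };

/* Hypothetical helper: widen the range by one bin width only if the
   upper limit is not strictly greater than max, so that a case equal
   to max falls inside the half-open range [lower, upper). */
static struct hist_limits
ensure_max_inside (double adjusted_min, double adjusted_max,
                   size_t bins, double max)
{
  struct hist_limits l = { adjusted_min, adjusted_max, bins };

  if (l.upper <= max)
    {
      l.upper += (adjusted_max - adjusted_min) / bins;
      l.bins++;
    }
  return l;
}
```

For example, with adjusted_min 0, adjusted_max 10, 5 bins and max 10, this returns the range 0 to 12 with 6 bins. With adjusted_max 12 and 6 bins it leaves the limits unchanged, so no extra bin is added when max is already inside the range.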


