help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

hist(y,n) [was Re: I NEED HELP ...]


From: Paul Kienzle
Subject: hist(y,n) [was Re: I NEED HELP ...]
Date: Tue, 18 Jun 2002 11:24:41 -0400

On Tue, Jun 18, 2002 at 02:24:59AM -0400, Paul Kienzle wrote:
> On Fri, Jun 07, 2002 at 02:25:17PM -0500, Marco Boni wrote:
<snip>

> For example, in hist:
> 
>       for i = 1:n-1
>               cutoff (i) = (x (i) + x (i + 1)) / 2;
>       endfor
> 
> is simply translated into
> 
>       cutoff = ( x(1:n-1) + x(2:n) ) / 2;
> 
> The next loop in hist.m is more tricky:
> 
>       freq(1) = sum(y < cutoff (1));
>       for i = 2:n-1
>               freq (i) = sum (y >= cutoff (i-1) & y < cutoff (i));
>       endfor
>       freq(n) = sum (y >= cutoff (n-1));
> 
> You can get part way there with sort:
> 
>       [s, idx] = sort ( [cutoff(:); y(:)] );
>       pt = [0; idx > n; 0];
> 
> This gives you a sequence of 0's and 1's where the zeros represent
> bin boundaries and the ones represent bin contents.  Because sort
> is 'stable' all y values which are identical to an x value will
> appear after the x value, so the semi-open [x-1, x) logic will
> be preserved.  The leading and trailing zeros capture all the y's
> which are not otherwise in a bin.
> 
> Using cumulative sum, you can turn the ones into frequency counts
> 
>       chist = cumsum(pt); 
> 
> But you only need the counts at the bin boundaries:
> 
>       chist = s(find(diff(chist) == 0));
> 
> Now you have a 'cumulative histogram'.  Differentiate it and you 
> should have the histogram:
> 
>       freq = diff(chist);

Oops! This will need to be:

        freq = [chist(1); diff(chist) ]

> 
> I may messed up something along the way, but hopefully you get
> the idea.  Could you fix it, test it and submit it to bug-octave?
> 
> Since you are dealing with really large numbers of bins (up to 2^20 if
> I'm reading your code correctly), fixing hist should address most of
> the speed problems.  

I was bothered by having to compute with all those empty bins.  Here is
a version for n equal sized bins which is independent of the number of bins:

   bins = zeros(1,n);
   q = sort(y(:).');
   L = length(q);
   if (L == 0)
      return;
   elseif (q(1) == q(L))
      bins(n) = L;
      return;
   endif

   q = (q - q(1))/(q(L)-q(1))/(1+eps); # set y-range to [0,1)
   q = fix(q*n);                  # split into n bins
   same = [ q == [q(2:L),-Inf] ]; # true if neighbours are in the same bin
   q = q(~same);                  # q lists the 'active' bins (0-origin)
   
   f = cumsum(same);              # cumulative histogram
   f = f(~same);
   f = [f(1), diff(f)] + 1;       # cumulative histogram -> histogram
   # we need to add 1 since we did not count the point at the boundary between
   # histograms (it was turned into zero)

   # distribute f to the active bins, leaving the remaining bins empty
   bins(q+1) = f;


Paul Kienzle
address@hidden



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------



reply via email to

[Prev in Thread] Current Thread [Next in Thread]