[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Principal Components Analysis
From: 
John W. Eaton 
Subject: 
Principal Components Analysis 
Date: 
Wed, 27 Jan 1999 15:31:02 0600 (CST) 
On 27Jan1999, Gordon Haverland <address@hidden> wrote:
 Maybe someone here can help me. One of my users wants to
 draw ellipses around the centroids of her clusters of data
 points from Principal Components Analysis. I can force
 everything to work as I expect, but I don't understand
 some of the why of what I am doing. I have been reading
 Numerical Recipes and Matrix Computation (3rd edition).

 So, PCA does an SVD of the data to find eigenvalues
 and eigenvectors of the data set and we ignore
 eigenvalues (and associated vectors) with values
 less than 1. And the combination of eigenvalue and
 eigenvector defines a hyperellipsoid. The eigenvalues are
 equal to the square root of the variances in the rotated
 coordinate system.

 The above is more or less definitions of PCA. When I go to
 generate the ellipse(s), it turns out that I have to use the
 square root of the eigenvalues in order to get ellipses of
 the correct order of magnitude. This I don't understand.
 Next, the ellipse for the vector x with 2norm of 1 appears
 to contain far more than 68% of the data points. This may be
 due to the few points lying outside the ellipse being quite far
 outside, but is still puzzling. Last, if I want to plot
 ellipses of 75%, 90%, 95%, ..., what factors do I either
 multiply the eigenvalues (square roots of the eigenvalues)
 by (or the vector x)?

 The end result of this should be a octave script or function
 which will take the data and do a 2D plot of the 2 most
 significant components, along with the ellipse that goes
 along with the data points. I'll gladly donate said script
 to this archive.
Perhaps the following code will be of some help. It will take a 2x2
matrix and generate a set of points for plotting an ellipse, its major
and minor axes, and the bounding box that just encloses the ellipse.
The 2x2 matrix is one that defines a quadratic form:
[ x1 x2 ] [ a11 a12 ] [ x1 ]
[ a21 a22 ] [ x2 ]
= a11*x1^2 + (a12+a21)*x1*x2 + a22*x2^2
The level is the value of the this expression along the contour.
jwe
## Plotting an ellipse with an aspect ratio not equal to one can be
## mighty confusing.
gset size ratio 1;
## [x, y, major, minor, bbox] = ellipse (amat, level, n)
##
## Given a 2x2 matrix, generate ellipse data for plotting.
function [x, y, major, minor, bbox] = ellipse (amat, level, n)
if (nargin < 3)
n = 100;
endif
if (nargin > 1)
[v, l] = eig (amat / level);
dl = diag(l);
if (any (imag (dl))  any (dl <= 0))
error ("ellipse: amat must be positive definite");
endif
## Generate contour data.
a = 1 / sqrt (l(1,1));
b = 1 / sqrt (l(2,2));
t = linspace (0, 2*pi, n)';
xt = a * cos (t);
yt = b * sin (t);
## Rotate the contours.
ra = atan2 (v(2,1), v(1,1));
cos_ra = cos (ra);
sin_ra = sin (ra);
x = xt * cos_ra  yt * sin_ra;
y = xt * sin_ra + yt * cos_ra;
## Endpoints of the major and minor axes.
major = minor = (v * diag ([a, b]))';
major (2,:) = major (1,:);
minor (1,:) = minor (2,:);
## Bounding box for the ellipse using magic formula.
ainv = inv (amat);
xbox = sqrt (level * ainv(1,1));
ybox = sqrt (level * ainv(2,2));
bbox = [xbox ybox; xbox ybox; xbox ybox; xbox ybox; xbox ybox];
else
usage ("ellipse (amat, level, n)");
endif
endfunction
t = rand (100,2);
amat = t' * t;
level = 100;
npts = 181;
[x, y, major, minor, bbox] = ellipse (amat, level, npts);
gset nokey
plot (minor(:,1), minor(:,2), "address@hidden", major(:,1),
major(:,2),"address@hidden",
bbox(:,1), bbox(:,2), "g", x, y, "b");