Principal Components Analysis

From: Gordon Haverland
Subject: Principal Components Analysis
Date: Wed, 27 Jan 1999 14:04:45 -0700


  Maybe someone here can help me.  One of my users wants to
draw ellipses around the centroids of her clusters of data
points from Principal Components Analysis.  I can force
everything to work as I expect, but I don't understand
some of the why of what I am doing.  I have been reading
Numerical Recipes and Matrix Computations (3rd edition).

  So, PCA does an SVD of the data to find the eigenvalues
and eigenvectors of the data set, and we ignore
eigenvalues (and their associated eigenvectors) with values
less than 1.  Together, the eigenvalues and eigenvectors
define a hyper-ellipsoid.  As I understand it, the eigenvalues
are equal to the square root of the variances in the rotated
coordinate system.

  The above is more or less the definition of PCA.  When I go to
generate the ellipse(s), it turns out that I have to use the
square root of the eigenvalues in order to get ellipses of
the correct order of magnitude.  This I don't understand.
Next, the ellipse for the vector x with 2-norm of 1 appears
to contain far more than 68% of the data points.  This may be
due to the few points lying outside the ellipse being quite far
outside, but is still puzzling.  Last, if I want to plot
ellipses containing 75%, 90%, 95%, ... of the points, by what
factor do I multiply the eigenvalues (or their square roots),
or the vector x?
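
  One possible way to pin down the factors, offered as a sketch
rather than a definitive answer: assume the two plotted component
scores are roughly bivariate normal.  Under that assumption the
squared Mahalanobis distance from the centroid follows a
chi-square distribution with 2 degrees of freedom, so an ellipse
whose semi-axes are k standard deviations long along each
principal axis contains a fraction 1 - exp(-k^2/2) of the points.
The familiar 68% belongs to one dimension; in two dimensions k = 1
covers only about 39%.  The function names below are hypothetical,
and the sketch is in plain Python rather than Octave.

```python
import math

def coverage_scale(p):
    """Scale factor k for a 2D ellipse that should contain fraction p.

    Assumes bivariate normal data; k multiplies the standard
    deviation along each principal axis, not the variance.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return math.sqrt(-2.0 * math.log(1.0 - p))

def coverage_of_scale(k):
    """Fraction of a bivariate normal inside the k-sigma ellipse."""
    return 1.0 - math.exp(-0.5 * k * k)

if __name__ == "__main__":
    for p in (0.75, 0.90, 0.95):
        print("coverage %.2f -> scale factor %.4f" % (p, coverage_scale(p)))
    print("coverage at k = 1: %.4f" % coverage_of_scale(1.0))
```

  With real data the observed fraction can differ from the nominal
one, since the normality assumption may not hold for a given
cluster.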

  The end result of this should be an Octave script or function
which will take the data and do a 2D plot of the 2 most
significant components, along with the ellipse that goes
along with the data points.  I'll gladly donate said script
to this archive.
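
  A minimal sketch of the ellipse computation such a script might
perform, restricted to two dimensions and written in plain Python
rather than Octave.  It assumes the ellipse is built from the
sample covariance of the two plotted scores, whose eigenvalues are
variances; the semi-axes then use their square roots (standard
deviations) so that the ellipse is in the same units as the data.
All function names and the scale factor k are illustrative
assumptions, not part of any library.

```python
import math

def covariance_2d(xs, ys):
    """Return the centroid and sample covariance terms (sxx, sxy, syy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def principal_axes_2d(sxx, sxy, syy):
    """Closed-form eigenvalues and major-axis angle of a symmetric 2x2 matrix."""
    half_trace = 0.5 * (sxx + syy)
    d = math.hypot(0.5 * (sxx - syy), sxy)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return half_trace + d, half_trace - d, theta

def ellipse_points(center, lam1, lam2, theta, k=1.0, n=100):
    """Points on the ellipse with semi-axes k*sqrt(lam1) and k*sqrt(lam2)."""
    a = k * math.sqrt(lam1)
    b = k * math.sqrt(max(lam2, 0.0))  # guard against tiny negative round-off
    c, s = math.cos(theta), math.sin(theta)
    pts = []
    for i in range(n + 1):
        t = 2.0 * math.pi * i / n
        u, v = a * math.cos(t), b * math.sin(t)
        # rotate into the data coordinates and shift to the centroid
        pts.append((center[0] + c * u - s * v, center[1] + s * u + c * v))
    return pts
```

  The returned list of (x, y) pairs can be handed to whatever
plotting routine is used for the data points themselves.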

  Thanks for any light you might shed on this.

Gordon Haverland
