pspp-dev

Re: K-Means Clustering


From: Mehmet Hakan Satman
Subject: Re: K-Means Clustering
Date: Mon, 14 Mar 2011 12:22:36 -0700 (PDT)

Hi John,

1. I renamed the file to "quick-cluster.c".

2. I added an entry for quick-cluster to "src/language/stats/automake.mk".

3. I removed the entry UNIMPL_CMD ("QUICK CLUSTER", "Fast clustering") from the command.def file.

4. Now cmd_quick_cluster can parse a command line like:

QUICK CLUSTER x y z
      /CRITERIA = CLUSTER(5) MXITER (100).

5. I removed the atoi() call and now use
    lex_force_int (lexer);
    groups = lex_integer (lexer);
instead.
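
The reason for dropping atoi() can be illustrated outside PSPP. atoi() gives no way to tell a real 0 from unparsable input, so a typo in the syntax would pass silently. The sketch below uses only standard C (strtol) to show the kind of checking a strict integer parser performs; parse_int_strict is a hypothetical helper written for this illustration, not a PSPP function, and it does not use PSPP's lexer.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of PSPP: parses TEXT as a decimal int.
   Returns true and stores the value in *OUT only if the whole string
   is a valid integer that fits in an int. */
static bool
parse_int_strict (const char *text, int *out)
{
  char *end;
  long value;

  assert (text != NULL && out != NULL);

  errno = 0;
  value = strtol (text, &end, 10);

  if (end == text)      /* No digits at all, e.g. "abc" or "". */
    return false;
  if (*end != '\0')     /* Trailing junk, e.g. "5x". */
    return false;
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
    return false;       /* Out of range for int. */

  *out = (int) value;
  return true;
}
```

With this helper, parse_int_strict ("5", &n) succeeds and sets n to 5, while "abc" and "5x" are rejected. atoi ("abc") would simply return 0.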

As I mentioned, I tested my results on uniformly distributed random data. This cannot be considered comprehensive testing; the implementation should also be tested with simulations.

Regards.


Mehmet Hakan Satman
http://www.mhsatman.com


--- On Sun, 3/13/11, John Darrington <address@hidden> wrote:

From: John Darrington <address@hidden>
Subject: Re: K-Means Clustering
To: "Mehmet Hakan Satman" <address@hidden>
Cc: "John Darrington" <address@hidden>, address@hidden
Date: Sunday, March 13, 2011, 4:36 PM

Hi Mehmet,

Thanks for this.  It seems to be basically working.  There are a number of improvements
that can be made, however.

1. It'll be more consistent with the rest of PSPP if you call the new file "quick-cluster.c", with a hyphen.

2. Instead of editing the Makefile, add the name of the new file to the manifest in
   src/language/stats/automake.mk

3. Can you remove the line UNIMPL_CMD ("QUICK CLUSTER", "Fast clustering")?



> Now the "quick cluster" command can parse these options in the pspp command line:
>
>      quick cluster /VARIABLES=x y z /GROUPS=5 /MAXITER=100.

This differs from the syntax in the SPSS documentation, which expects:

   QUICK CLUSTER x y z
      /CRITERIA = CLUSTER(5) MXITER (100).

where the /CRITERIA subcommand and each part thereof is optional.  You can see an example of how to
implement a /CRITERIA subcommand in src/language/stats/factor.c - in fact, you
may be able to copy much of that parser's code.
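
To make the "every part is optional" behaviour concrete, here is a minimal, self-contained sketch in standard C. It is not the factor.c parser and does not use PSPP's lexer; struct criteria, parse_setting and parse_criteria are hypothetical names, and the caller-supplied defaults are placeholders for illustration. It parses only the text that would follow "/CRITERIA =".

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings structure for this sketch. */
struct criteria
  {
    int n_clusters;
    int max_iter;
  };

static const char *
skip_spaces (const char *s)
{
  while (isspace ((unsigned char) *s))
    s++;
  return s;
}

/* If S starts with KEYWORD (upper case; matched case-insensitively)
   followed by "(positive integer)", stores the integer in *VALUE and
   returns a pointer just past the ")".  Otherwise returns NULL and
   leaves *VALUE untouched. */
static const char *
parse_setting (const char *s, const char *keyword, int *value)
{
  size_t len = strlen (keyword);
  char *end;
  long n;

  for (size_t i = 0; i < len; i++)
    if (toupper ((unsigned char) s[i]) != keyword[i])
      return NULL;

  s = skip_spaces (s + len);
  if (*s != '(')
    return NULL;
  s = skip_spaces (s + 1);

  errno = 0;
  n = strtol (s, &end, 10);
  if (end == s || errno == ERANGE || n < 1 || n > INT_MAX)
    return NULL;

  s = skip_spaces (end);
  if (*s != ')')
    return NULL;

  *value = (int) n;
  return s + 1;
}

/* Parses settings in any order until end of string or the terminating
   ".".  Settings that are absent keep the values already in *C, which
   is how the defaults survive.  Returns false on a syntax error. */
static bool
parse_criteria (const char *s, struct criteria *c)
{
  assert (s != NULL && c != NULL);

  for (;;)
    {
      const char *next;

      s = skip_spaces (s);
      if (*s == '\0' || *s == '.')
        return true;

      if ((next = parse_setting (s, "CLUSTER", &c->n_clusters)) == NULL
          && (next = parse_setting (s, "MXITER", &c->max_iter)) == NULL)
        return false;
      s = next;
    }
}
```

The caller fills in the defaults before parsing, for example struct criteria c = { 2, 10 };, and then calls parse_criteria ("CLUSTER(5) MXITER (100).", &c). An input such as "MXITER(20)" changes only max_iter, and an unknown keyword such as "GROUPS(5)" is reported as an error.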

Avoid using atoi in the parser.  Instead of
    groups = atoi (lex_tokcstr (lexer));
write:
    lex_force_int (lexer);
    groups = lex_integer (lexer);
     

> I think a small development PDF document does not satisfy the needs of implementing
> something in PSPP.

You're  right.  The developer documentation is woefully incomplete.

You mentioned earlier that you had tested the results against SPSS.  Do you have the results
from these tests, and the test data that you used?  I would be interested to see them.
     

Best regards,

John

--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.


Attachment: quick-cluster.c
Description: Text Data

Attachment: command.def
Description: Binary data

Attachment: automake.mk
Description: Binary data

