pspp-dev
Re: K-Means Clustering


From: Mehmet Hakan Satman
Subject: Re: K-Means Clustering
Date: Fri, 11 Mar 2011 00:28:09 -0800 (PST)

Hi friends,

I am trying to implement a function like this

int
cmd_quick_cluster (struct lexer *lexer, struct dataset *ds)
{
   const struct dictionary *dict = dataset_dict (ds);
   const struct variable **variables;
   size_t n;

   lex_match (lexer, T_SLASH);
   /* Stop on a syntax error instead of continuing with unset variables. */
   if (!lex_force_match_id (lexer, "VARIABLES"))
     return CMD_FAILURE;
   lex_match (lexer, T_EQUALS);
   if (!parse_variables_const (lexer, dict, &variables, &n, PV_NUMERIC))
     return CMD_FAILURE;

   printf ("Number of variables: %zu\n", n);
   free (variables);   /* The parser allocates the array; the caller frees it. */
   return CMD_SUCCESS;
}

for the K-means clustering. Probably at least the /VARIABLES and /GROUPS parameters must be implemented in the QUICK CLUSTER command. I have learnt how to parse the command line: as you can see, I can store the number of variables, and the selected variables themselves, in a double pointer (struct variable **).

Because of the nearly perfect abstraction, I can't reach the data itself. I need to handle the data as doubles. What is the easiest way to get the raw values out of a dataset?
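
To illustrate why plain doubles are wanted here, the following is a minimal sketch of a single k-means iteration over values that have already been copied out of the dataset. It uses no PSPP API at all: the function name kmeans_step, the row-major array layout, and every parameter name are assumptions made for this example only.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two points with n_vars coordinates. */
static double
squared_distance (const double *a, const double *b, size_t n_vars)
{
  double sum = 0.0;
  for (size_t i = 0; i < n_vars; i++)
    {
      double d = a[i] - b[i];
      sum += d * d;
    }
  return sum;
}

/* One k-means iteration over row-major DATA (n_rows x n_vars).
   Assigns each row to its nearest center, storing the index in
   MEMBERSHIP, then moves each center to the mean of its rows.
   Centers that attract no rows are left unchanged.
   Returns the number of rows whose assignment changed. */
size_t
kmeans_step (const double *data, size_t n_rows, size_t n_vars,
             double *centers, size_t k, size_t *membership)
{
  size_t changed = 0;
  assert (n_rows > 0 && n_vars > 0 && k > 0);

  /* Assignment step: the first center wins when distances tie. */
  for (size_t r = 0; r < n_rows; r++)
    {
      size_t best = 0;
      double best_dist = DBL_MAX;
      for (size_t c = 0; c < k; c++)
        {
          double d = squared_distance (&data[r * n_vars],
                                       &centers[c * n_vars], n_vars);
          if (d < best_dist)
            {
              best_dist = d;
              best = c;
            }
        }
      if (membership[r] != best)
        changed++;
      membership[r] = best;
    }

  /* Update step: recompute each center as the mean of its rows. */
  for (size_t c = 0; c < k; c++)
    {
      size_t count = 0;
      for (size_t r = 0; r < n_rows; r++)
        if (membership[r] == c)
          count++;
      if (count == 0)
        continue;
      for (size_t v = 0; v < n_vars; v++)
        {
          double sum = 0.0;
          for (size_t r = 0; r < n_rows; r++)
            if (membership[r] == c)
              sum += data[r * n_vars + v];
          centers[c * n_vars + v] = sum / count;
        }
    }
  return changed;
}
```

A caller would repeat kmeans_step until it returns 0 or an iteration limit is reached. How the initial centers are chosen and how missing values are handled are left open here.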


From: John Darrington <address@hidden>
To: Mehmet Hakan Satman <address@hidden>
Cc: John Darrington <address@hidden>; address@hidden
Sent: Thu, March 10, 2011 2:07:55 PM
Subject: Re: K-Means Clustering

On Thu, Mar 10, 2011 at 03:35:33AM -0800, Mehmet Hakan Satman wrote:
   
   
    Note: If I am not wrong, "implementing such a command in PSPP" means making the
    k-means clustering code callable using the
    "QUICK CLUSTER x y  /MISSING=LISTWISE
      /CRITERIA=CLUSTER(2) MXITER(10) CONVERGE(0)
      /METHOD=KMEANS(NOUPDATE)
      /PRINT INITIAL ANOVA CLUSTER DISTAN."
    syntax. Is that right?


That's right.  But I would start with something simpler.  Say just
"QUICK CLUSTER x y z".  The syntax parser can be a bit challenging
so keep it simple to start with, and add the options later.

You'll need to write a function of the form:
int cmd_quick_cluster (struct lexer *lexer, struct dataset *ds);
and register it in the file src/language/command.def.

I suggest that you look at some of the existing examples to get
an idea of how it works. reliability.c is probably a good example.

   
    Also: I have no experience coding with GTK+.

The first step will be to get the command written.  You or someone
else can write the GUI for it later.

Good luck.
   
   
         
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.


