
[GNUnet-developers] Opinions on keyword extraction

From: Blake Matheny
Subject: [GNUnet-developers] Opinions on keyword extraction
Date: Sat, 13 Apr 2002 12:44:11 -0500
User-agent: Mutt/1.3.27i

I started writing some code to do keyword extraction and wanted
some opinions on my approach. With the method I propose, you call
two functions, getMimeType followed by extractKeywords, to get a
list of possible keywords. They are called separately because you
may only want the mime type, and there is no sense in calling both
if you don't need to. The method described below also allows for
'easy' implementation of additional mime detection methods and
simple addition of new keyword extraction routines. I think it
scales and is relatively clean. I think using a struct for
keywords also leaves it open to expansion. Please let me know if
you have any ideas on how I can improve this, or if you think I'm
full of it :) Pseudo-code below. Thanks.

/* keyword list node, defined first so the prototypes below can
 * refer to it
 */
typedef struct _wordlist {
    char *word;
    struct _wordlist *next;
} wordlist;

/* wrapper that calls actual mime detection methods
 * @param filename
 * @param method of detection (MAGIC, VFS, ETC)
 * @returns mimetype
 */
char * getMimeType(char *fil, int detectmethod);

/* Methods for actually getting mime type, called by getMimeType
 * @param filename
 * @returns mimetype
 */
char * mimeMagic(char *fil);
char * mimeVfs(char *fil);

/* wrapper that determines from mime type which function to call
 * for extracting keywords
 * @param list for keywords to be stored in
 * @param mimetype, as discovered by getMimeType
 * @param filename
 * @returns the number of keywords extracted
 */
int extractKeywords(wordlist *list, char *mimetype, char *filename);

/* Methods for actually getting keywords, called by extractKeywords
 * @param list for keywords to be stored in
 * @param filename
 * @returns number of keywords in list
 */
int extractMp3(wordlist *list, char *filename);
int extractHtml(wordlist *list, char *filename);
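
To make the dispatch idea concrete, here is a rough sketch of how
extractKeywords could pick an extractor from the mime type with a
lookup table, so that adding a format means adding one table row.
Several things here are assumptions and not part of the proposal:
the appendWord helper, the convention that the caller passes an
empty head node whose word field is unused, the mime type strings
in the table, and the extractor bodies, which are placeholders
that only record the filename.

```c
#include <stdlib.h>
#include <string.h>

typedef struct _wordlist {
    char *word;
    struct _wordlist *next;
} wordlist;

/* Hypothetical helper: append a copy of word after the last node.
 * Assumes list is a caller-owned head node whose word is unused.
 * Returns 1 on success, 0 if allocation fails. */
static int appendWord(wordlist *list, const char *word)
{
    wordlist *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->word = malloc(strlen(word) + 1);
    if (node->word == NULL) {
        free(node);
        return 0;
    }
    strcpy(node->word, word);
    node->next = NULL;
    while (list->next != NULL)
        list = list->next;
    list->next = node;
    return 1;
}

/* Placeholder: a real version would parse the file's tags. */
int extractMp3(wordlist *list, char *filename)
{
    return appendWord(list, filename);
}

/* Placeholder: a real version would read the title and meta tags. */
int extractHtml(wordlist *list, char *filename)
{
    return appendWord(list, filename);
}

typedef int (*extractor)(wordlist *list, char *filename);

/* One row per supported type; the strings are assumed values. */
static const struct {
    const char *mimetype;
    extractor extract;
} extractors[] = {
    { "audio/mpeg", extractMp3 },
    { "text/html", extractHtml },
};

/* Look up the mime type and call the matching extractor.
 * Returns the number of keywords added, 0 for unknown types. */
int extractKeywords(wordlist *list, char *mimetype, char *filename)
{
    size_t i;
    for (i = 0; i < sizeof extractors / sizeof extractors[0]; i++) {
        if (strcmp(mimetype, extractors[i].mimetype) == 0)
            return extractors[i].extract(list, filename);
    }
    return 0;
}
```

A caller would declare a head node such as
wordlist head = { NULL, NULL }; and pass &head along with the
result of getMimeType. The same table approach could serve
getMimeType, indexed by detection method instead of mime type.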

Blake Matheny
