Re: [Gnumed-devel] re: Soundex

From: Karsten Hilbert
Subject: Re: [Gnumed-devel] re: Soundex
Date: Sun, 22 Aug 2004 19:11:05 +0200
User-agent: Mutt/

> Open sourced Python code for various phonetic indexing functions such as
> Soundex, and for various string similarity comparator functions, such as
> edit (Levenshtein) distance, can be found in the Febrl probabilistic
> record linkage project at
> There are several better functions than Soundex
> available, such as Metaphone or Double Metaphone, NYSIIS or mod_soundex.
> It is often helpful to use more than one phonetic index (e.g. Double
> Metaphone plus NYSIIS, or even these functions applied to a reversed
> version of the name, to get around the lack of robustness these
> phonetic indexing functions have against initial-letter errors).
> However, another alternative is to use a technique we have dubbed
> "n-gram indexes" (since we developed the method for our record linkage
> project).
> evaluate it for use in GNUmed. It might be overkill for general practice
> databases with a few thousand patients, but the technique is
> conceptually simple and elegant and, unlike the phonetic indexing
> functions, makes no assumptions about name or string morphology and
> phonetics - thus it works equally well with alphabetic names from any
> culture, including Pinyin Chinese names. It takes a set-theoretic
> approach, and the faster, built-in set data type in Python 2.4 improves
> its speed considerably.
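
As a rough illustration of the set-theoretic idea described in the quoted
text, here is a minimal sketch of an n-gram index in Python. It is not the
Febrl code: the names ngrams, dice and NGramIndex, the padding with
spaces, the Dice coefficient as the score and the 0.5 threshold are all
assumptions made for this example.

```python
from collections import defaultdict


def ngrams(s, n=2):
    """Return the set of character n-grams of s (lowercased, space-padded)."""
    padded = " " * (n - 1) + s.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b):
    """Dice coefficient of two sets: 2*|a & b| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


class NGramIndex:
    """Hypothetical inverted index mapping each n-gram to the names containing it."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> set of names
        self.grams = {}                   # name -> its n-gram set

    def add(self, name):
        g = ngrams(name, self.n)
        self.grams[name] = g
        for gram in g:
            self.postings[gram].add(name)

    def search(self, query, threshold=0.5):
        """Return (score, name) pairs with score >= threshold, best first."""
        q = ngrams(query, self.n)
        # Only names sharing at least one n-gram with the query are scored.
        candidates = set()
        for gram in q:
            candidates |= self.postings.get(gram, set())
        scored = [(dice(q, self.grams[c]), c) for c in candidates]
        return sorted((p for p in scored if p[0] >= threshold), reverse=True)


idx = NGramIndex()
for name in ("Hilbert", "Gilbert", "Smith"):
    idx.add(name)
print(idx.search("Hilbert"))
```

With these three names indexed, a search for "Hilbert" returns "Hilbert"
with score 1.0 and "Gilbert" with score 0.75, even though the initial
letters differ. Soundex keeps the first letter of the name in its code,
so it would put these two names under different codes.
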
Because of the obvious thoroughness of the FEBRL approach I
have always considered that to be the search engine I would
use in GnuMed 2.0.

GPG key ID E4071346 @
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
