Re: [Gnumed-devel] type of search pattern for demographics

From: Karsten Hilbert
Subject: Re: [Gnumed-devel] type of search pattern for demographics
Date: Mon, 23 Aug 2004 20:50:35 +0200
User-agent: Mutt/

Sorry, I am too terse at times.

> At 9:58 PM +0200 8/22/04, Karsten Hilbert wrote:
> > > For the case of apostrophe
> >I would just replace them with a "." and do a regex search.
> Does "replace" mean that it is in one's *search criteria*
> that you suggest a "." be input and what does this do / how
> does it work?
I would replace any of ", ' and ` in the search term behind
the scenes:

user types >O"Hare<
search term becomes >O("|'|`)+Hare<

This is fed to a regular expression search (which is invoked
by ~ (case-sensitive) or ~* (case-insensitive) in PostgreSQL).

This would mean "find any O followed by one or more of "'` followed
by Hare".
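The substitution described above can be sketched in a few lines of Python. This is only an illustration, not the GNUmed code: the helper name quotes_to_pattern is made up for this example, and it additionally escapes the rest of the term so that other regex metacharacters typed by the user are matched literally.

```python
import re

# Alternation that stands in for any run of ", ' or ` in the search term.
QUOTE_ALTERNATION = "(\"|'|`)+"

def quotes_to_pattern(term):
    """Turn a raw search term into a regex (hypothetical helper).

    Each run of quote characters becomes one alternation; everything
    else is escaped so it is matched literally.
    """
    parts = re.split("[\"'`]+", term)
    return QUOTE_ALTERNATION.join(re.escape(part) for part in parts)

pattern = quotes_to_pattern('O"Hare')
print(pattern)
```

The resulting pattern matches O'Hare, O"Hare and O`Hare alike, but not OHare, because the "+" demands at least one quote character.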

> >Same with spaces, I suppose. Problem with spaces is that we
> >don't know whether they separate truly separate name parts,
> >eg. first from lastname, or just some admonition.
> ... although do we not input/store these in separate fields?
We do store first and last names in different fields. We don't
store admonitions (?term) in separate fields. The problem is:
if the user types "de Groot", how is the machine to know
whether "de" is the first name or just a particle? Yes, this
can be partially solved by a lookup on known particles.
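A lookup on known particles might look roughly like this. The particle list and the function name guess_name_parts are assumptions made for the example, and the heuristic is deliberately naive: a patient whose first name really is "De" would be misclassified, which is why this only partially solves the problem.

```python
# Illustrative, far from exhaustive list of surname particles (assumption).
KNOWN_PARTICLES = {"de", "van", "der", "den", "von", "ter"}

def guess_name_parts(term):
    """Split a typed name into (first name words, last name words).

    The surname is assumed to start at the first known particle;
    without any particle, only the last word is taken as the surname.
    """
    words = term.split()
    for index, word in enumerate(words):
        if word.lower() in KNOWN_PARTICLES:
            return words[:index], words[index:]
    return words[:-1], words[-1:]

print(guess_name_parts("de Groot"))
print(guess_name_parts("Jan van der Berg"))
```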

> > > umlaut and... the french accents acute, grave and circumflex
> >Those we take care of already.
> Is it regex that achieves this?
Yep. They are mapped behind the scenes, eg.:

input: müller
search term: m(ü|ue|u)+ller

> Is the current state of gnumed
> (when one inputs into the search box) a regex search?

> > > There may be others but maybe it is not necessary to catalog every
> >> possible occurrence?
> >It would make for a nice regression test database ...
> Meaning we should retain the examples from this thread and
> have people add to them?
Absolutely !!

In fact, your thoughts on how agencies tend to force their wrong
spelling of names on people brought me to include, say, ü -> u
mappings ...

> >[current state]
> >> - disregard case (iLike?)
> >No. *~
> Does this mean "for now, one must type *~ "
No. It is one of the SQL operators used to compare search
terms to names in the database (PostgreSQL spells it ~*).

> and what does this do?
It matches case-insensitively.
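To make the operator's role concrete, here is a sketch of how a query using it could be assembled. PostgreSQL's regex match operators are ~ (case-sensitive) and ~* (case-insensitive); the table name "names", the column "lastnames", the helper build_name_query and the %s placeholder style are assumptions for the example, not a description of the actual GNUmed schema or code.

```python
def build_name_query(pattern, case_sensitive=False):
    """Return (sql, parameters) for a regex name search (hypothetical helper).

    The pattern is passed as a bound parameter rather than pasted
    into the SQL string.
    """
    operator = "~" if case_sensitive else "~*"
    # Table and column names are placeholders for illustration only.
    sql = "SELECT * FROM names WHERE lastnames {} %s".format(operator)
    return sql, (pattern,)

sql, params = build_name_query("m(ü|ue|u)+ller")
print(sql, params)
```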

> > > - disregard accents (plain letters would need to be substituted for
> >> accented ones)
> >yes
> regex?

> > > - disregard spaces (drop them; would cover  two and three part surnames)
> >not done yet, but not the correct approach for many-part
> >names, rather for van/van der/de etc.
> >
> >will do
> currently supported or will do in future?
"Will" as in "will be doing", as in future tense.

> needs road-mapping?

> > > - disregard punctuation (e.g. apostrophes and hyphens)
> >will do
> ditto?

> > > We could in addition optionally offer soundex searching per the url
> >which unfortunately only works for English names
> but still useful?

> > if it would be faster, at the time of creating or editing a
> > > patient name, for Gnumed to store a converted form of the names for
> >> search purposes.
> >
> >> Surely it is faster to search a key or index file
> >> than it is to have to locate on the physical hard drive every name in
> >> order to evaluate? If the user were permitted to see how the name is
> >> proposed to be converted and stored for search purposes, people might
> >> wish to edit it to capture the misspelling used by official agencies
> >> or insurers in dealing with the patients (although if this
> >For that I'd actually even suggest adding a flag
> >incorrect-but-legal to arbitrary name rows.
> is a decision required on the above and/or needs it go on a to-do list?
For both items ("normalized search name storage" and
"incorrect-but-legally-required flag") input would be helpful.
They are on my TODO.
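As input on the first item, one possible shape for a stored, normalized search name is sketched below. It is only one option under stated assumptions: the function name search_key is invented here, and the approach (Unicode decomposition, dropping accents, punctuation and spaces, then case folding) maps ü to u but does not cover the ue spelling, so it would complement rather than replace the regex mappings.

```python
import unicodedata

def search_key(name):
    """Reduce a name to a plain, lower-case key for indexed lookup.

    Accents are stripped via NFKD decomposition, anything that is not
    a letter or digit is dropped, and the result is case-folded.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    kept = [
        char for char in decomposed
        if not unicodedata.combining(char) and char.isalnum()
    ]
    return "".join(kept).casefold()

for name in ("Müller", "O'Hare", "de Groot", "René"):
    print(name, "->", search_key(name))
```

Such a key could be computed when a name is created or edited and stored in its own indexed column, so a search compares keys instead of evaluating a regex against every name.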

Here is the current state of affairs re

def __normalize_soundalikes(aString):
        # expand letters with common spelling variants into regex alternations

        # umlauts
        normalized =    aString.replace(u'Ä', u'(Ä|AE|Ae|A|E)')
        normalized = normalized.replace(u'Ö', u'(Ö|OE|Oe|O)')
        normalized = normalized.replace(u'Ü', u'(Ü|UE|Ue|U)')
        normalized = normalized.replace(u'ä', u'(ä|ae|e|a)')
        normalized = normalized.replace(u'ö', u'(ö|oe|o)')
        normalized = normalized.replace(u'ü', u'(ü|ue|u|y|i)')
        normalized = normalized.replace(u'ß', u'(ß|sz|ss|s)')
        # common soundalikes
        # - René, Desiré, ...
        normalized = normalized.replace(u'é', u'(é|e)')
        # FIXME: how to sanely replace t -> th ?
        normalized = normalized.replace('Th', '(Th|T)')
        normalized = normalized.replace('th', '(th|t)')
        # FIXME: how to prevent replacing (f|v|ph) -> (f|(v|f|ph)|ph) ?
        #normalized = normalized.replace('v', '(v|f|ph)')
        #normalized = normalized.replace('f', '(f|v|ph)')
        #normalized = normalized.replace('ph', '(ph|f|v)')

        return normalized
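One way around the second FIXME, offered as a sketch rather than as the planned fix: do all substitutions in a single pass with re.sub and a lookup table, so that text produced by one replacement is never scanned again by the next. The table SOUNDALIKES and the function expand_soundalikes are names invented for this example.

```python
import re

# Mapping taken from the commented-out lines above, plus the th/Th pair.
SOUNDALIKES = {
    "ph": "(ph|f|v)",
    "f": "(f|v|ph)",
    "v": "(v|f|ph)",
    "th": "(th|t)",
    "Th": "(Th|T)",
}

# Longer keys first, so multi-letter keys win over any shorter key
# that could match at the same position.
_KEYS = sorted(SOUNDALIKES, key=len, reverse=True)
_PATTERN = re.compile("|".join(re.escape(key) for key in _KEYS))

def expand_soundalikes(term):
    """Replace every key exactly once; replacements are not rescanned."""
    return _PATTERN.sub(lambda match: SOUNDALIKES[match.group(0)], term)

print(expand_soundalikes("Stefan"))
print(expand_soundalikes("Stephan"))
```

Because re.sub continues scanning after the end of each match in the original string, the "f", "v" and "ph" inside an inserted alternation are left alone.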

GPG key ID E4071346 @
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
