From: Syan Tan
Subject: Re: [Gnumed-devel] experiments with gnumed - multiusers vnc, importing an au emr
Date: Tue, 25 Apr 2006 09:37:49 +0800

I've processed 360,000 rows of clin.clin_narrative and parsed out all the
words containing letters. I was thinking of using a stoplist method where
any word appearing on the stoplist will be replaced by 'xxxx'. The stoplist
would also include all the names listed in dem.names.lastnames and
dem.names.firstnames.
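
A minimal sketch of the stoplist idea, assuming the narrative text and the
name lists have already been fetched from the database as plain Python
strings. The function name scrub, the word pattern, and the sample data are
hypothetical and only illustrate the substitution step:

```python
import re

# A "word" here is any run of ASCII letters; this is an assumption,
# real narratives may need a wider pattern (umlauts, hyphens, ...).
WORD_RE = re.compile(r"[A-Za-z]+")


def scrub(narrative, stoplist, mask="xxxx"):
    """Replace every whole word found on the stoplist with the mask.

    Matching is case-insensitive and only applies to whole words, so a
    stoplisted "Smith" does not touch "smithy".
    """
    stop = {word.lower() for word in stoplist}

    def _substitute(match):
        word = match.group(0)
        return mask if word.lower() in stop else word

    return WORD_RE.sub(_substitute, narrative)


if __name__ == "__main__":
    # Hypothetical names standing in for rows from the name tables.
    names = ["Smith", "Mary"]
    print(scrub("Mary Smith seen today, BP 120/80; smithy unaffected.", names))
```

Matching whole words keeps clinical terms that merely contain a name intact,
at the cost of missing misspelled names.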

BTW - what about a secondary structure for clin.clin_narrative, where the
narrative consists of a list of indexes pointing into a table of words?
This is the simplest step before having some sort of semantic linking at
the word level (but not at the phrase level).
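
A rough in-memory sketch of such a secondary structure, assuming each
narrative is split into alternating runs of letters and non-letters so that
the original text can be rebuilt exactly. The class name WordTable and its
methods are hypothetical; in the database the word list would be a table
and the index list a column or link table:

```python
import re

# Letters form "words"; everything in between (spaces, digits, punctuation)
# is kept as its own token so decoding is lossless.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


class WordTable:
    """Shared table of tokens; narratives are stored as lists of indexes."""

    def __init__(self):
        self.words = []   # index -> token
        self.ids = {}     # token -> index

    def intern(self, token):
        """Return the index of the token, adding it to the table if new."""
        if token not in self.ids:
            self.ids[token] = len(self.words)
            self.words.append(token)
        return self.ids[token]

    def encode(self, narrative):
        """Turn a narrative into a list of indexes into the word table."""
        return [self.intern(token) for token in TOKEN_RE.findall(narrative)]

    def decode(self, indexes):
        """Rebuild the narrative text from its list of indexes."""
        return "".join(self.words[i] for i in indexes)


if __name__ == "__main__":
    table = WordTable()
    encoded = table.encode("chest pain, no chest pain")
    print(encoded)
    print(table.decode(encoded))
```

With this layout, masking a name everywhere becomes a single change to one
row of the word table rather than a rewrite of every narrative.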

Whilst trying to recreate the gnumed database from a pg_dump, the dump
reload seems to stall. I tried turning off logging, table constraints and
fsync, and removing the internal log table data, which all finally worked,
but I'm not sure what causes the stall.

On Mon Apr 24 18:53 , Karsten Hilbert sent:

On Thu, Apr 20, 2006 at 09:47:54AM +0800, Syan Tan wrote:

> thinking about it, the only correct thing to do seems to be to preserve the
> structure of the instance data and the health issue + episode headings, but to
> scramble the text with word substitution, as well as name substitution, date
> fudging, and address random relinking . would that be de-identified enough ?
Well, I tend to think that "de-identified enough" is a range
from "acceptably so" to "beyond use" rather than a cutoff.
The exact value used within that range depends on what sort
of protection you need.

Yes, if you want to hide a patient's data securely from your
fellow doctor next door you will have to scramble the medical
content, too, as she might be able to match "real patient"
to "problems/operations listed" by her own medical skills
and thereby gain knowledge via the now re-identified EMR.

But if you want to protect a patient's privacy from, say,
me, it's enough to falsify the identities. I do not have
access to your patients. I also have no idea how to find out
who your patients actually are in order to start matching
EMRs to patients. Hence proper protection is ensured, I dare
say. It is akin to not storing patient names with any
medical data and holding the EMR ID <-> patient identity
mapping elsewhere in a secure space (say, the patient's

In a recent discussion on the openhealth list this topic was
chanced upon and the OpenEHR guys thought the latter
approach would be the most secure that's practically useful
- and they were talking real live patient data in actual

> BTW a bzip2 openssl blowfish encrypted pg_dump is about 21M.
Fine. If public/private encryption of the entire thing is
too large then do so with a large and very random one-time
session key which you send to me and symmetrically encrypt
the data with that. I have no problem downloading 100 MB
from somewhere.

GPG key ID E4071346 @
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
