Subject: Re: [Gnumed-devel] experiments with gnumed - multiusers vnc, importing an au emr
Date: Tue, 25 Apr 2006 09:37:49 +0800
I've processed 360,000 rows of clin.clin_narrative and parsed out all the words
containing letters. I was thinking of using a stoplist method, where any word appearing
on the stoplist is replaced by 'xxxx'. The stoplist would also include all the names
listed in dem.names.lastnames and dem.names.firstnames.
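A minimal Python sketch of the stoplist idea, assuming the narrative rows and the name columns have already been fetched into Python strings (database access is left out). The names `build_stoplist` and `scrub` are hypothetical, and "words containing letters" is approximated here as runs of ASCII letters.

```python
import re

# Assumption: a "word" is a run of ASCII letters; digits and punctuation are left alone.
WORD_RE = re.compile(r"[A-Za-z]+")


def build_stoplist(extra_words, first_names, last_names):
    """Hypothetical helper: merge extra stopwords with patient names
    (e.g. values from dem.names.firstnames / dem.names.lastnames),
    lower-cased so matching is case-insensitive."""
    return {w.lower() for w in (*extra_words, *first_names, *last_names)}


def scrub(narrative, stoplist, mask="xxxx"):
    """Replace every word found on the stoplist with the mask string."""
    def replace(match):
        word = match.group(0)
        return mask if word.lower() in stoplist else word

    return WORD_RE.sub(replace, narrative)
```

For example, `scrub("Mr Smith from Perth reports chest pain", build_stoplist(["Perth"], ["John"], ["Smith"]))` returns `"Mr xxxx from xxxx reports chest pain"`. Because names are matched as plain words, a surname that is also an ordinary English word will be masked wherever it appears.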
BTW, what about a secondary structure for clin.clin_narrative, where each narrative
consists of a list of indexes pointing into a table of words? This is the simplest step before
having some sort of semantic linking at the word level (but not at the phrase level).
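The proposed secondary structure could look roughly like this in Python, assuming an in-memory dictionary stands in for the word table and that words are split on whitespace only. `WordTable`, `encode` and `decode` are hypothetical names, not existing GNUmed objects.

```python
from dataclasses import dataclass, field


@dataclass
class WordTable:
    """Hypothetical in-memory stand-in for a table of distinct words."""
    words: list = field(default_factory=list)  # word id -> word
    ids: dict = field(default_factory=dict)    # word -> word id

    def id_for(self, word):
        """Return the word's id, adding the word to the table if it is new."""
        if word not in self.ids:
            self.ids[word] = len(self.words)
            self.words.append(word)
        return self.ids[word]


def encode(narrative, table):
    """Store a narrative as a list of indexes into the word table."""
    return [table.id_for(token) for token in narrative.split()]


def decode(word_ids, table):
    """Rebuild the narrative text from its list of word indexes."""
    return " ".join(table.words[i] for i in word_ids)
```

With a fresh `WordTable`, `encode("chest pain and chest tightness", table)` gives `[0, 1, 2, 0, 3]`, and `decode` turns that back into the same text. Splitting on whitespace does not preserve original line breaks or separate punctuation from words; on the other hand, once narratives are stored as indexes, masking a word everywhere only means changing its single entry in the word table.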
Whilst trying to recreate the GNUmed database from a pg_dump,
the reload seems to stall. I tried turning off logging, table constraints,
and fsync, and removing the internal log table data; together these finally
worked, but I'm not sure what causes the stall.
On Mon Apr 24 18:53, Karsten Hilbert sent:
On Thu, Apr 20, 2006 at 09:47:54AM +0800, Syan Tan wrote:
> Thinking about it, the only correct thing to do seems to be to preserve the
> structure of the instance data and the health issue + episode headings, but to
> scramble the text with word substitution, as well as name substitution, date
> fudging, and random address relinking. Would that be de-identified enough?
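The date fudging and random address relinking in the quoted question could be sketched in Python as below. Two assumptions are made here that the thread does not state: each patient's dates are shifted by one shared random offset (so intervals between encounters survive), and relinking means shuffling the existing addresses among patients. The function names are hypothetical.

```python
import random
from datetime import date, timedelta


def fudge_dates(dates: list[date], rng: random.Random,
                max_shift_days: int = 180) -> list[date]:
    """Shift all of one patient's dates by the same random offset, so the
    intervals between encounters are preserved."""
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [d + offset for d in dates]


def relink_addresses(patient_ids, addresses, rng: random.Random):
    """Randomly reassign the existing addresses among the patients."""
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    return dict(zip(patient_ids, shuffled))
```

A plain shuffle can occasionally hand a patient their own address back, and a seeded `random.Random` is reproducible; passing `random.SystemRandom()` as `rng` avoids the latter.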
Well, I tend to think that "de-identified enough" is a range
from "acceptably so" to "beyond use" rather than a cutoff.
The exact value used within that range depends on what sort
of protection you need.
Yes, if you want to hide a patient's data securely from your
fellow doctor next door, you will have to scramble the medical
content, too, as she might be able to match "real patient"
to "problems/operations listed" by her own medical skills
and thereby gain knowledge via the now re-identified EMR.
But if you want to protect a patient's privacy from, say,
me, it's enough to falsify the identities. I do not have
access to your patients. I also have no idea how to find out
who your patients actually are in order to start matching
EMRs to patients. Hence proper protection is ensured, I dare
say. It is akin to not storing patient names with any
medical data and holding the EMR ID <-> patient identity
mapping elsewhere in a secure space (say, the patient's
In a recent discussion on the openhealth list this topic came
up, and the OpenEHR guys thought the latter
approach would be the most secure one that is still practically useful,
and they were talking about real, live patient data in actual
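The approach of keeping the EMR ID <-> patient identity mapping apart from the medical data can be sketched in Python as follows. `PatientRecord` and `pseudonymise` are hypothetical, and a random hex token is assumed as the EMR ID; in practice the identity map would live in a separate, secured store.

```python
import secrets
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical flat record mixing identity and clinical data."""
    name: str
    dob: str
    narrative: str


def pseudonymise(records):
    """Split records into an identity map (kept elsewhere, in a secure
    place) and an EMR keyed only by a random EMR ID."""
    identity_map = {}  # EMR ID -> (name, dob); never stored with the EMR
    emr = {}           # EMR ID -> clinical content only
    for rec in records:
        emr_id = secrets.token_hex(8)
        while emr_id in emr:  # regenerate on the (unlikely) collision
            emr_id = secrets.token_hex(8)
        identity_map[emr_id] = (rec.name, rec.dob)
        emr[emr_id] = rec.narrative
    return identity_map, emr
```

This only separates the structured identity fields; the free-text narrative may still mention names, which is what the stoplist idea at the top of this message targets.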
> BTW, a bzip2-compressed, openssl Blowfish-encrypted pg_dump is about 21M.
Fine. If public/private-key encryption of the entire thing is
too large, then use a large and truly random one-time
session key, which you send to me, and symmetrically encrypt
the data with that. I have no problem downloading 100 MB
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Gnumed-devel mailing list