Re: [Gnumed-devel] Approaches to maintain clinical data uptime

From: Syan Tan
Subject: Re: [Gnumed-devel] Approaches to maintain clinical data uptime
Date: Sun, 30 Apr 2006 14:05:08 +0800

Couldn't you file a request for an academic replication system, such as
a gossip-architecture system?

BTW, I'm not quite clear about why Lamport clocks, as opposed to vector
clocks, are to be used. A Lamport clock is just one sequence number per
site, which is kept ordered whenever sites send messages to each other.
Vector clocks are sequence numbers kept at every site about every site,
so when messages are received, changes can be causally ordered across
more than one other site. What sort of ordering is being aimed for in
the netepi multi-site application, and why?
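To make the distinction concrete, here is a minimal Python sketch of the two clock types. It is only an illustration of the textbook definitions, not anything taken from netepi; the class and function names (LamportClock, VectorClock, happened_before, concurrent) are made up for this example, and the set of sites is assumed to be fixed and known in advance.

```python
class LamportClock:
    """One integer counter per site (hypothetical helper class)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send: advance the counter.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receipt, jump past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


class VectorClock:
    """One counter per known site, kept at every site (hypothetical)."""

    def __init__(self, site, sites):
        self.site = site
        self.clock = {s: 0 for s in sites}

    def tick(self):
        # Local event or message send: advance only our own entry.
        self.clock[self.site] += 1
        return dict(self.clock)

    def receive(self, other):
        # Element-wise maximum with the sender's vector, then tick.
        for s, t in other.items():
            self.clock[s] = max(self.clock.get(s, 0), t)
        self.clock[self.site] += 1
        return dict(self.clock)


def happened_before(a, b):
    """True if vector timestamp a causally precedes b."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def concurrent(a, b):
    """True if neither vector timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


sites = ["A", "B", "C"]
va, vb, vc = (VectorClock(s, sites) for s in sites)
a1 = va.tick()        # independent update at site A
b1 = vb.tick()        # independent update at site B
c1 = vc.receive(a1)   # site C hears about A's update
```

With vector timestamps, a1 and b1 can be recognised as concurrent (a potential conflict), while a1 is known to precede c1. Two Lamport timestamps only give an ordering consistent with causality; comparing them cannot tell you whether the two updates were concurrent.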

On Sun Apr 30 9:06 , Tim Churches sent:

James Busser wrote:
> On Apr 29, 2006, at 4:35 AM, Tim Churches wrote:
>> (I keep wondering whether we should have used an EAV pattern for storage
> Educated myself (just a bit) here

Thanks - we have copies of the latter three papers but I hadn't seen the
first article. Of course, PostgreSQL muddies the waters, because the way
it works under the bonnet (hood, engine cover) is rather similar (but
not identical) to the EAV model - but all that is hidden behind the SQL
interface which is not easy to bypass.

We really wanted to use openEHR when we started in 2003 - openEHR can
be seen as a very sophisticated metadata layer which can be used with
an EAV-like back-end storage schema - but no openEHR storage engines
were available then, and when I asked again earlier this year, there
were still none available (as open source or closed source on a
commercial basis) in a production-ready form.

Anyway, plain old PostgreSQL tables work rather well, and are fast and
reliable for large datasets - but we will need to build our own
replication engine, I now think. What we really need is multi-master DB
replication which can cope with slow and unreliable networks (hence it
has to use asynchronous updates, not tightly-coupled synchronous updates
such as multi-phase commits) and with frequent "network partitions". If
we are funded to do that, then we'll write it in Python, probably using
a stochastic "epidemic" model for the data propagation algorithm and
some variation on Lamport logical clocks for data synchronisation. It
also needs to propagate schema changes. Hopefully we can make it
sufficiently general that it might have utility for GNUmed, eg when a
copy of a clinic database is taken away on a laptop for use in the field,
eg at a nursing home or a satellite clinic, and network connection and
synchronisation occur only occasionally. However, we need the
replication to scale to 200 to 300 sites. Interestingly, most of the
commercial multi-master database replication products just gloss over
the issue of data integrity, or leave it up to the application - but
research in the 1990s showed that that is not good enough in more
complex situations with more than a few master DB instances.
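As a rough illustration of the kind of approach described above, here is a minimal Python sketch of epidemic (gossip) propagation between replicas, with Lamport-style timestamps used to pick a winner when two sites have written the same key. It is a sketch under stated assumptions, not the planned netepi design: the names Replica and gossip_round are hypothetical, conflicts are resolved by simple last-writer-wins with the site name as tie-breaker, each round does a push-pull exchange with one randomly chosen peer, and schema changes are not handled at all.

```python
import random


class Replica:
    """One master database instance (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.clock = 0    # Lamport-style logical clock
        self.store = {}   # key -> (timestamp, site name, value)

    def write(self, key, value):
        # Local update: advance the clock and stamp the new version.
        self.clock += 1
        self.store[key] = (self.clock, self.name, value)

    def merge(self, other_store, other_clock):
        # Catch up with the peer's clock so that later local writes
        # get timestamps larger than anything seen so far.
        self.clock = max(self.clock, other_clock)
        for key, version in other_store.items():
            mine = self.store.get(key)
            # Assumption: last-writer-wins on (timestamp, site name).
            if mine is None or version[:2] > mine[:2]:
                self.store[key] = version


def gossip_round(replicas, rng, reachable=lambda a, b: True):
    """Each replica tries a push-pull exchange with one random peer.

    `reachable` models network partitions: when it returns False the
    exchange is skipped and simply retried in a later round.
    """
    for r in replicas:
        peer = rng.choice([p for p in replicas if p is not r])
        if reachable(r, peer):
            r_store, r_clock = dict(r.store), r.clock
            r.merge(peer.store, peer.clock)
            peer.merge(r_store, r_clock)


# Usage: two sites update the same record while disconnected, then sync.
a, b = Replica("a"), Replica("b")
a.write("patient-1", "address A")
b.write("patient-1", "address B")
gossip_round([a, b], random.Random(0), reachable=lambda x, y: False)
partitioned_equal = a.store == b.store   # still divergent
gossip_round([a, b], random.Random(0))
```

After the second round both replicas hold the same version of the record. Note that last-writer-wins silently discards one of the two conflicting updates, which is exactly the data integrity question that has to be settled either in the replication engine or in the application.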

>> - Slony would have worked with that..).

There is a Slony-2 project, being done here in Sydney, but it is
focussing on multi-master synchronous updates ie multiple servers in a
single data centre, for load-balancing of write tasks as well as read
tasks (for which Slony-1 can be used to facilitate load-balancing).

Sorry to rave on, but don't let anyone tell you that there are no
fundamental data management issues yet to be addressed by open source or
commercial software.

Tim C

Gnumed-devel mailing list