
Re: [Gnumed-devel] Approaches to maintain clinical data uptime

From: Tim Churches
Subject: Re: [Gnumed-devel] Approaches to maintain clinical data uptime
Date: Thu, 04 May 2006 03:20:50 +1000
User-agent: Thunderbird (Windows/20060308)

Karsten Hilbert wrote:
> On Wed, May 03, 2006 at 01:04:08AM -0700, Jim Busser wrote:
>> - can Syan or Karsten, if the work involved is limited, attempt a  
>> load test of writing the dump file, so we might know how long this  
>> would require and what its size might be...
> This is entirely dependent on the database content. The dump
> file may grow huge, indeed (several GB, eventually) but
> could be fed into compression right away or streamed onto
> tape or whatever.

One of our PG databases has a few million rows in the main table and
occupies about 10GB of disc space. The resulting PG dump file, in
compressed binary format, is about 600MB and takes about 20 minutes to
write to disc. However, a test restore, albeit on a 2.4 GHz P4 laptop
with 1GB of memory and a typically slow laptop disc, may take 4 or 5 hours.
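As a rough illustration of the dump-and-compress approach (database name and paths here are hypothetical; the numbers above came from our own data), the dump can either use PG's compressed custom format or be piped straight into a compressor so the uncompressed SQL never touches the disc:

```sh
# Dump in PostgreSQL's custom archive format (-Fc), which is compressed
# by default and can be restored selectively with pg_restore:
pg_dump -Fc -f /var/backups/gnumed.dump gnumed

# Or stream a plain-text dump directly into gzip:
pg_dump gnumed | gzip > /var/backups/gnumed.sql.gz

# Test restore into a scratch database - this is the slow step
# (hours rather than minutes on modest hardware, as noted above):
createdb gnumed_restore_test
pg_restore -d gnumed_restore_test /var/backups/gnumed.dump
```

Both forms can be taken against the live database; neither blocks concurrent users.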

>> this is pertinent to  
>> office procedures and downtime
> No it's not. That's what PG is MVCC for - and that's why we
> *chose* PG in the first place.

In other words, PG allows a backup dump to be safely taken of a live
database, which is in continuous use, by noting the state of the
database at the time the dump is started and including only versions of
records as they were at that time. It can do that because it is able to
keep track of multiple versions of each record at once (hence the name
Multiversion Concurrency Control, MVCC).

Thus, if:

a) pg_dump starts at 22:15:00.00 hrs

b) records for patient Joe Bloggs are updated by a GNUmed user 5 seconds
later at 22:15:05.00 hrs

c) pg_dump finishes at 22:31:56.78 hrs

then the pg_dump will contain the records for Joe Bloggs as they were
when the pg_dump started, i.e. **before** the update at b). However,
anyone accessing the database at, say, 22:16:00.00 hrs - i.e. 1 minute
later - will see the records for Joe Bloggs including the updates done
in b). Because a pg_dump is running, PG knows it has to keep both
versions of Joe Bloggs' records around, at least until the pg_dump is
completed.
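The same snapshot behaviour can be watched by hand from two concurrent psql sessions (the table and column names below are hypothetical; pg_dump works the same way because it runs inside a serializable transaction, whose snapshot is fixed at its first query):

```sql
-- Session 1 (plays the role of pg_dump): open a snapshot-holding
-- transaction. All reads in it see the database as of one moment.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT name FROM patient WHERE id = 1;   -- fixes the snapshot; sees the old row

-- Session 2, meanwhile: update the same record. This commits at once
-- and is visible to any *new* transaction.
UPDATE patient SET name = 'Joe Bloggs (updated)' WHERE id = 1;

-- Session 1 again, still inside its transaction:
SELECT name FROM patient WHERE id = 1;   -- still returns the pre-update row
COMMIT;   -- only now may PG reclaim the old row version
```

Session 2 is never blocked by session 1's long-running read, which is the point being made about backups and office workflow.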

I think that only Oracle, MS SQL Server and PG offer MVCC and hence true
hot backups. MVCC also allows multiple transactions to proceed in
parallel, which helps throughput in normal use, not just when backups
are taken. Other databases have to lock either all or large parts of the
database during the back-up dump, which can indeed interfere with users
of the database and office workflow as you point out.

Tim C
