
Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later
Date: Thu, 28 Aug 2014 11:22:09 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Aug 25, 2014 at 01:13:09PM -0500, Andrew Martin wrote:
> > >> > I recently experienced UPS failure on several hosts which caused a hard
> > >> > shutdown. After restarting, 3 of the guests had corruption on their
> > >> > disks and required a fairly long fsck to fix. Afterwards, data that had
> > >> > been written to the disks several hours before the crash was corrupted,
> > >> > which makes me think that it was never fsync()-ed to the non-volatile
> > >> > storage.
> > >>
> > >> What exactly was the "corruption" you encountered?  Which application,
> > >> error message, etc.
> > >
> > > > Two of the servers are web servers with apache2. In one case, a python
> > > > daemon copies JPGs onto the server - the last 100 copied onto the server
> > > > were corrupted. In another case, some files had been uploaded several
> > > > days prior to the www-root, but after the hard reset said files were no
> > > > longer present in the filesystem.
> > 
> > Did the Python daemon fsync the files and directories it modified/created?
> > 
> > Did you sync(1) after copying files to www-root?
> > 
> > Also, you didn't explain what "corrupted" means.  Were the jpg files
> > missing, were they zero bytes in size, were they filled with junk,
> > etc?
> > 
> The jpgs appeared to be a normal size, but were filled with junk. The files
> uploaded by apache2 were missing from the filesystem.
> 
> Even if the python daemon or apache2 did not fsync the modified files, isn't
> there some action that the OS takes periodically to flush dirty pages to
> disk? This seems to be implied in the SuSE documentation:
> https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html
> "the normal page cache management will handle commitment to the storage
> device."
> 
> 
> In the case of the files uploaded by apache2, they were added to the server
> days before the power outage, so it seems like there would have been ample
> time for those changes to have been flushed.

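In the general case of copying or creating some files and hoping that they
will be persistent, relying on periodic writeback usually works.  If you want
to be 100% sure, you still need to flush the cache explicitly, e.g. with
fsync(2) on the files and their directories or with sync(1).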
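
As a rough illustration of what an explicit flush could look like in a copy
daemon like the Python one mentioned above, here is a minimal sketch.  The
function name copy_durably is made up for this example, and fsyncing a
directory through a read-only file descriptor assumes a Linux/POSIX system:

```python
import os
import shutil


def copy_durably(src, dst):
    """Copy src to dst, returning only after the data and the new
    directory entry have been handed to stable storage."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file data out

    # The file's name lives in the parent directory, so flush that too;
    # otherwise the data may be on disk without an entry pointing to it.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This costs one or two flushes per file, so a daemon copying many small files
may prefer to batch the copies and flush once at the end.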

It doesn't work when updates are made to data on disk and the ordering
matters (e.g. the wrong ordering could corrupt data or cause it to be lost).
In that case relying on the kernel to flush dirty buffers periodically
is not a feasible approach because you don't know when that will happen
and therefore have no control over ordering.
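
One common way for an application to enforce ordering itself is to write the
new contents to a temporary file, flush it, and only then rename it over the
old file.  The sketch below assumes a POSIX filesystem where rename is atomic
within a directory; replace_atomically is a hypothetical helper name, not
something from apache2 or the daemon discussed here:

```python
import os
import tempfile


def replace_atomically(path, data):
    """Replace the contents of path with data (bytes) so that after a crash
    either the complete old or the complete new contents are visible."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) as path.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # 1. new contents reach storage first
        os.replace(tmp_path, path)  # 2. only then switch the name over
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a stray temp file behind
        raise

    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)            # 3. persist the rename itself
    finally:
        os.close(dir_fd)
```

The fsync in step 1 is what provides the ordering: without it the rename
could reach the disk before the data does, leaving a file of the right size
filled with junk after a power failure.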

Stefan


