qemu-devel

Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later


From: Andrew Martin
Subject: Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later
Date: Mon, 25 Aug 2014 13:13:09 -0500 (CDT)

----- Original Message -----
> From: "Stefan Hajnoczi" <address@hidden>
> To: "Andrew Martin" <address@hidden>
> Cc: "qemu-devel" <address@hidden>
> Sent: Thursday, August 21, 2014 7:59:50 AM
> Subject: Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later
>
> > When the disk is set to cache=writethrough on one of the same VMs, I see
> > frequent fdatasync(2) calls (every few seconds). However, when I change the
> > disk over to cache=writeback, since boot I have not yet seen a single
> > fdatasync(2) call, even after writing data 2x the amount of RAM:
> 
> There is a misconception here.  Writing data to disk does not make it
> persistent across power failure.  The behavior you observed is
> actually expected.
> 
> Flushing the disk cache is a deliberate operation that applications
> must perform to ensure data is safe on disk.
> 
> You can use sync(1) to manually flush file system buffers for testing.
> On the host you should see an fdatasync(2).
> 
I booted a VM with cache=writeback and again with cache=none. In both cases I
wrote some data to the disk and then issued sync(1) as root while stracing the
QEMU process on the host. In neither case did I see any fdatasync, fsync, or
sync syscalls.

> >> > I recently experienced UPS failure on several hosts which caused a hard
> >> > shutdown. After restarting, 3 of the guests had corruption on their disks
> >> > and required a fairly long fsck to fix. Afterwards, data that had been
> >> > written to the disks several hours before the crash was corrupted, which
> >> > makes me think that it was never fsync()-ed to the non-volatile storage.
> >>
> >> What exactly was the "corruption" you encountered?  Which application,
> >> error message, etc.
> >
> > Two of the servers are web servers with apache2. In one case, a python
> > daemon copies JPGs onto the server - the last 100 copied onto the server
> > were corrupted. In another case, some files had been uploaded several days
> > prior to the www-root, but after the hard reset said files were no longer
> > present in the filesystem.
> 
> Did the Python daemon fsync the files and directories it modified/created?
> 
> Did you sync(1) after copying files to www-root?
> 
> Also, you didn't explain what "corrupted" means.  Were the jpg files
> missing, were they zero bytes in size, were they filled with junk,
> etc?
> 
The jpgs appeared to be a normal size, but were filled with junk. The files
uploaded by apache2 were missing from the filesystem.

Even if the python daemon or apache2 did not fsync the modified files, isn't 
there some action that the OS takes periodically to flush dirty pages to disk? 
This seems to be implied in the SuSE documentation:
https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html
"the normal page cache management will handle commitment to the storage device."


In the case of the files uploaded by apache2, they were added to the server days
before the power outage, so it seems like there would have been ample time for
those changes to have been flushed.
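
For reference, the explicit flushing asked about above (fsync of both the file
and the directory it was created in) could look roughly like the sketch below.
This is not the actual daemon code: the function name durable_copy and the
temp-file-then-rename approach are assumptions made for illustration, and it
assumes a Linux guest where a directory can be opened and fsync'ed.

```python
import os
import shutil
import tempfile


def durable_copy(src, dst):
    """Copy src to dst, returning only after the file data and the
    directory entry have been flushed with fsync (hypothetical helper)."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Write to a temporary file in the destination directory so that the
    # final rename stays within one filesystem. Note that mkstemp creates
    # the file with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()             # Python buffer -> kernel page cache
            os.fsync(out.fileno())  # ask the kernel to flush the file data
        os.replace(tmp_path, dst)   # atomically give the file its final name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so that the new name itself survives a crash.
    dir_fd = os.open(dst_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Called as durable_copy("/tmp/upload.jpg", "/var/www/images/upload.jpg") (both
paths made up), this would return only after the flush requests have completed
in the guest. Whether those flushes then show up as fdatasync(2) on the host is
exactly what I have not been able to observe so far.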

Thanks!

Andrew


