qemu-devel

Re: [Qemu-devel] [PATCH] Fix Event Viewer errors caused by qemu-ga


From: Sameeh Jubran
Subject: Re: [Qemu-devel] [PATCH] Fix Event Viewer errors caused by qemu-ga
Date: Sun, 2 Apr 2017 10:31:16 +0300

Ping.

On Wed, Mar 22, 2017 at 11:14 AM, Sameeh Jubran <address@hidden> wrote:

>
>
> On Tue, Mar 21, 2017 at 6:09 PM, Michael Roth <address@hidden>
> wrote:
>
>> Quoting Sameeh Jubran (2017-03-21 05:49:52)
>> > When the command "guest-fsfreeze-freeze" is executed it causes
>> > the VSS service to log the errors below in the Event Viewer.
>> >
>> > These errors are caused by two issues in the function "CommitSnapshots" in
>> > provider.cpp:
>> >
>> > 1. When VSS_TIMEOUT_MSEC expires, the function returns E_ABORT. This causes
>> > error #12293.
>> >
>> > 2. The VSS_TIMEOUT_MSEC value is too big. According to MSDN, the
>> > "Flush & Hold" operation has a 10-second timeout that is not configurable.
>> > "CommitSnapshots" is part of the "Flush & Hold" process, so any
>> > timeout longer than 10 seconds causes error #12298, and anything
>> > longer than 40 seconds causes error #12340. All this info can be found here:
>> > https://msdn.microsoft.com/en-us/library/windows/desktop/aa384589(v=vs.85).aspx
>>
>> Not sure how best to deal with this. Technically our CommitSnapshots
>> interface is driven by the backup job being run by QGA/QEMU management
>> side. If that amount of time exceeds the VSS limits then I think it's
>> appropriate for VSS to log the error accordingly. VSS_TIMEOUT_MSEC here
>> doesn't actually have too much correlation with the VSS-set timeout,
>> IIRC it's specifically picked to exceed both the 10 and 40 second
>> timeouts and acts more as a fail-safe timeout.
>
> The timeout was added in commit b39297aedfabe9b2c426cd540413be991500da25.
> There is no point in setting the timeout this long, as the actual
> freeze (Flush and Hold Writes) is limited to 10 seconds (not configurable)
> according to MSDN:
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa384589%28v=vs.85%29.aspx
>
>>
>> Are the event logs causing issues? FWIW, on the posix side we also opt
>> for gratuitous logging to syslog and such, the idea there being that
>> cooperative guests would prefer transparency on how the agent is being
>> used.
>>
> Apparently, these error logs are annoying to some
> (https://bugzilla.redhat.com/show_bug.cgi?id=1387125).
> Moreover, I don't think that our implementation of the freeze operation,
> which is a workaround in a way,
> should log errors when we know they are false alarms.
>
>>
>> That said, I do think error 12293 is unnecessary, since IIUC it would
>> always be paired with the actual VSS-reported error. So avoiding the
>> E_ABORT seems reasonable either way.
>>
>> >
>> > |event id|                           error                           |
>> > * 12293  : Volume Shadow Copy Service error: Error calling a routine on a
>> >            Shadow Copy Provider {00000000-0000-0000-0000-000000000000}.
>> >            Routine details CommitSnapshots [hr = 0x80004004, Operation
>> >            aborted.
>> >
>> > * 12340  : Volume Shadow Copy Error: VSS waited more than 40 seconds for
>> >            all volumes to be flushed.  This caused volume
>> >            \\?\Volume{62a171da-32ec-11e4-80b1-806e6f6e6963}\ to timeout
>> >            while waiting for the release-writes phase of shadow copy
>> >            creation. Trying again when disk activity is lower may solve
>> >            this problem.
>> >
>> > * 12298  : Volume Shadow Copy Service error: The I/O writes cannot be held
>> >            during the shadow copy creation period on volume
>> >            \\?\Volume{62a171d9-32ec-11e4-80b1-806e6f6e6963}\. The volume
>> >            index in the shadow copy set is 0. Error details:
>> >            Open[0x00000000, The operation completed successfully. ],
>> >            Flush[0x00000000, The operation completed successfully.],
>> >            Release[0x00000000, The operation completed successfully.],
>> >            OnRun[0x80042314, The shadow copy provider timed out while
>> >            holding writes to the volume being shadow copied. This is
>> >            probably due to excessive activity on the volume by an
>> >            application or a system service. Try again later when activity
>> >            on the volume is reduced.
>> >
>> > Signed-off-by: Sameeh Jubran <address@hidden>
>> > ---
>> >  qga/vss-win32/provider.cpp | 3 +--
>> >  1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/qga/vss-win32/provider.cpp b/qga/vss-win32/provider.cpp
>> > index ef94669..d72f4d4 100644
>> > --- a/qga/vss-win32/provider.cpp
>> > +++ b/qga/vss-win32/provider.cpp
>> > @@ -15,7 +15,7 @@
>> >  #include <inc/win2003/vscoordint.h>
>> >  #include <inc/win2003/vsprov.h>
>> >
>> > -#define VSS_TIMEOUT_MSEC (60*1000)
>> > +#define VSS_TIMEOUT_MSEC (9 * 1000)
>> >
>> >  static long g_nComObjsInUse;
>> >  HINSTANCE g_hinstDll;
>> > @@ -377,7 +377,6 @@ STDMETHODIMP CQGAVssProvider::CommitSnapshots(VSS_ID SnapshotSetId)
>> >      if (WaitForSingleObject(hEventThaw, VSS_TIMEOUT_MSEC) != WAIT_OBJECT_0) {
>> >          /* Send event to qemu-ga to notify the provider is timed out */
>> >          SetEvent(hEventTimeout);
>> > -        hr = E_ABORT;
>> >      }
>> >
>> >      CloseHandle(hEventThaw);
>> > --
>> > 2.9.3
>> >
>>
>>
>
>
> --
> Respectfully,
> *Sameeh Jubran*
> *Linkedin <https://il.linkedin.com/pub/sameeh-jubran/87/747/a8a>*
> *Software Engineer @ Daynix <http://www.daynix.com>.*
>
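
To make the reasoning in the quoted patch description easier to follow, here
is a minimal sketch of which Event Viewer entries the thread expects when the
thaw event never arrives and the provider waits out its full timeout. It only
encodes the claims made above (the 10 and 40 second limits and the E_ABORT
behaviour); it does not call VSS. The names ExpectedVssEvents and
expected_events_on_timeout are hypothetical and do not exist in provider.cpp.

```cpp
#include <cstdint>

// Limits as quoted in this thread from the MSDN page it links to.
// They are assumptions taken from the discussion, not values read from VSS.
constexpr std::uint32_t kFlushHoldLimitMs = 10 * 1000;      // "Flush & Hold" window
constexpr std::uint32_t kReleaseWritesLimitMs = 40 * 1000;  // release-writes wait

// Hypothetical summary of the event IDs discussed in the thread.
struct ExpectedVssEvents {
    bool event12293;  // provider routine reported an error (E_ABORT)
    bool event12298;  // writes held longer than the 10 second window
    bool event12340;  // VSS waited more than 40 seconds for the flush
};

// Hypothetical helper: predicts the logged events for the case where
// CommitSnapshots waits for the whole timeout without being thawed.
constexpr ExpectedVssEvents expected_events_on_timeout(std::uint32_t timeout_ms,
                                                       bool returns_e_abort) {
    return ExpectedVssEvents{
        returns_e_abort,
        timeout_ms > kFlushHoldLimitMs,
        timeout_ms > kReleaseWritesLimitMs,
    };
}
```

Under these assumptions, the old settings (60 * 1000 with E_ABORT returned)
predict all three events, while the patched settings (9 * 1000 and no E_ABORT)
predict none of them.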



-- 
Respectfully,
*Sameeh Jubran*
*Linkedin <https://il.linkedin.com/pub/sameeh-jubran/87/747/a8a>*
*Software Engineer @ Daynix <http://www.daynix.com>.*

