qemu-block

Re: issuing [block-]job-complete to jobs in STANDBY state


From: Vladimir Sementsov-Ogievskiy
Subject: Re: issuing [block-]job-complete to jobs in STANDBY state
Date: Sat, 3 Apr 2021 10:55:47 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.9.0

01.04.2021 22:02, John Snow wrote:
Hi; downstream we've run into an issue where, on VMs under heavy load with many
concurrent block jobs, a job might occasionally flicker into the STANDBY state,
during which time it is unable to receive JOB COMPLETE commands. I assume this
flicker is due to child_job_drained_begin().

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1945635

It's safe to just retry this operation, but it may be difficult to
understand WHY the job is paused at the application level, since the flush
event may be asynchronous and unpredictable.
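
A minimal sketch of that application-level retry, in Python. Nothing here is a
real QEMU or QMP client API: send_complete is a hypothetical callable that
issues job-complete over QMP, and JobNotReady is a hypothetical exception it
raises when the job happens to be in STANDBY and the command is rejected.

```python
import time


class JobNotReady(Exception):
    """Hypothetical error: QEMU rejected job-complete (job was in STANDBY)."""


def complete_with_retry(send_complete, job_id, attempts=5, delay=0.1,
                        sleep=time.sleep):
    """Call send_complete(job_id) until it succeeds or attempts run out.

    Returns the number of the attempt that succeeded. The last failure is
    re-raised so the caller can still see that the job never left STANDBY.
    """
    for attempt in range(1, attempts + 1):
        try:
            send_complete(job_id)
            return attempt
        except JobNotReady:
            if attempt == attempts:
                raise
            # The pause is expected to be brief, so wait and try again.
            sleep(delay)
```

The retry count and delay are arbitrary; as noted above, the application cannot
tell how long the pause will last or why it happened.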

We could define a transition to allow COMPLETE to be applied to STANDBY jobs,
but is there any risk or drawback to doing so? On QMP's side, we do know the
difference between a temporary pause and a user pause/error pause (both of the
latter use the user_paused flag).

I imagine it's safe to continue rejecting COMPLETE commands if user_paused is set ("No, go fix 
this first!") and we could define a pathway for implicitly STANDBY jobs only. However, in this 
case, we don't really know how long STANDBY will last. Do we have the ability to easily accept an 
async "intent" to complete a job without tying up the monitor?

ATM I think only mirror uses .complete, but it looks like it tries to actually 
set up the pivot a good deal before delegating to the bottom half, so I worry 
it's not safe to try to run this when we are in the middle of a drain.

Any thoughts?


The first thing that comes to mind is that we need one more state, e.g.
STANDBY-COMPLETE. Then, if the user calls block-job-complete during an implicit
STANDBY, we just remember this fact (by moving to STANDBY-COMPLETE, for
example), and when the job is resumed we perform the completion process.
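
As a rough illustration of that idea, here is a toy model in Python. It is not
QEMU code: the Job class, its methods, and the completion_started flag are
made up for the sketch, and only the three statuses relevant here are modelled.
Rejecting the command while user_paused is set follows the suggestion in the
quoted mail.

```python
from enum import Enum


class Status(Enum):
    READY = "ready"
    STANDBY = "standby"
    STANDBY_COMPLETE = "standby-complete"  # the proposed extra state


class Job:
    def __init__(self):
        self.status = Status.READY
        self.user_paused = False
        self.completion_started = False

    def pause(self, user=False):
        # A drain pauses implicitly (user=False); a QMP pause sets user=True.
        if self.status is Status.READY:
            self.status = Status.STANDBY
            self.user_paused = user

    def complete(self):
        if self.status is Status.READY:
            self.completion_started = True
        elif self.status is Status.STANDBY and not self.user_paused:
            # Only remember the intent; do no completion work during the drain.
            self.status = Status.STANDBY_COMPLETE
        else:
            raise RuntimeError("job cannot be completed in its current state")

    def resume(self):
        if self.status is Status.STANDBY_COMPLETE:
            # Act on the remembered intent now that the job runs again.
            self.status = Status.READY
            self.completion_started = True
        elif self.status is Status.STANDBY:
            self.status = Status.READY
            self.user_paused = False
```

The point of the model is that complete() returns immediately in the implicit
STANDBY case, so the monitor is not tied up while the pause lasts.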

Probably, that also means that the information about whether the current pause
is explicit or implicit should become available in query-block-jobs.

Hmm, maybe we can deprecate the "ready" and "standby" states, add
"implicit-pause", and make JobStatus a struct:

struct {
  state: JobState  (the original JobStatus without "ready" and "standby",
                    but with "implicit-pause")
  *ready: bool     (job is ready; present only for jobs that support the
                    block-job-complete command)
}

This way, block-job-complete is allowed in the implicit-pause state. We could also
rename "paused" to "user-pause" so that it does not interfere with the old meaning.

Really, "ready" is orthogonal to the job state, and because of this we have to support the
"standby" state, which is almost the same as "paused" but also carries the information that the
job is actually "ready".
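
To make the split between state and readiness concrete, here is a small Python
sketch of the proposed struct. It is an illustration only, not QAPI schema or
QEMU code: JobState lists just three of the states, complete_allowed is a
made-up helper, and ready=None stands in for the optional field being absent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    # Only a subset of states, enough to show the idea.
    RUNNING = "running"
    USER_PAUSE = "user-pause"
    IMPLICIT_PAUSE = "implicit-pause"


@dataclass
class JobStatus:
    state: JobState
    # None models an absent field: the job type has no complete command.
    ready: Optional[bool] = None


def complete_allowed(status):
    """Allow completion for a ready job unless the user paused it."""
    return status.ready is True and status.state in (
        JobState.RUNNING,
        JobState.IMPLICIT_PAUSE,
    )
```

With readiness carried separately, no "standby" state is needed: a ready job
that is implicitly paused is simply (implicit-pause, ready=true).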

--
Best regards,
Vladimir


