From: Max Reitz
Subject: Re: [Qemu-block] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
Date: Wed, 4 Jul 2018 16:07:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 2018-07-03 13:15, Kevin Wolf wrote:
> On 02.07.2018 at 14:09, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2018 20:40, Eric Blake wrote:
>>> On 06/29/2018 12:30 PM, John Snow wrote:
>>>>
>>>>
>>>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> We need to synchronize backup job with reading from fleecing image
>>>>> like it was done in block/replication.c.
>>>>>
>>>>> Otherwise, the following situation is theoretically possible:
>>>>>
>>>>> 1. client starts reading
>>>>> 2. client understands that there is no corresponding cluster in the
>>>>>     fleecing image
>>>>
>>>> I don't think the client redirects the read, but QEMU does. (The client
>>>> has no idea whether it is reading from the fleecing node or the backing
>>>> file.)
>>>>
>>>> ... but understood:
>>>>
>>>>> 3. client is going to read from backing file (i.e. active image)
>>>>> 4. guest writes to active image
>>>>
>>>> My question here is whether QEMU will allow reads and writes to
>>>> interleave in general. If the client is reading from the backing file
>>>> (the active image), will QEMU allow a write to modify that data before
>>>> we're done reading it?
>>>>
>>>>> 5. this write is paused by backup(sync=none) and the cluster is copied
>>>>>     to the fleecing image
>>>>> 6. guest write continues...
>>>>> 7. and the client reads _new_ (or partly new) data from the active image
>>>>>
>>>>
>>>> I can't answer this for myself one way or the other right now, but I
>>>> imagine you observed a failure in practice in your test setups, which
>>>> motivated this patch?
>>>>
>>>> A test or some proof would help justify this patch. It would also help
>>>> prove that it solves what it claims to!
>>>
>>> In fact, do we really need a new filter, or do we just need to make the
>>> "sync":"none" blockdev-backup job take the appropriate synchronization
>>> locks?
>>>
>>
>> How? We'd need an additional mechanism like serializing requests, or a
>> way to reuse the existing request serialization. Using the backup job's
>> internal synchronization looks simpler, and it is already used in block
>> replication.
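
(For reference, the synchronization block/replication.c uses -- and which
this series presumably wants to reuse -- looks roughly like the sketch
below.  This is written from memory rather than taken from the patch; the
helpers come from include/block/block_backup.h, and "s->backup_job" is only
a placeholder for however the filter would find the backup(sync=none) job.)

#include "qemu/osdep.h"
#include "block/block_int.h"
#include "block/block_backup.h"

typedef struct FleecingState {
    BlockJob *backup_job;   /* placeholder: the backup(sync=none) job */
} FleecingState;

static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
                                           uint64_t offset, uint64_t bytes,
                                           QEMUIOVector *qiov, int flags)
{
    FleecingState *s = bs->opaque;
    CowRequest req;
    int ret;

    /* Let any in-flight copy-on-write on this range finish first ... */
    backup_wait_for_overlapping_requests(s->backup_job, offset, bytes);
    /* ... and keep new ones out until our read has completed. */
    backup_cow_request_begin(&req, s->backup_job, offset, bytes);

    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);

    backup_cow_request_end(&req);
    return ret;
}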
> 
> But that is also just an ugly hack that fixes one special case and leaves
> everything else broken. Replication is usually not a good example for
> anything. It always gives me bad surprises when I have to look at it.
> 
> We'll have to figure out where to fix this problem (or what it really
> is, once you look at more than just fleecing), but I think requiring the
> user to add a filter driver to work around missing serialisation in
> other code, and corrupting their image if they forget to, is not a
> reasonable solution.
> 
> I see at least two things wrong in this context:
> 
> * The permissions don't seem to match reality. The NBD server
>   unconditionally shares PERM_WRITE, which is wrong in this case. The
>   client wants to see a point-in-time snapshot that never changes. This
>   should become an option so that it can be properly reflected in the
>   permissions used.
> 
> * Once we have proper permissions, the fleecing setup breaks down
>   because the guest needs PERM_WRITE on the backing file, but the
>   fleecing overlay allows that only if the NBD client allows it (which
>   it doesn't for fleecing).
> 
>   Now we can implement an exception right into backup that installs a
>   backup filter driver between source and target if the source is the
>   backing file of the target. The filter driver would be similar to the
>   commit filter driver in that it simply promises !PERM_WRITE to its
>   parents, but allows PERM_WRITE on the source because it has installed
>   the before_write_notifier that guarantees this condition.

I don't quite understand what you mean by "between source and target",
because *the* backup filter driver would need to be on top of both.
Unless you continue to use before-write notifiers, but I don't see why
you would when a driver that sits on top of the source would intercept
all requests anyway.

And if we want to unify mirror and backup at some point, I think we have
to get away from before-write notifiers anyway.
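
For concreteness, I'd expect the permission side of that exception to look
roughly like the commit filter's callback.  The following is only a sketch
with invented names (and simplified; a real implementation would presumably
build on bdrv_filter_default_perms()), not code from this series:

/*
 * Towards its parents (the fleecing overlay and, through it, the NBD
 * export) the filter promises an unchanging view, so it never takes
 * PERM_WRITE upwards.  On its child (the source) it keeps WRITE shared
 * so that the guest can continue writing; the copy-before-write logic
 * is what makes this promise valid.
 */
static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
                                  const BdrvChildRole *role,
                                  BlockReopenQueue *reopen_queue,
                                  uint64_t perm, uint64_t shared,
                                  uint64_t *nperm, uint64_t *nshared)
{
    /* Forward the parents' permission requests minus WRITE ... */
    *nperm = perm & ~BLK_PERM_WRITE;
    /* ... and always leave WRITE shared for the guest's sake. */
    *nshared = shared | BLK_PERM_WRITE;
}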

>   All writes to the target that are made by the backup job in this setup
>   (including before_write_notifier writes) need to be marked as
>   serialising so that any concurrent reads are completed first.
> 
> And if we decide to add a target filter to backup, we should probably at
> the same time use a filter driver for intercepting source writes instead
> of using before_write_notifier.
> 
> Max, I think you intended to make both source and target children of the
> same block job node (or at least for mirror). But wouldn't that create
> loops in a setup like this? I think we may need two filters that are
> only connected through the block job, but not with block graph edges.

A node can only provide one data view to all of its parents.  It cannot
distinguish based on the parent.  Therefore, the main backup driver on
top of source and target would necessarily present the current disk
(changing) state, just like the mirror filter does.

Fleecing wants another view, though.  It wants an unchanging view.  It
therefore seems clear to me that we'd need two separate nodes: one
between guest and source (which performs the backup by intercepting
write requests), and one between the fleecing user (e.g. NBD) and the
target.  The former presents the current state, the latter presents the
"unchanging" point-in-time snapshot.

Sure, we can get away without the former (we do so today), but I don't
see why we would want to.  blockdev-copy (as an extension of
blockdev-mirror) would have such a node anyway, and I can see no reason
why we wouldn't attach both source and target to it.
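
To illustrate the two nodes (all names are invented here and the copy helper
is only a placeholder, so take this as a sketch rather than a proposal):

/*
 *   guest -------> copy-before-write filter ------> source (active image)
 *                            |
 *                            | copy old data before the guest write
 *                            v
 *   NBD client --> point-in-time filter ----------> target (fleecing
 *                                                    overlay, backed by
 *                                                    the source)
 *
 * Hypothetical write path of the copy-before-write node.  If the copy to
 * the target is issued as a serialising request, concurrent reads of the
 * same range coming in through the point-in-time filter have to complete
 * first, which is the synchronisation Kevin describes above.
 */
static coroutine_fn int cbw_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                       uint64_t bytes, QEMUIOVector *qiov,
                                       int flags)
{
    BDRVCbwState *s = bs->opaque;               /* invented state struct */
    int ret;

    /* Copy the old contents of the range to the target first ... */
    ret = cbw_copy_to_target(s, offset, bytes); /* placeholder helper */
    if (ret < 0) {
        return ret;
    }

    /* ... then let the guest write go through to the source. */
    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
}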

Max
