[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] Safely reopening image files by stashing fds
From: |
Blue Swirl |
Subject: |
Re: [Qemu-devel] Safely reopening image files by stashing fds |
Date: |
Tue, 9 Aug 2011 19:39:29 +0000 |
On Tue, Aug 9, 2011 at 12:24 PM, Kevin Wolf <address@hidden> wrote:
> Am 09.08.2011 14:00, schrieb Stefan Hajnoczi:
>> On Tue, Aug 09, 2011 at 01:39:13PM +0200, Kevin Wolf wrote:
>>> Am 09.08.2011 12:56, schrieb Stefan Hajnoczi:
>>>> On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <address@hidden> wrote:
>>>>> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <address@hidden> wrote:
>>>>>> Am 09.08.2011 12:25, schrieb Stefan Hajnoczi:
>>>>>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <address@hidden> wrote:
>>>>>>>> Am 08.08.2011 16:49, schrieb Stefan Hajnoczi:
>>>>>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <address@hidden> wrote:
>>>>>>>>>> Am 05.08.2011 11:29, schrieb Stefan Hajnoczi:
>>>>>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <address@hidden> wrote:
>>>>>>>>>>>> Am 05.08.2011 10:40, schrieb Stefan Hajnoczi:
>>>>>>>>>>>>> We've discussed safe methods for reopening image files (e.g.
>>>>>>>>>>>>> useful for
>>>>>>>>>>>>> changing the hostcache parameter). The problem is that closing
>>>>>>>>>>>>> the file first
>>>>>>>>>>>>> and then opening it again exposes us to the error case where the
>>>>>>>>>>>>> open fails.
>>>>>>>>>>>>> At that point we cannot get to the file anymore and our options
>>>>>>>>>>>>> are to
>>>>>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This window of vulnerability can be eliminated by keeping the
>>>>>>>>>>>>> file descriptor
>>>>>>>>>>>>> around and falling back to it should the open fail.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The challenge for the file descriptor approach is that image
>>>>>>>>>>>>> formats, like
>>>>>>>>>>>>> VMDK, can span multiple files. Therefore the solution is not as
>>>>>>>>>>>>> simple as
>>>>>>>>>>>>> stashing a single file descriptor and reopening from it.
>>>>>>>>>>>>
>>>>>>>>>>>> So far I agree. The rest I believe is wrong because you can't
>>>>>>>>>>>> assume
>>>>>>>>>>>> that every backend uses file descriptors. The qemu block layer is
>>>>>>>>>>>> based
>>>>>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be
>>>>>>>>>>>> hidden
>>>>>>>>>>>> in raw-posix.
>>>>>>>>>>>>
>>>>>>>>>>>> I think something like this could do:
>>>>>>>>>>>>
>>>>>>>>>>>> struct BDRVReopenState {
>>>>>>>>>>>> BlockDriverState *bs;
>>>>>>>>>>>> /* can be extended by block drivers */
>>>>>>>>>>>> };
>>>>>>>>>>>>
>>>>>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state,
>>>>>>>>>>>> int
>>>>>>>>>>>> flags);
>>>>>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>>>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>>>>>>>>>
>>>>>>>>>>>> raw-posix would store the old file descriptor in its reopen_state.
>>>>>>>>>>>> On
>>>>>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the
>>>>>>>>>>>> old
>>>>>>>>>>>> one and closes the newly opened one.
>>>>>>>>>>>>
>>>>>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I
>>>>>>>>>>>> had in
>>>>>>>>>>>> mind before, but it allows VMDK to get an all-or-nothing semantics.
>>>>>>>>>>>
>>>>>>>>>>> Can you show how bdrv_reopen() would use these new interfaces? I'm
>>>>>>>>>>> not 100% clear on the idea.
>>>>>>>>>>
>>>>>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
>>>>>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a
>>>>>>>>>> wrapper
>>>>>>>>>> function that does both, but that's syntactic sugar).
>>>>>>>>>>
>>>>>>>>>> For example we would have:
>>>>>>>>>>
>>>>>>>>>> int vmdk_reopen()
>>>>>>>>>
>>>>>>>>> .bdrv_reopen() is a confusing name for this operation because it does
>>>>>>>>> not reopen anything. bdrv_prepare_reopen() might be clearer.
>>>>>>>>
>>>>>>>> Makes sense.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>> *((VMDKReopenState**) rs) = malloc();
>>>>>>>>>>
>>>>>>>>>> foreach (extent in s->extents) {
>>>>>>>>>> ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>>>>>>>>> if (ret < 0)
>>>>>>>>>> goto fail;
>>>>>>>>>> }
>>>>>>>>>> return 0;
>>>>>>>>>>
>>>>>>>>>> fail:
>>>>>>>>>> foreach (extent in rs->already_reopened) {
>>>>>>>>>> bdrv_reopen_abort(extent->reopen_state);
>>>>>>>>>> }
>>>>>>>>>> return ret;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>> void vmdk_reopen_commit()
>>>>>>>>>> {
>>>>>>>>>> foreach (extent in s->extents) {
>>>>>>>>>> bdrv_reopen_commit(extent->reopen_state);
>>>>>>>>>> }
>>>>>>>>>> free(rs);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> void vmdk_reopen_abort()
>>>>>>>>>> {
>>>>>>>>>> foreach (extent in s->extents) {
>>>>>>>>>> bdrv_reopen_abort(extent->reopen_state);
>>>>>>>>>> }
>>>>>>>>>> free(rs);
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
>>>>>>>>> &rs)?
>>>>>>>>
>>>>>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
>>>>>>>>
>>>>>>>> Do you have a use case where it would be helpful if the caller invoked
>>>>>>>> bdrv_close?
>>>>>>>
>>>>>>> When the caller does bdrv_close() two BlockDriverStates are never open
>>>>>>> for the same image file. I thought this was a property we wanted.
>>>>>>>
>>>>>>> Also, in the block_set_hostcache case we need to reopen without
>>>>>>> switching to a new BlockDriverState instance. That means the reopen
>>>>>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
>>>>>>> We cannot create a new instance.
>>>>>>
>>>>>> Yes, but where do you even get the second BlockDriverState from?
>>>>>>
>>>>>> My prototype only returns an int, not a new BlockDriverState. Until
>>>>>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
>>>>>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
>>>>>> the new ones.
>>>>>
>>>>> It seems I don't understand the API. I thought it was:
>>>>>
>>>>> do_block_set_hostcache()
>>>>> {
>>>>> bdrv_prepare_reopen(bs, &rs);
>>>>> ...open new file and check everything is okay...
>>>>> if (ret == 0) {
>>>>> bdrv_reopen_commit(rs);
>>>>> } else {
>>>>> bdrv_reopen_abort(rs);
>>>>> }
>>>>> return ret;
>>>>> }
>>>>>
>>>>> If the caller isn't opening the new file then what's the point of
>>>>> giving the caller control over prepare, commit, and abort?
>>>>
>>>> After sending the last email I realized what I was missing:
>>>>
>>>> You need the prepare, commit, and abort API in order to handle
>>>> multi-file block drivers like VMDK.
>>>
>>> Yes, this is whole point of separating commit out. Does the proposal
>>> make sense to you now?
>>
>> It depends on the details. Adding more functions that every BlockDriver
>> must implement is bad, so it's important that we only drop this
>> functionality into raw-posix.c, vmdk.c, and block.c as appropriate.
>
> Yes, I agree.
>
>> I liked the idea of doing a generic FDStash type that the monitor and
>> bdrv_reopen() can use. Blue's idea to hook at the qemu_open() level
>> takes that further.
>
> Well, having something that works for the raw-posix, the monitor and
> maybe some more things is nice. Having something that works for
> raw-posix, Sheepdog and rbd is an absolute requirement and I can't see
> how FDStash solves that.
For Sheepdog also network access functions would need to be hooked.
RBD seems to use librados functions for low level I/O, so that needs
some RBD specific wrappers.
> Even raw-win32 doesn't have an int fd, but a
> HANDLE hfile for its backend.
Just replace "int fd" with "FDStash fd" everywhere?
> So my main concern is that file descriptors are a concept not as generic
> as it needs to be to suit all block drivers.
>
>> But if we can do prepare, commit, and abort in a relatively simple way
>> then I'm for it.
>
> Great, then let's do that.
>
> Kevin
>
>
- Re: [Qemu-devel] Safely reopening image files by stashing fds, (continued)
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/05
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Stefan Hajnoczi, 2011/08/08
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/08
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Stefan Hajnoczi, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Stefan Hajnoczi, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Stefan Hajnoczi, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Stefan Hajnoczi, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/09
- Re: [Qemu-devel] Safely reopening image files by stashing fds,
Blue Swirl <=
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/10
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Blue Swirl, 2011/08/10
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Kevin Wolf, 2011/08/11
- Re: [Qemu-devel] Safely reopening image files by stashing fds, Blue Swirl, 2011/08/11
Re: [Qemu-devel] Safely reopening image files by stashing fds, Blue Swirl, 2011/08/05