qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate seri


From: Max Reitz
Subject: Re: [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate serialize
Date: Tue, 2 Jun 2020 18:38:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0

On 02.06.20 18:16, Vladimir Sementsov-Ogievskiy wrote:
> 02.06.2020 18:46, Max Reitz wrote:
>> On 02.06.20 16:43, Vladimir Sementsov-Ogievskiy wrote:
>>> 01.11.2019 18:25, Max Reitz wrote:
>>>
>>> Sorry for being late, I have some comments
>>
>> Uh, well.  Reasonable, but I hope you don’t mind me having no longer
>> having this patch fresh on my mind.
>>
>>>> The XFS kernel driver has a bug that may cause data corruption for
>>>> qcow2
>>>> images as of qemu commit c8bb23cbdbe32f.  We can work around it by
>>>> treating post-EOF fallocates as serializing up until infinity
>>>> (INT64_MAX
>>>> in practice).
>>>>
>>>> Cc: qemu-stable@nongnu.org
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    block/file-posix.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 36 insertions(+)
>>>>
>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>> index 0b7e904d48..1f0f61a02b 100644
>>>> --- a/block/file-posix.c
>>>> +++ b/block/file-posix.c
>>>> @@ -2721,6 +2721,42 @@ raw_do_pwrite_zeroes(BlockDriverState *bs,
>>>> int64_t offset, int bytes,
>>>>        RawPosixAIOData acb;
>>>>        ThreadPoolFunc *handler;
>>>>    +#ifdef CONFIG_FALLOCATE
>>>> +    if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
>>>> +        BdrvTrackedRequest *req;
>>>> +        uint64_t end;
>>>> +
>>>> +        /*
>>>> +         * This is a workaround for a bug in the Linux XFS driver,
>>>> +         * where writes submitted through the AIO interface will be
>>>> +         * discarded if they happen beyond a concurrently running
>>>> +         * fallocate() that increases the file length (i.e., both the
>>>> +         * write and the fallocate() happen beyond the EOF).
>>>> +         *
>>>> +         * To work around it, we extend the tracked request for this
>>>> +         * zero write until INT64_MAX (effectively infinity), and mark
>>>> +         * it as serializing.
>>>> +         *
>>>> +         * We have to enable this workaround for all filesystems and
>>>> +         * AIO modes (not just XFS with aio=native), because for
>>>> +         * remote filesystems we do not know the host configuration.
>>>> +         */
>>>> +
>>>> +        req = bdrv_co_get_self_request(bs);
>>>> +        assert(req);
>>>> +        assert(req->type == BDRV_TRACKED_WRITE);
>>>> +        assert(req->offset <= offset);
>>>> +        assert(req->offset + req->bytes >= offset + bytes);
>>>
>>> Why these assertions?
>>
>> Mostly to see that bdrv_co_get_self_request() (introduced by the same
>> series) actually got the right request.  (I suppose.)
>>
>>> TrackedRequest offset and bytes fields correspond
>>> to the original request. When request is being expanded to satisfy
>>> request_alignment, these fields are not updated.
>>
>> Well, shrunk in this case, but OK.
>>
>>> So, maybe, we should assert overlap_offset and overlap_bytes?
>>
>> Maybe, but would that have any benefit?  Especially after this patch
>> having been in qemu for over half a year?
>>
>> (Also, intuitively off the top of my head I don’t see how it would make
>> more sense to check overlap_offset and overlap_bytes, if all the
>> assertions are for is to see that we got the right request.
>> overlap_offset and overlap_bytes may still not exactly match @offset or
>> @bytes, respectively.)
>>
>> Your suggestion makes it sound a bit like you have a different purpose
>> in mind what these assertions might be useful for...?
> 
> No I just think it may have false-positives, when actual request is larger
> than original.

Seems like a bug.  Why would we zero more than originally requested?

> So offset may be < req->offset and req->offset +
> req->bytes may be
> less than offset + bytes. And we will crash. I should make a reproducer to
> prove it, but it seems possible.

I’m definitely curious.

>>>> +
>>>> +        end = INT64_MAX & -(uint64_t)bs->bl.request_alignment;
>>>> +        req->bytes = end - req->offset;
>>>
>>> And I doubt that we should update req->bytes. We never updated it in
>>> other places, it corresponds to original request. It's enough to update
>>> overlap_bytes to achieve corresponding serialising.
>>
>> Does it hurt?  If so, would you send a patch?
>>
>> I assume you reply to this patch instead of writing a patch because you
>> have the same feeling of “It probably doesn’t really matter, so let’s
>> have a discussion first”.
> 
> 1. yes, and
> 2. I probably don't see the full picture around tracked requests

Neither do I, that’s for sure.

>> My stance is: I don’t think it matters and this whole piece of code is a
>> hack that shouldn’t exist, obviously.  So I don’t really care how it
>> fits into all of our other code.
>>
>> I would like to say I wouldn’t mind a patch to drop the req->bytes
>> assignment, but OTOH it would mean I’d have to review it and verify that
>> it’s indeed sufficient to set overlap_bytes.
>>
>> If it’s in any way inconvenient for you that req->bytes is adjusted,
>> then of course please send one.
>>
>>>> +        req->overlap_bytes = req->bytes;
>>>> +
>>>> +        bdrv_mark_request_serialising(req, bs->bl.request_alignment);
>>>
>>> Not sure, how much should we care about request_alignment here, I think,
>>> it's enough to just set req->overlap_bytes = INT64_MAX -
>>> req->overlap_offest, but it doesn't really matter.
>>
>> As long as req->bytes is adjusted, we have to care, or the overlap_bytes
>> calculation in bdrv_mark_request_serialising will overflow.
>>
>> Well, one could argue that it doesn’t matter because the MAX() will
>> still do the right thing, but overflowing is never nice.
> 
> Hmm I think, if reduce it to just INT64_MAX, we should pass 1 as align
> to bdrv_mark_request_serialising.

True.

>> (Of course, it probably doesn’t matter at all if we just wouldn’t touch
>> req->bytes.)
>>
> 
> OK, thanks for the answer, I'll prepare a patch.

OK?  I’m not sure where the benefit is (apart from the perhaps failing
assertions).  So it still looks to me like putting too much energy into
a hack.

(I think the original reason I set both req->bytes and
req->overlap_bytes was actually because I just wanted to be sure, and
didn’t want to have to look too hard whether either would be sufficient.)

((Please also note that I can’t guarantee I will review your patch in a
timely manner, for one thing because I can already rarely give that
promise (as you are probably painfully aware...); and now there’s also
two weeks of mail on top for me to wade through after PTO.  So if
there’s no reason to change anything apart from saving two LoC, well.
Failing assertions are a different matter altogether, though, of course.))

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]