qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 0/3] block/file-posix: Work around XFS bug


From: Max Reitz
Subject: Re: [RFC 0/3] block/file-posix: Work around XFS bug
Date: Mon, 4 Nov 2019 15:25:12 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1

On 04.11.19 15:03, Alberto Garcia wrote:
> On Fri 25 Oct 2019 04:19:30 PM CEST, Max Reitz wrote:
>>>> So, it's obvious that c8bb23cbdbe32f5c326 is significant for 1M
>>>> cluster-size, even on rotational disk, which means that previous
>>>> assumption about calling handle_alloc_space() only for ssd is wrong,
>>>> we need smarter heuristics..
>>>>
>>>> So, I'd prefer (1) or (2).
>>
>> OK.  I wonder whether that problem would go away with Berto’s subcluster
>> series, though.
> 
> Catching up with this now. I was told about this last week at the KVM
> Forum, but if the problems comes with the use of fallocate() and XFS,
> the I don't think subclusters will solve it.
> 
> handle_alloc_space() is used to fill a cluster with zeroes when there's
> COW, and that happens the same with subclusters, just at the subcluster
> level instead of course.
> 
> What can happen, if the subcluster size matches the filesystem block
> size, is that there's no need for any COW and therefore the bug is never
> triggered. But that's not quite the same as a fix :-)

No, what I meant was that the original problem that led to c8bb23cbdbe
would go away.

c8bb23cbdbe was added because small writes to new clusters are slow when
the clusters are large (because you need to do a 2 MB write on the host
for a 4 kB write from the guest).  So handle_alloc_space() was added to
alleviate the problem with a zero-write instead of actually writing zeroes.

The question is whether there is no need for handle_alloc_space() with
subclusters because a normal write with explicit zeroes being written in
the COW areas would be sufficiently quick.  (Because the subclusters for
2 MB clusters are just 64 kB in size.)

If that were so (right now it doesn’t look like it), we could revert
c8bb23cbdbe and wouldn’t see the bug anymore.

Max

>> Maybe make a decision based both on the ratio of data size to COW area
>> length (only invoke handle_alloc_space() under a certain threshold),
>> and the absolute COW area length (always invoke it above a certain
>> threshold, unless the ratio doesn’t allow it)?
> 
> Maybe combining that with the smaller clusters/subclusters can work
> around the problem. The maximum subcluster size is 64KB (for a 2MB
> cluster).
> 
> Berto
> 


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]