Re: Which qemu change corresponds to RedHat bug 1655408

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Which qemu change corresponds to RedHat bug 1655408

From:	Max Reitz
Subject:	Re: Which qemu change corresponds to RedHat bug 1655408
Date:	Fri, 9 Oct 2020 15:56:29 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

On 09.10.20 14:55, Jakob Bohm wrote:
> On 2020-10-09 10:48, Max Reitz wrote:

[...]

> The error I got was specifically "Failed to lock byte 100" and VM not
> starting.  The ISO file was on a R/W NFS3 share, but was itself R/O for
> the user that root was mapped to by linux-nfs-server via /etc/exports
> options, specifically the file iso file was mode 0444 in a 0755
> directory, and the exports line was (simplified)
> 
> /share1
> xxxx:xxxx:xxxx:xxxx/64(ro,sync,mp,subtree_check,anonuid=1000,anongid=1000)
> 
> where xxxx:xxxx:xxxx:xxxx/64 is the numeric IPv6 prefix of the LAN
> 
> NFS kernel Server ran Debian Stretch kernel 4.19.0-0.bpo.8-amd64 #1 SMP
> Debian 4.19.98-1~bpo9+1 (2020-03-09) x86_64 GNU/Linux
> 
> NFS client mount options were:
> 
> rw,nosuid,nodev,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,
> soft,proto=tcp6,timeo=600,retrans=6,sec=sys,mountaddr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx,
> 
> mountvers=3,mountport=45327,mountproto=udp6,local_lock=none,addr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx
> 
> 
> NFS client ran Debian Buster kernel 4.19.0-0.bpo.6-amd64 #1 SMP Debian
> 4.19.67-2+deb10u2~bpo9+1 (2019-11-12) x86_64 with Debian qemu-system-
> x86 version 1:5.0-14~bpo10+1  Booting used SysV init and libvirt
> was not used.
> 
> Copying the ISO to a local drive (where qemu-as-root had full
> capabilities to bypass file security) worked around the failure.
> 
> I hope these details help reproduce the bug.

I’ll try again, thanks.

Can you perchance reproduce the bug with a more recent upstream kernel
(e.g. 5.8)?  I seem to recall there have been some locking bugs in the
NFS code, perhaps there was something that was fixed by now.

(Or at least 4.19.150, which seems to be the most recent 4.19.x
according to kernel.org)

> And I still have no idea why qemu tried to lock bytes in a read-only raw
> image file, there is no block metadata to synchronize access to (like in
> qcow2), when the option explicitly said ",format=raw" to avoid attempts
> to access the iso file as any of the advanced virtual disk formats.

I reasoned about that in my previous reply already, see below.  It’s
because just because an image file is read-only when opening it doesn’t
mean that it’s going to stay that way.

You’re correct that in the case of raw, this isn’t about metadata (as
there isn’t any), but about guest data, which needs to be protected from
concurrent access all the same, though.

(As for “why does qemu try to lock, when the option explicitly said
raw”; there is a dedicated option to turn off locking, and that is
file.locking=off.  I’m not suggesting that as a realistic work-around,
I’m just adding that FYI in case you didn’t know and need something ASAP.)

[...]

>>>>> The error message itself seams meaningless, as there is no particular
>>>>> reason to request file locks on a read-only raw disk image.
>>
>> Yes, there is.  We must prevent a concurrent instance from writing to
>> the image[1], and so we have to signal that somehow, which we do through
>> file locks.
>>
>> I suppose it can be argued that if the image file itself is read-only
>> (outside of qemu), there is no need for locks, because nothing could
>> ever modify the image anyway.  But wouldn’t it be possible to change the
>> modifications after qemu has opened the image, or to remount some RO
>> filesystem R/W?
>>
>> Perhaps we could automatically switch off file locks for a given image
>> file when taking the first one fails, and the image is read-only.  But
>> first I’d rather know what exactly is causing the error you see to
>> appear.
>>
>> [1] Technically, byte 100 is about being able to read valid data from
>> the image, which is a constraint that’s only very rarely broken.  But
>> still, it’s a constraint that must be signaled.  (You only see the
>> failure on this byte, because the later bytes (like the one not
>> preventing concurrent R/W access, 201) are not even attempted to be
>> locked after the first lock fails.)
>>
>> (As for other instances writing to the image, you can allow that by
>> setting the share-rw=on option on the guest device.  This tells qemu
>> that the guest will accept modifications from the outside.  But that
>> still won’t prevent qemu from having to take a shared lock on byte 100.)
>>
>> Max

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Which qemu change corresponds to RedHat bug 1655408, Philippe Mathieu-Daudé, 2020/10/08
- Re: Which qemu change corresponds to RedHat bug 1655408, Jakob Bohm, 2020/10/08
  - Re: Which qemu change corresponds to RedHat bug 1655408, Philippe Mathieu-Daudé, 2020/10/08
  - Re: Which qemu change corresponds to RedHat bug 1655408, Max Reitz, 2020/10/09
    - Re: Which qemu change corresponds to RedHat bug 1655408, Jakob Bohm, 2020/10/09
    - Re: Which qemu change corresponds to RedHat bug 1655408, Max Reitz <=
    - Re: Which qemu change corresponds to RedHat bug 1655408, Jakob Bohm, 2020/10/09
    - Re: Which qemu change corresponds to RedHat bug 1655408, Max Reitz, 2020/10/12
    - Re: Which qemu change corresponds to RedHat bug 1655408, Max Reitz, 2020/10/12
    - Re: Which qemu change corresponds to RedHat bug 1655408, Jakob Bohm, 2020/10/12

Prev by Date: [PULL v2 6/8] qemu-nbd: Honor SIGINT and SIGHUP
Next by Date: Re: [PATCH 2/3] migration: Make save_snapshot() return bool, not 0/-1
Previous by thread: Re: Which qemu change corresponds to RedHat bug 1655408
Next by thread: Re: Which qemu change corresponds to RedHat bug 1655408
Index(es):
- Date
- Thread