qemu-devel

Re: [PATCH] net: tap: check if the file descriptor is valid before using it


From: Laurent Vivier
Subject: Re: [PATCH] net: tap: check if the file descriptor is valid before using it
Date: Tue, 30 Jun 2020 14:42:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0

On 30/06/2020 14:35, Daniel P. Berrangé wrote:
> On Tue, Jun 30, 2020 at 02:00:06PM +0200, Laurent Vivier wrote:
>> On 30/06/2020 13:03, Daniel P. Berrangé wrote:
>>> On Tue, Jun 30, 2020 at 12:35:46PM +0200, Laurent Vivier wrote:
>>>> On 30/06/2020 12:03, Jason Wang wrote:
>>>>>
>>>>> On 2020/6/30 5:45 PM, Laurent Vivier wrote:
>>>>>> On 30/06/2020 11:31, Daniel P. Berrangé wrote:
>>>>>>> On Tue, Jun 30, 2020 at 10:23:18AM +0100, Daniel P. Berrangé wrote:
>>>>>>>> On Tue, Jun 30, 2020 at 05:21:49PM +0800, Jason Wang wrote:
>>>>>>>>> On 2020/6/30 3:30 AM, Laurent Vivier wrote:
>>>>>>>>>> On 28/06/2020 08:31, Jason Wang wrote:
>>>>>>>>>>> On 2020/6/25 7:56 PM, Laurent Vivier wrote:
>>>>>>>>>>>> On 25/06/2020 10:48, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Wed, Jun 24, 2020 at 09:00:09PM +0200, Laurent Vivier wrote:
>>>>>>>>>>>>>> qemu_set_nonblock() checks that the file descriptor can be
>>>>>>>>>>>>>> used and, if not, crashes QEMU via an assert(). An assert()
>>>>>>>>>>>>>> is meant to detect programming errors, and the core dump
>>>>>>>>>>>>>> allows the problem to be debugged.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But in the case of the tap device, this assert() can be
>>>>>>>>>>>>>> triggered by a user misconfiguration. At startup, it's not a
>>>>>>>>>>>>>> real problem, but it can also happen during the hot-plug of a
>>>>>>>>>>>>>> new device, and there it's a problem because we can crash a
>>>>>>>>>>>>>> perfectly healthy system.
>>>>>>>>>>>>> If the user/mgmt app is not correctly passing FDs, then there's
>>>>>>>>>>>>> a whole pile of bad stuff that can happen. Checking whether the
>>>>>>>>>>>>> FD is valid is only going to catch a small subset. E.g. consider
>>>>>>>>>>>>> if fd=9 refers to the FD that is associated with the root disk
>>>>>>>>>>>>> QEMU has open. We'll fail to set up the TAP device and close
>>>>>>>>>>>>> this FD, breaking the healthy system again.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not saying we can't check if the FD is valid, but let's be
>>>>>>>>>>>>> clear that this is not offering very much protection against
>>>>>>>>>>>>> broken mgmt apps passing bad FDs.
>>>>>>>>>>>>>
>>>>>>>>>>>> I agree with you, but my only goal here is to avoid the crash in
>>>>>>>>>>>> this particular case.
>>>>>>>>>>>>
>>>>>>>>>>>> The punishment should fit the crime.
>>>>>>>>>>>>
>>>>>>>>>>>> The user may think that netdev_del doesn't close the fd and try
>>>>>>>>>>>> to reuse it. Sending back an error is better than crashing their
>>>>>>>>>>>> system. After that, if the system crashes, it will be for a good
>>>>>>>>>>>> reason, not because of an assert.
>>>>>>>>>>> Yes. And on top of this we may try to validate the TAP via st_rdev
>>>>>>>>>>> through fstat[1].
>>>>>>>>>> I agree, but the problem I have is knowing which major(st_rdev)
>>>>>>>>>> values we can allow.
>>>>>>>>>>
>>>>>>>>>> Do we allow only macvtap major number?
>>>>>>>>>
>>>>>>>>> Macvtap and tuntap.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> How to know the macvtap major number at user level?
>>>>>>>>>> [it is allocated dynamically: do we need to parse /proc/devices?]
>>>>>>>>>
>>>>>>>>> I think we can get them through fstat for /dev/net/tun and
>>>>>>>>> /dev/macvtapX.
>>>>>>>> Don't assume QEMU has any permission to access these device nodes,
>>>>>>>> only the pre-opened FDs it is given by libvirt.
>>>>>>> Actually permissions are the least of the problem - the device nodes
>>>>>>> won't even exist, because QEMU's almost certainly running in a private
>>>>>>> mount namespace with a minimal /dev populated
>>>>>>>
>>>>>> I'm working on a solution using /proc/devices.
>>>>>
>>>>>
>>>>> Similar issue with /dev. There's no guarantee that qemu can access
>>>>> /proc/devices, or it may not exist at all (CONFIG_PROC_FS).
>>>>
>>>> There are a lot of things that will not work without /proc (several tools
>>>> rely on /proc, like ps, top, lsof, mount, ...). Some information is
>>>> only available from /proc, and if /proc is there, I think /proc/devices
>>>> is always readable by everyone. Moreover, /proc is already used by qemu
>>>> in several places.
>>>>
>>>> It can also be a best-effort check.
>>>>
>>>> The problem with fstat() on /dev files is guessing the /dev/macvtapX
>>>> name, as X varies (the same goes for /dev/tapY).
>>>>
>>>>>
>>>>>> macvtap has its own major number, but tuntap uses the "misc" (10)
>>>>>> major number.
>>>>
>>>> Another question: it is possible to use the "fd=" parameter with macvtap
>>>> as macvtap creates a /dev/tapY device, but how to do that with tuntap
>>>> that does not create a /dev/tapY device?
>>>
>>>
>>> I think we should step back and ask why we need to check this at all.
>>>
>>> IMHO, if the passed-in FD works with the syscalls that tap-linux.c
>>> is executing, then that shows the FD is suitable for QEMU. The problem
>>> is that many of the tap APIs don't use "Error **errp" parameters to
>>> report errors, so we can't catch the failures. IOW, instead of checking
>>> the FD major/minor number, we should make the existing code be better
>>> at reporting errors, so they can be fed back to the QMP console
>>> gracefully.
>>
>> The problem here is the very first operation of net_init_tap() is a
>> qemu_set_nonblock() that has an assert() and crashes QEMU.
>>
>> That's why I was only checking the validity of the file descriptor,
>> not whether it is a tap device.
> 
> Yep, checking that it is really a FD is sufficient to avoid the
> assert in nonblock.
> 
> As for whether it is really a tap device, I think we just need to
> improve error reporting of the functions that come later, instead
> of doing a literal "is it a tap" check.

I agree. I will update my patches to have a series with my patch
checking for the validity of fd and another patch to return the errors
to QMP from the tap functions.
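
For illustration only, here is a minimal sketch of the kind of validity
check discussed above. The helper name fd_is_valid() is hypothetical and
this is not necessarily what the final patch does; it only assumes that
fcntl(fd, F_GETFD) fails with EBADF when fd is not an open descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper (name is an assumption, not taken from the patch):
 * returns true if fd refers to an open file descriptor in this process.
 * It says nothing about whether the descriptor is a tap device.
 */
static bool fd_is_valid(int fd)
{
    /* F_GETFD only reads the descriptor flags; it fails with EBADF
     * when fd is not an open file descriptor. */
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

A caller handling "fd=" could run such a check first and return an error
to the monitor, so the assert() in qemu_set_nonblock() is never reached
with a descriptor that is not open.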

> That's what I'd tried in my old patch from a few years back
> 
>    https://patchwork.kernel.org/patch/10029443/
> 
> I can't remember why we didn't merge this back then

Jason already gave the link in the thread.
I'm going to try to use your patch in my series.

Thanks,
Laurent




