From: Cao jin
Subject: Re: [Qemu-devel] [PATCH 0/3] vfio-pci: support recovery of AER non fatal error
Date: Tue, 7 Mar 2017 19:46:03 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

ping
On 02/27/2017 03:30 PM, Cao jin wrote:
> This is a nearly new design of the feature, so the version numbering
> restarts from 0.
>
> About the test:
> The hardware is still unstable, as before. The test server is at site A in
> another country, while my contact in that country is at site B, so it is
> not convenient to get hands-on help (plugging in a cable, or checking the
> hardware). So my NIC (which has 2 functions) still has only func1 connected
> to the gateway. If anyone else who has the hardware could test the patches,
> that would be a great help.
>
>
> Basically, the unstable hardware shows two phenomena:
> 1. Start the vm; the hardware emits a fatal error by itself before I do
> anything, which causes the vm to stop.
> 2. Start the vm, assign an IP to func1, then ping the gateway; it shows
> "Destination Host Unreachable" after dozens or hundreds of successful
> pings, and guest dmesg shows nothing abnormal. I think this phenomenon is
> *strong evidence* of unstable hardware; I speculate that the cable has a
> problem.
>
> On the other hand, I also saw a perfect result 2 times in my numerous tests,
> with just func1 assigned while func0 has no user. It could ping for several
> hours (more than 15000 pings) without any problem; during that period I
> injected non fatal errors into func0 & func1, and error recovery was very
> good.
>
> So, most of the time, I must do the test quickly, before the hardware goes
> crazy, and repeat until I get what I expected.
>
>
> Test:
> scenario 1: assign func1 to vm while func0 has no user.
> scenario 2: assign both functions to 1 vm, with the same topology as host.
> scenario 3: assign both functions to 1 vm, under different bus.
> scenario 4: assign each function to a separate vm.
>
> The steps are: assign an IP to func1, ping the gateway, inject a non fatal
> error into both functions, and see whether func1 can still ping after
> recovery.
>
> Although we don't have a cable for func0, in a test like scenario 4,
> injecting into func0 doesn't affect func1's recovery, so I think it shows
> that one function's recovery doesn't affect another's.
>
>
> Extra info FYI:
> 1. During the test, some debug lines were added in vfio_err_notifier_handler
> to read the uncor status register in this function when a fatal error
> occurred; it shows all F's every time.
> 2. Based on the v10 patch & the corresponding kernel part, modified as per
> the comments: revert the eventfd handling (don't signal uncor status), and
> a guest link reset will induce a host link reset. The test result shows:
> non fatal error recovery is good; fatal error recovery has the same result
> as what Alex found before (guest kernel crash), because the guest device
> driver's error_detected() accesses the MMIO registers and gets all F's.
>
>
> Cao jin (3):
> pcie aer: verify if AER functionality is available
> vfio pci: new function to init AER capability
> vfio-pci: process non fatal error of AER
>
> hw/pci/pcie_aer.c | 28 +++++++
> hw/vfio/pci.c | 180 +++++++++++++++++++++++++++++++++++++++++++--
> hw/vfio/pci.h | 3 +
> linux-headers/linux/vfio.h | 1 +
> 4 files changed, 207 insertions(+), 5 deletions(-)
>
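
As a rough illustration of the "all F's" point in the extra info above (this is not code from the patches): a read of all 1s from a device register usually means the device could not be reached, so the value should not be decoded as real error bits. The sketch below classifies one read of the AER uncorrectable status register, using the rule that a status bit whose matching bit is set in the uncorrectable severity register is fatal. The enum and function names are hypothetical, chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch; not a QEMU or kernel type. */
enum aer_read_result {
    AER_READ_INACCESSIBLE, /* register read returned all 1s (all F's) */
    AER_READ_NO_ERROR,     /* no uncorrectable error bit is set */
    AER_READ_NON_FATAL,    /* only bits with severity 0 are set */
    AER_READ_FATAL,        /* at least one set bit has severity 1 */
};

/*
 * Classify one read of the uncorrectable error status register.
 * 'status' is the value read from the uncor status register and
 * 'severity' is the value of the uncor severity register.
 */
static enum aer_read_result classify_uncor(uint32_t status, uint32_t severity)
{
    if (status == UINT32_MAX) {
        /* All F's: the device did not answer, the bits mean nothing. */
        return AER_READ_INACCESSIBLE;
    }
    if (status == 0) {
        return AER_READ_NO_ERROR;
    }
    if (status & severity) {
        return AER_READ_FATAL;
    }
    return AER_READ_NON_FATAL;
}
```

Checking for all 1s first matters, because otherwise such a read would always look like a fatal error with every bit set.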
--
Sincerely,
Cao jin