Re: [Qemu-devel] [PULL 5/8] target-sparc: Use global registers for the register window
Fri, 24 Jun 2016 14:35:47 +0200
On Fri, Jun 24, 2016 at 2:03 PM, Paolo Bonzini <address@hidden> wrote:
> On 24/06/2016 12:42, Mark Cave-Ayland wrote:
>> On 24/06/16 07:36, Paolo Bonzini wrote:
>>> On 24/06/2016 05:57, Richard Henderson wrote:
>>>> Whatever happens, it happens after 10GB of logs, which is simply too
>>>> much to sift through. I've tried to narrow it down, but the lack of a
>>>> hardware tlb refill means that we get hundreds of thousands of Data
>>>> Access Faults that are simply TLB misses and not the actual Segmentation
>>>> Fault in question.
>>>> It doesn't seem to affect other OSes, so I can't imagine what quirk is
>>>> being exercised in this case.
>>>> As loath as I am to suggest it, we may have to revert the sparc indirect
>>>> register patch for the release.
>>> We have more than a month. If it's reproducible, it can be fixed. :)
>>>> I do now ping the rest of my sparc improvements patchset. It's
>>>> completely independent of the use of indirect registers.
>>> Mark, perhaps you can try to use migration to reduce the amount of
>>> logging? (Start QEMU with -snapshot, try to stop the vm before it
>>> fails. If you succeed, do a "migrate exec:cat>foo.sav" followed by
>>> "commit"; if you fail, try again).
>> Yeah, given the improvements that Richard has made, I'd prefer not to
>> revert if at all possible. Finally I have some spare time today so I'll
>> try and get this down to an easily-testable qcow2 image that can
>> reproduce the issue.
> I've gotten an image that reaches the segmentation fault in about 1
> second but I cannot upload it anywhere in the next few hours. The good
> news is that it fails even without a hard disk (so it's a stateless vm)
> and with -d nochain -singlestep. The bad news is that the dump is not
> very deterministic and that I failed to create images closer to the failure.
I have a fix for the bug, will post it shortly. This patch does reveal a bug in
the ldstub implementation, but I'm really surprised we haven't hit it before.
I observe the following sequence under Solaris:
1. A memory page is mapped without write permission.
2. The ldstub instruction is executed on this page.
3. ldstub raises an access exception.
4b. Because of the bug in ldstub, a register gets corrupted.
5b. The Solaris kernel re-maps the page with write access.
6b. ldstub is executed again with the corrupted register content.
7b. Segmentation fault.
With the ldstub fix it goes:
4a. The Solaris kernel re-maps the page with write access.
5a. ldstub is executed again, and this time all is good.
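
To make the two sequences above concrete, here is a toy model in C. It is only a sketch and not QEMU code: ToyCpu, ldstub_buggy, ldstub_fixed and run_scenario are hypothetical names, and it assumes the corruption comes from writing the destination register before the store half of ldstub is known to succeed, with the destination register also being the address register.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 16

/* Hypothetical toy machine: one small "page" and a few registers. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    bool     writable;      /* write permission of the page */
    uint32_t regs[4];
} ToyCpu;

typedef bool (*ldstub_fn)(ToyCpu *cpu, int rs1, int rd);

/* Assumed buggy ordering: rd is written before the store can fault.
 * Returns false to signal an access exception. */
static bool ldstub_buggy(ToyCpu *cpu, int rs1, int rd)
{
    uint32_t addr = cpu->regs[rs1] % MEM_SIZE;
    cpu->regs[rd] = cpu->mem[addr];     /* register clobbered too early */
    if (!cpu->writable) {
        return false;                   /* fault, but rd already changed */
    }
    cpu->mem[addr] = 0xff;
    return true;
}

/* Fixed ordering: load into a temporary, commit rd only after the store. */
static bool ldstub_fixed(ToyCpu *cpu, int rs1, int rd)
{
    uint32_t addr = cpu->regs[rs1] % MEM_SIZE;
    uint8_t val = cpu->mem[addr];
    if (!cpu->writable) {
        return false;                   /* fault, registers untouched */
    }
    cpu->mem[addr] = 0xff;
    cpu->regs[rd] = val;
    return true;
}

/* Steps 1-3, then the kernel re-maps the page and the instruction is
 * retried. Returns the index of the byte that ended up 0xff, or -1. */
static int run_scenario(ldstub_fn op)
{
    ToyCpu cpu;
    memset(&cpu, 0, sizeof(cpu));
    cpu.writable = false;               /* 1. page mapped read-only */
    cpu.regs[1] = 5;                    /* address of the lock byte */

    if (!op(&cpu, 1, 1)) {              /* 2-3. ldstub faults */
        cpu.writable = true;            /* kernel re-maps with write access */
        op(&cpu, 1, 1);                 /* ldstub is executed again */
    }
    for (int i = 0; i < MEM_SIZE; i++) {
        if (cpu.mem[i] == 0xff) {
            return i;
        }
    }
    return -1;
}
```

In this model run_scenario(ldstub_fixed) sets the byte at address 5 as intended, while run_scenario(ldstub_buggy) retries with the clobbered register and sets the byte at address 0 instead. In the real guest the wrong address would not necessarily be mapped at all, which would fit the segmentation fault in step 7b.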
There must be another bug in memory access handling.
Maybe cached TBs can be executed with the wrong mem idx?
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu