[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] Tests with the latest ppc repo

From: Kevin Wolf
Subject: Re: [Qemu-ppc] Tests with the latest ppc repo
Date: Tue, 13 Jan 2015 13:04:41 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Am 13.01.2015 um 12:31 hat Mark Cave-Ayland geschrieben:
> On 13/01/15 10:17, Kevin Wolf wrote:
> >> 2) The fix for mixing synchronous and asynchronous requests in the macio
> >> code corrupts a fresh install of my Darwin test ISO when running through
> >> the complete installation:
> >> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=3e300fa6ad4ee19b16339c25773dec8df0bfb982
> > 
> > First of all, the commits message says:
> > 
> >     However, the block infrastructure changed below our feet and now
> >     it's impossible to call a synchronous block read/write from the aio
> >     callback handler of a previous block access.
> > 
> > That's news for me. If it's true, this is probably not intentional at
> > all and block layer code should have been fixed instead.
> > 
> > What kind of failure do you get when you try such a call?
> The original patch was supplied by Alex based on a report from John
> which I suspect may have been provided off-list (see
> http://lists.gnu.org/archive/html/qemu-ppc/2014-05/msg00461.html) and
> was not something that I was seeing locally at the time.

Perhaps Alex can remember the details, CCed.

> >> So far I don't see anything obvious with the above commit except that
> >> the multi-write combining seems to be more aggressive afterwards.
> > 
> > Maybe I'm missing something, but I don't see any combining in this code?
> This is something that I found by enabling tracepoints for both the good
> and bad commits and trying to compare the differences. It looks like
> when the unaligned requests were submitted previously, e.g. issue the
> bulk request via dma_blk_read()/dma_blk_write() followed by the
> remainder with bdrv_read()/bdrv_write() then the block layer disk
> requests would look fairly similar to the DMA engine requests.
> When the above commit changed the unaligned accesses from
> bdrv_read()/bdrv_write() into bdrv_aio_readv()/bdrv_aio_writev() as used
> internally by dma_blk_read() and dma_blk_write(), then individual DMA
> read/write requests would start getting merged together, presumably by
> the request-merging code in block.c.
> Note that this is on a laptop with a slow internal HD (non-SSD) with
> encryption enabled and so could this perhaps could be encouraging more
> request merging?

block.c only merges writes when bdrv_aio_multiwrite() is explicitly
called. The only caller of that function is virtio-blk, so I would rule
that out.

Did you check that it isn't the guest who already sends different

> >> I'm wondering whether or not the synchronous requests for the
> >> unaligned accesses had a side effect of introducing read/write
> >> barriers which have now been removed?
> > 
> > Let's start a bit simpler: While a request is running, its iov is in
> > use. pmac_ide_transfer_cb() seems to issue an AIO request and
> > immediately continue working on the same iov. I wouldn't be surprised if
> > some internal data structures in the macio driver weren't made for
> > multiple requests at the same time either.
> > 
> > The usual pattern is that after issuing an AIO command, you return, and
> > only continue what you were doing in its callback.
> > 
> > That said, you have so many operations in pmac_ide_transfer_cb() using
> > the function recursively as their callback that I find it really hard to
> > read by now. Perhaps it would benefit from a conversion to coroutines,
> > which would make the code linear again.
> Yeah, it's a fairly horrible piece of code which is why I've been
> spending time trying to fix it up :/  I do have an alpha quality patch
> that separates the DMA and disk logic back out again, but it also
> suffers from the same problem as the existing code in that while the
> installation completes the resulting disk image is corrupted in a
> seemingly similar way.
> I can temporarily push this code, which I believe is easier to read, to
> my github account if that helps?

Yes, please. More information can only help.

> Another solution to explore could be to move the AIO context over to
> using offset and length rather than sector_num and sector count so that
> the existing code for allowing 512 byte sector accesses on 4096 byte
> sector devices can be used. The caller could potentially provide an
> "allow unaligned accesses" flag when creating the AIO context, and then
> all that would remain would be to create custom DMA helpers for macio,
> similar to what I've tried to do in my experimental patch.

Do we really need a flag, or is it just that we need to move
dma-helpers.c from sectory granularity to byte granularity?


> >> I can provide an easy, although annoyingly time-consuming test case if
> >> either Kevin or Stefan (added to CC) are able to help point me in the
> >> right direction as to how to go about debugging this?
> > 
> > I'm not familiar enough with the controller to help in debugging it, but
> > as long as we know which one is the faulty patch, we can probably solve
> > this by code review.
> That would be great! For reference the test case is fairly simple:
> 1) Grab and decompress the darwinppc-602.iso file from
> https://opensource.apple.com/static/iso/
> 2) Grab and decompress the pre-partitioned qcow2 HD image from
> http://www.ilande.co.uk/tmp/darwin_empty.qcow2.xz
> 3) Switch to something around the v2.0 release, e.g.
> git checkout v2.0.0
> 3) Run the installation like this:
> qemu-system-ppc -hda darwin_empty.qcow2 -cdrom darwinppc-602.iso -boot d
> 4) Verify the installation completes successfully with no errors and the
> HD image can be booted
> 5) Now cherry-pick commit 3e300fa6ad4ee19b16339c25773dec8df0bfb982 on
> top of your v2.0.0 git checkout
> 6) Re-run the entire installation as per step 3 on a fresh copy of
> darwin_empty.qcow2
> At the very end of the installation, the kernel extension cache build
> fails with "Segmentation Fault" on "chroot /mnt kextcache -el -a `arch`
> 2>/dev/null". The resulting disk image is subsequently unusable and
> hangs on reboot.
> ATB,
> Mark.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]