[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide
From: |
Philippe Mathieu-Daudé |
Subject: |
Re: [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide |
Date: |
Sat, 30 May 2020 09:33:31 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0 |
On 5/30/20 12:59 AM, John Snow wrote:
> outl 0xcf8 0x8000fa24
> outl 0xcfc 0xe106c000 (Writes e106c00 to BAR5 for 0:31:2)
We might eventually display this in the reproducer output.
>
> outl 0xcf8 0x8000fa04
> outw 0xcfc 0x7 (Enables BM, Memory IO and PIO for 0:31:2)
>
> outl 0xcf8 0x8000fb20 (Enables 0:31:3, I guess? My PCI knowledge is
> iffy. We set the enable bit and select BAR4, but then we don't actually
> write to 0xcfc again to set anything.)
>
>
> write 0x0 0x3 0x2780e7
> - write these three bytes to addr 0 in memory.
>
> write 0xe106c22c 0xd 0x1130c218021130c218021130c2
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSCTL] @ 0x2c:
> 0x18c23011
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSERR] @ 0x30:
> 0xc2301102
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSACT] @ 0x34:
> 0x30110218
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxCI] @ 0x38:
> 0x000000c2
>
> write 0xe106c218 0x15 0x110010110010110010110010110010110010110010
>
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxCMD] @ 0x18:
> 0x11100011
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:Reserved] @ 0x1c:
> 0x00111000
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxTFD] @ 0x20:
> 0x10001110
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSIG] @ 0x24:
> 0x11100011
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSSTS] @ 0x28:
> 0x00111000
> - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSCTL] @ 0x2c:
> 0x00000010
>
> Not all of those register writes are actually important for the bug, so
> I simplified them to the fewest writes and fewest bits.
>
> outl 0xcf8 0x8000fa24
> outl 0xcfc 0xe106c000
> outl 0xcf8 0x8000fa04
> outw 0xcfc 0x7
> outl 0xcf8 0x8000fb20
> write 0x0 0x3 0x2780e7
> write 0xe106c22c 0x4 0x01000000
> write 0xe106c238 0x2 0x02
> write 0xe106c218 0x4 0x11000000
> write 0xe106c22c 0x1 0x00
>
>
> 1. PxSCTL write arms the DET bit. It isn't intended to be left on when
> PxCMD.ST (Start) is issued. It's not clear what should happen if this DOES
> occur. (Undefined behavior, at the very least.)
> See AHCI 1.3 section 3.3.1.1 "Offset 2Ch: PxSCTL – Port x Serial ATA Control
> (SCR2: SControl)"
>
> This bit is intended to send a reset signal to attached SATA drives.
> QEMU just synchronously resets the drive because we can.
>
>
> 2. PxCI is not intended to be written to when PxCMD.ST is unset. The spec
> suggests that when ST transitions from '1' to '0' that this field is cleared,
> but it does not suggest what happens when it transitions from '0' to '1'.
> QEMU will happily set the register.
>
>
> 3. PxCMD write: This sets PxCMD.ST and PxCMD.FRE, which engages the AHCI
> device in earnest.
>
> At this point, AHCI sees outstanding commands and tries to process them.
> The FIS receive address is never programmed, so it's at zero. We start
> reading a FIS there:
>
> 15712@1590789960.784835:handle_cmd_fis_dump ahci(0x55b4c56621a0)[2]: FIS:
> 0x00: 27 80 e7 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x40: 34 40 70 01 01 14 eb 20 00 00 00 00 01 00 00 00
> 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> This is translated as:
> 0x27 SATA_FIS_TYPE_REGISTER_H2D
> 0x80 SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER
> 0xe7 command = FLUSH CACHE
>
> This will engage ide_flush_cache() (core.c)
>
>
> At this point I get a little confused. I wouldn't think we'd have a
> BlockBackend here for ide_flush to work on, but it seems to think we do and
> allows the flush command to proceed. We then immediately try to cancel this
> flush, but bdrv_aio_cancel can't tolerate an aiocb with a null BDS and panics.
>
> ...Hm, it should be the case that blk_do_flush detects this as
> ENOMEDIUM, but are we maybe just canceling this request too fast? I
> actually can't trigger this through the console, but I can trigger it by
> redirecting input from a .txt file.
>
> Yup, OK: if you look in blk_aio_prwv, we schedule a oneshot to invoke
> the callback on a synchronous failure, but we are managing to inject the
> reset command before the oneshot gets a chance to run.
What is not clear to me is, can this be reproduced by a real guest, or
only replaying the fuzzer payload (via the qtest chardev)?
Very nicely detailed analysis btw!
Various parts are worth being copied in the fix commit description.
>
> I think either blk_aio_cancel or bdrv_aio_cancel needs to check that
> there isn't a dangling BH callback -- it seems wrong to make it as far
> as bdrv_aio_cancel when there's no BDS, but the IDE device no longer has
> any idea why its callback hasn't returned yet, and blk_aio_cancel is the
> only mechanism it has to kick the state machine forward.
>
- [Bug 1878255] [NEW] Assertion failure in bdrv_aio_cancel, through ide, Alexander Bulekov, 2020/05/12
- [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide, Philippe Mathieu-Daudé, 2020/05/12
- [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide, Stefan Hajnoczi, 2020/05/14
- [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide, John Snow, 2020/05/29
- Re: [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide,
Philippe Mathieu-Daudé <=
- [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide, John Snow, 2020/05/30
- [Bug 1878255] Re: Assertion failure in bdrv_aio_cancel, through ide, Alexander Bulekov, 2020/05/30