|Subject:||dma_blk helpers and infinite dma_memory_map retries|
|Date:||Wed, 29 Jul 2020 02:04:38 -0400|
|User-agent:||Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0|
TL;DR: it's possible to make dma_blk_cb loop on itself forever via the dbs->iov.size == 0 condition. It will just keep rescheduling dma_blk_cb over and over.
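To make the failure mode concrete, here is a toy model of that loop. It is not QEMU code: toy_map, ToyDMA, toy_dma_cb and toy_run are hypothetical names, and the "scheduler" is just a for loop with an iteration cap so that the sketch terminates. The only thing it is meant to show is that a callback which reschedules itself whenever nothing was mapped never makes progress if the mapping can never succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mapping call that can never succeed. */
static size_t toy_map(size_t want)
{
    (void)want;
    return 0; /* zero bytes mapped, every time */
}

typedef struct {
    size_t mapped;        /* bytes mapped by the last attempt */
    unsigned reschedules; /* how often the callback re-armed itself */
} ToyDMA;

/*
 * Toy callback: returns true if it made progress, false if it asked
 * to be run again because nothing could be mapped.
 */
static bool toy_dma_cb(ToyDMA *s, size_t want)
{
    assert(s != NULL);
    s->mapped = toy_map(want);
    if (s->mapped == 0) {
        s->reschedules++;
        return false;
    }
    return true;
}

/*
 * Toy scheduler: keeps re-running the callback. The max_iters cap
 * exists only so this sketch terminates; the loop described above
 * has no such cap.
 */
static unsigned toy_run(size_t want, unsigned max_iters)
{
    ToyDMA s = { 0, 0 };
    for (unsigned i = 0; i < max_iters; i++) {
        if (toy_dma_cb(&s, want)) {
            break;
        }
    }
    return s.reschedules;
}
```

Calling toy_run(0x10000, N) uses up the whole budget for any N, which is the "forever" part once the cap is removed.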
In this particular qtest reproducer, we wind up asking to map 64K at address 0xffffffff for writing on the i386 machine. Somehow we manage to map 1 byte, and then 0x1000 more bytes (!?), but then we can go no further.
So, seemingly, the map command can fail in a way that will never resolve; and the dma_blk helpers mediate the callback and don't make it back to device-level code, so ide_cancel_dma_sync actually can't guarantee it cancels anything.
You can change the condition to a loop, but the DMA will reschedule itself forever, and this hangs.
What is the "reschedule" functionality here supposed to be doing? I assume we are waiting to see if a mapping succeeds later, but this mapping seems like it should never work -- how can we determine the difference between a remap that *might* work later and one that will never work?
How many times should we try to map a certain range? address_space_map warns that scheduling with cpu_register_map_client is only *likely* to allow you to succeed.
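For the sake of discussion, one possible policy is a retry budget: try a fixed number of times, then give up and report the failure to the caller so that device-level code can complete or cancel the request. The sketch below is only an illustration of that idea under assumed names (toy_map_fn, ToyResult, toy_map_with_budget, toy_map_never and toy_map_later are all hypothetical); it does not use address_space_map or cpu_register_map_client, and it does not answer what the right budget would be.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapping hook: returns the number of bytes mapped. */
typedef size_t (*toy_map_fn)(void *opaque, size_t want);

typedef enum {
    TOY_DMA_MAPPED,  /* at least one byte was mapped */
    TOY_DMA_GAVE_UP  /* retry budget exhausted, caller must fail the I/O */
} ToyResult;

/*
 * Try once, then retry up to max_retries more times. A mapping that
 * "might work later" gets its chances; one that never works ends in
 * TOY_DMA_GAVE_UP instead of rescheduling forever.
 */
static ToyResult toy_map_with_budget(toy_map_fn map, void *opaque,
                                     size_t want, unsigned max_retries,
                                     size_t *mapped)
{
    assert(map != NULL && mapped != NULL);
    for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
        *mapped = map(opaque, want);
        if (*mapped > 0) {
            return TOY_DMA_MAPPED;
        }
    }
    return TOY_DMA_GAVE_UP;
}

/* A mapping that never succeeds. */
static size_t toy_map_never(void *opaque, size_t want)
{
    (void)opaque;
    (void)want;
    return 0;
}

/* A mapping that fails *opaque times, then maps the whole request. */
static size_t toy_map_later(void *opaque, size_t want)
{
    unsigned *failures_left = opaque;
    if (*failures_left > 0) {
        (*failures_left)--;
        return 0;
    }
    return want;
}
```

With a budget of 8 retries, toy_map_later with 3 pending failures ends in TOY_DMA_MAPPED, while toy_map_never ends in TOY_DMA_GAVE_UP. The obvious weakness is that any fixed budget is arbitrary, which is exactly the question above.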
FWIW -- this bug does show up in the wild. Over the years, people have tried to report it on Launchpad, but I have never been able to reproduce it. Presumably what people are seeing are cases in which they try to cancel DMA, but the in-progress DMA has a mapping that fails (either temporarily or permanently), so we fail to cancel the DMA and QEMU aborts.
Long debugging comment with gorier details: https://bugs.launchpad.net/qemu/+bug/1681439/comments/14