Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

From: Cédric Le Goater
Subject: Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
Date: Tue, 27 Apr 2021 16:32:14 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1


On 4/27/21 10:54 AM, Francisco Iglesias wrote:
> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>>>> <frasse.iglesias@gmail.com> wrote:
>>>>> Hi Bin,
>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>>>>>> Hi Francisco,
>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>> Dear Bin,
>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>>>>>>>> Hi Francisco,
>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>> Hi Bin,
>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>>>>>>>>>> Hi Francisco,
>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up
>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For
>>>>>>>>>>>>>>>> example, depending on the address mode, either a 3-byte or a 4-byte
>>>>>>>>>>>>>>>> address is needed.
>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after
>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted in
>>>>>>>>>>>>>>>> s->needed_bytes. This is where the mess began.
>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is bytes, not
>>>>>>>>>>>>>>>> bits or cycles. However, for some reason the model has been using the
>>>>>>>>>>>>>>>> number of dummy cycles for s->needed_bytes. The right approach is to
>>>>>>>>>>>>>>>> convert the number of dummy cycles to bytes based on the SPI protocol;
>>>>>>>>>>>>>>>> for example, the 6 dummy cycles of Fast Read Quad I/O (EBh) should be
>>>>>>>>>>>>>>>> converted to 3 bytes per the formula 6 * 4 / 8 (6 cycles, 4 data
>>>>>>>>>>>>>>>> lines, 8 bits per byte).
>>>>>>>>>>>>>>> While I am not the original implementer, I must assume that the above
>>>>>>>>>>>>>>> solution was considered but not chosen by the developers due to its
>>>>>>>>>>>>>>> inaccuracy (it wouldn't be possible to model exactly 6 dummy cycles,
>>>>>>>>>>>>>>> only whole bytes, i.e. multiples of 8 bits, meaning that if the
>>>>>>>>>>>>>>> controller were wrongly programmed to generate 7, the error wouldn't
>>>>>>>>>>>>>>> be caught and the controller would still be considered "correct").
>>>>>>>>>>>>>>> Now that we have this detail in the implementation, I'm in favor of
>>>>>>>>>>>>>>> keeping it, also because the detail is already in use for catching
>>>>>>>>>>>>>>> exactly the above error.
>>>>>>>>>>>>>> I found no clue in the commit message that my proposed solution here
>>>>>>>>>>>>>> was ever considered; otherwise, all SPI controller models supporting
>>>>>>>>>>>>>> software generation of dummy cycles would have been found seriously
>>>>>>>>>>>>>> broken a long time ago!
>>>>>>>>>>>>> The controllers you are referring to might lack support for commands
>>>>>>>>>>>>> requiring dummy clock cycles, but I really hope they work with the
>>>>>>>>>>>>> other commands? If so, I
>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special
>>>>>>>>>>>> that needs special support from the SPI controller. For the case 1
>>>>>>>>>>>> controller, there is nothing special from the controller's perspective;
>>>>>>>>>>>> it is just like sending out a command, address bytes, or data. The
>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it.
>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be
>>>>>>>>>>>> sent as regular data in the tx fifo (the case 1 controller), or be
>>>>>>>>>>>> generated automatically by the hardware (the case 2 controller).
>>>>>>>>>>> OK, I'll try to explain my viewpoint a little differently. For that we
>>>>>>>>>>> also need to keep in mind that QEMU models HW, and any binary that runs
>>>>>>>>>>> on a HW board supported in QEMU should ideally run on that board inside
>>>>>>>>>>> QEMU as well (this can equally well be a bare-metal application or a
>>>>>>>>>>> modified U-Boot/Linux using SPI commands whose number of dummy clock
>>>>>>>>>>> cycles is not a multiple of 8).
>>>>>>>>>>> Once functionality has been introduced into QEMU, it is not easy to
>>>>>>>>>>> know which intentional or unintentional features it provides are being
>>>>>>>>>>> used by users. One (perhaps not well-known) feature that I'm aware is
>>>>>>>>>>> in use, and that is provided by the accurate dummy clock cycle modeling
>>>>>>>>>>> inside m25p80, is the ability to test drivers accurately with regard to
>>>>>>>>>>> dummy clock cycles (even when using commands whose number of dummy
>>>>>>>>>>> clock cycles is not a multiple of 8), but there might be others as
>>>>>>>>>>> well. So by removing this functionality, the above use case will break,
>>>>>>>>>>> since those tests will no longer be reliable.
>>>>>>>>>>> Furthermore, since users tend to be creative, it is not possible to
>>>>>>>>>>> know whether other use cases will be affected. This means that if [1]
>>>>>>>>>>> is to be followed, the safe path is to add functionality instead of
>>>>>>>>>>> removing it. Luckily, that is also easier in this case; see below.
>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an
>>>>>>>>>> odd number of dummy bits (not a multiple of 8). If your concern is
>>>>>>>>>> about model behavior changes, sure, I can update
>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
>>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
>>>>>>>>> Yes, something like that. My concern is that since this functionality
>>>>>>>>> has been in tree for a while, users have found known or unknown features
>>>>>>>>> that were introduced by it. By removing the functionality (and the
>>>>>>>>> known/unknown features) we risk breaking our users' use cases (currently
>>>>>>>>> I'm aware of one feature/use case, but it is not unlikely that there
>>>>>>>>> are more). [1] states that "In general features are intended to be
>>>>>>>>> supported indefinitely once introduced into QEMU"; to me that makes a
>>>>>>>>> lot of sense, because the opposite would mean that we were not
>>>>>>>>> reliable. So if [1] is to be honored, it looks safer to add
>>>>>>>>> functionality instead of removing it (and risking the removal of use
>>>>>>>>> cases/features). Luckily, I still believe that in this case it will be
>>>>>>>>> easier to go forward (even though I also agree with what you are saying
>>>>>>>>> below about what I proposed).
>>>>>>>> Even if the implementation is buggy and we need to keep the buggy
>>>>>>>> implementation forever? I think that's why
>>>>>>>> qemu/docs/system/deprecated.rst was created: for deprecating such
>>>>>>>> features.
>>>>>>> With the RFC I posted, all commands in m25p80 work for both the case 1
>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as in
>>>>>>> GQSPI). Because of this, with all respect, I will have to disagree that
>>>>>>> this is buggy.
>>>>>> Well, the existing m25p80 implementation that uses dummy cycle
>>>>>> accuracy for those flashes prevents all SPI controllers that use a tx
>>>>>> fifo from working with those flashes. Hence it is buggy.
>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (otherwise we
>>>>>>>>>>>>> should probably let the maintainers know about it). Most likely the
>>>>>>>>>>>>> lack of support
>>>>>>>>>>>> I called it "seriously broken" because the current implementation
>>>>>>>>>>>> only considered one type of SPI controller while completely ignoring
>>>>>>>>>>>> the other type.
>>>>>>>>>>> If we change our view and look at this from the perspective of m25p80,
>>>>>>>>>>> it models the commands in a certain way and provides an API that the
>>>>>>>>>>> SPI controllers need to implement to interact with it. It is true that
>>>>>>>>>>> some of the SPI controllers referred to above do not support the
>>>>>>>>>>> portion of that API that corresponds to commands with dummy clock
>>>>>>>>>>> cycles, but I don't think it is true that this is broken, since there
>>>>>>>>>>> is also one SPI controller that has a working implementation of
>>>>>>>>>>> m25p80's full API, also when transferring through a tx fifo (use case
>>>>>>>>>>> 1). But, as mentioned above, by making a minor extension and
>>>>>>>>>>> improvement to m25p80's API that allows toggling the accuracy from
>>>>>>>>>>> dummy clock cycles to dummy bytes, [1] will still be honored, while at
>>>>>>>>>>> the same time making it possible for the SPI controllers that
>>>>>>>>>>> currently lack it to fully support the API (please reread the proposal
>>>>>>>>>>> in my previous reply that attempts to do this). I see this as a
>>>>>>>>>>> win/win situation, also because no controller should need
>>>>>>>>>>> modifications.
>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
>>>>>>>>>> property 'model_dummy_bytes', which selects converting the accurate
>>>>>>>>>> dummy clock cycle count to dummy bytes inside m25p80, is hard to
>>>>>>>>>> justify as a property of the flash itself, as the behavior is tightly
>>>>>>>>>> coupled to how the SPI controller works.
>>>>>>>>> I agree with the above. I decided, though, that instead of posting
>>>>>>>>> sample code here I'll post an RFC with a hopefully improved proposal.
>>>>>>>>> I'll cc you. Regarding the part below, Xilinx ZynqMP GQSPI should not
>>>>>>>>> need any modification as a first step.
>>>>>>>> Wait (see below).
>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both
>>>>>>>>>> use cases: the dummy cycles can be transferred via the tx fifo, or
>>>>>>>>>> generated by the controller automatically. Please read the example
>>>>>>>>>> given in:
>>>>>>>>>>     table 24-22, an example of Generic FIFO Contents for Quad I/O Read
>>>>>>>>>>     Command (EBh)
>>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to
>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to
>>>>>>>>>> only allow guest software to use the tx fifo to transfer the dummy
>>>>>>>>>> cycles, and this is wrong.
>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above,
>>>>>>>> your proposal cannot support a complicated controller like the Xilinx
>>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
>>>>>>>> mandate that the guest software's GQSPI driver only use hardware dummy
>>>>>>>> cycle generation, which is wrong.
>>>>>>> First, thank you very much for looking into the RFC series, very much
>>>>>>> appreciated. Secondly, about the above: the GQSPI model in QEMU
>>>>>>> transfers from 2 locations in the file. In one location the transfer
>>>>>>> referred to above is done; in the other location the transfer through
>>>>>>> the txfifo is done. The location where the transfer referred to above
>>>>>>> is done will not need any modifications (and will thus work equally
>>>>>>> well as it does currently).
>>>>>> Please explain this a little bit. How does your RFC series handle
>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
>>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
>>>>>> using hardware dummy cycle generation?
>>>>> Sorry, I misunderstood. You are right, that won't work.
>>>> +Edgar E. Iglesias
>>>> So it looks like, by far, the only way to implement dummy cycles
>>>> correctly so that they work with all SPI controller models is what I
>>>> proposed in this patch series.
>>>> The maintainers are quite silent, so I would like to hear your thoughts.
>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell, would you
>>>> please share your thoughts, since you are the ones who reviewed the
>>>> existing dummy implementation (based on the commit history)?
>> I agree with Edgar, in that Francisco and Bin know this better than me
>> and that modelling things in cycles is a pain.
> Hi Alistair,
>> As Bin points out it seems like currently we should be modelling bytes
>> (from the variable name) so it makes sense to keep it in bytes. I
>> would be in favour of this series in that case. Do we know what use
>> cases this will break? I know it's hard to answer but I don't think
>> there are too many SSI users in QEMU so it might not be too hard to
>> test most of the possible use cases.
> The use case I'm aware of is regression testing of drivers. For example,
> if a driver uses 10 dummy clock cycles with the commands and a patch
> accidentally changes the driver to use 11 dummy clock cycles, QEMU
> currently finds the problem; that won't be possible with this series.
> It's difficult to say, but it is not impossible that there are other
> use cases as well.

It was breaking the Aspeed machines:

QEMU 6.1 should have acceptance tests that will help in detecting
regressions in this area.

> More important, IMO, is that the current use cases can be kept while
> still providing support for commands with dummy clock cycles in the
> QEMU SPI controllers that currently lack it.
> (If I recall correctly, this series might also have another issue
> regarding the GQSPI SPI mode configuration: with it, 8 dummy clock
> cycles can be transmitted as 1, 2 or 4 data bytes, so I think some form
> of calculation might be needed inside m25p80.)
> Best regards,
> Francisco
>> Alistair
>>> Hello maintainers,
>>> We apparently missed the 6.0 window to address this mess of the m25p80
>>> model. Please provide your inputs on this before I start working on
>>> the v2.
>>> Regards,
>>> Bin
