qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for


From: Bin Meng
Subject: Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
Date: Wed, 28 Apr 2021 21:12:11 +0800

Hi Cédric,

On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote:
>
> Hello,
>
> On 4/27/21 10:54 AM, Francisco Iglesias wrote:
> > On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
> >> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >>>
> >>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >>>>
> >>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> >>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>
> >>>>> Hi Bin,
> >>>>>
> >>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> >>>>>> Hi Francisco,
> >>>>>>
> >>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> >>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Dear Bin,
> >>>>>>>
> >>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> >>>>>>>> Hi Francisco,
> >>>>>>>>
> >>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> >>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Bin,
> >>>>>>>>>
> >>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> >>>>>>>>>> Hi Francisco,
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> >>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>
> >>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> >>>>>>>>>>>> Hi Francisco,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> >>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> >>>>>>>>>>>>>> Hi Francisco,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> >>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> >>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many 
> >>>>>>>>>>>>>>>> follow-up
> >>>>>>>>>>>>>>>> bytes are expected to be received after it receives a 
> >>>>>>>>>>>>>>>> command. For
> >>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte 
> >>>>>>>>>>>>>>>> address or
> >>>>>>>>>>>>>>>> 4-byte address is needed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are 
> >>>>>>>>>>>>>>>> required after
> >>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be 
> >>>>>>>>>>>>>>>> counted
> >>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is 
> >>>>>>>>>>>>>>>> in byte.
> >>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the 
> >>>>>>>>>>>>>>>> model has
> >>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. 
> >>>>>>>>>>>>>>>> The right
> >>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes 
> >>>>>>>>>>>>>>>> based on
> >>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast 
> >>>>>>>>>>>>>>>> Read Quad
> >>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 
> >>>>>>>>>>>>>>>> * 4 / 8).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> While not being the original implementor I must assume that 
> >>>>>>>>>>>>>>> above solution was
> >>>>>>>>>>>>>>> considered but not chosen by the developers due to it is 
> >>>>>>>>>>>>>>> inaccuracy (it
> >>>>>>>>>>>>>>> wouldn't be possible to model exacly 6 dummy cycles, only a 
> >>>>>>>>>>>>>>> multiple of 8,
> >>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to 
> >>>>>>>>>>>>>>> generate 7 the error
> >>>>>>>>>>>>>>> wouldn't be caught and the controller will still be 
> >>>>>>>>>>>>>>> considered "correct"). Now
> >>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor 
> >>>>>>>>>>>>>>> of keeping it, this
> >>>>>>>>>>>>>>> also because the detail is already in use for catching 
> >>>>>>>>>>>>>>> exactly above error.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I found no clue from the commit message that my proposed 
> >>>>>>>>>>>>>> solution here
> >>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models 
> >>>>>>>>>>>>>> supporting
> >>>>>>>>>>>>>> software generation should have been found out seriously 
> >>>>>>>>>>>>>> broken long
> >>>>>>>>>>>>>> time ago!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The controllers you are referring to might lack support for 
> >>>>>>>>>>>>> commands requiring
> >>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other 
> >>>>>>>>>>>>> commands? If so I
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am not sure why you view dummy clock cycles as something 
> >>>>>>>>>>>> special
> >>>>>>>>>>>> that needs some special support from the SPI controller. For the 
> >>>>>>>>>>>> case
> >>>>>>>>>>>> 1 controller, it's nothing special from the controller 
> >>>>>>>>>>>> perspective,
> >>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
> >>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and 
> >>>>>>>>>>>> that's it.
> >>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either 
> >>>>>>>>>>>> be
> >>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, 
> >>>>>>>>>>>> or
> >>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
> >>>>>>>>>>>
> >>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For 
> >>>>>>>>>>> that we also
> >>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that 
> >>>>>>>>>>> runs on a HW
> >>>>>>>>>>> board supported in QEMU should ideally run on that board inside 
> >>>>>>>>>>> QEMU aswell
> >>>>>>>>>>> (this can be a bare metal application equaly well as a modified 
> >>>>>>>>>>> u-boot/Linux
> >>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock 
> >>>>>>>>>>> cycles).
> >>>>>>>>>>>
> >>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy 
> >>>>>>>>>>> to know which
> >>>>>>>>>>> intentional or untentional features provided by the functionality 
> >>>>>>>>>>> are being
> >>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm 
> >>>>>>>>>>> aware of that
> >>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle 
> >>>>>>>>>>> modeling inside
> >>>>>>>>>>> m25p80 is the be ability to test drivers accurately regarding the 
> >>>>>>>>>>> dummy clock
> >>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number 
> >>>>>>>>>>> of dummy clock
> >>>>>>>>>>> cycles), but there might be others aswell. So by removing this 
> >>>>>>>>>>> functionality
> >>>>>>>>>>> above use case will brake, this since those test will not be 
> >>>>>>>>>>> reliable.
> >>>>>>>>>>> Furthermore, since users tend to be creative it is not possible 
> >>>>>>>>>>> to know if
> >>>>>>>>>>> there are other use cases that will be affected. This means that 
> >>>>>>>>>>> in case [1]
> >>>>>>>>>>> needs to be followed the safe path is to add functionality 
> >>>>>>>>>>> instead of removing.
> >>>>>>>>>>> Luckily it also easier in this case, see below.
> >>>>>>>>>>
> >>>>>>>>>> I understand there might be users other than U-Boot/Linux that use 
> >>>>>>>>>> an
> >>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
> >>>>>>>>>> about model behavior changes, sure I can update
> >>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
> >>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
> >>>>>>>>>
> >>>>>>>>> Yes, something like that. My concern is that since this 
> >>>>>>>>> functionality has been
> >>>>>>>>> in tree for while, users have found known or unknown features that 
> >>>>>>>>> got
> >>>>>>>>> introduced by it. By removing the functionality (and the 
> >>>>>>>>> known/uknown features)
> >>>>>>>>> we are riscing to brake our user's use cases (currently I'm aware 
> >>>>>>>>> of one
> >>>>>>>>> feature/use case but it is not unlikely that there are more). [1] 
> >>>>>>>>> states that
> >>>>>>>>> "In general features are intended to be supported indefinitely once 
> >>>>>>>>> introduced
> >>>>>>>>> into QEMU", to me that makes very much sense because the opposite 
> >>>>>>>>> would mean
> >>>>>>>>> that we were not reliable. So in case [1] needs to be honored it 
> >>>>>>>>> looks to be
> >>>>>>>>> safer to add functionality instead of removing (and riscing the 
> >>>>>>>>> removal of use
> >>>>>>>>> cases/features). Luckily I still believe in this case that it will 
> >>>>>>>>> be easier to
> >>>>>>>>> go forward (even if I also agree on what you are saying below about 
> >>>>>>>>> what I
> >>>>>>>>> proposed).
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Even if the implementation is buggy and we need to keep the buggy
> >>>>>>>> implementation forever? I think that's why
> >>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
> >>>>>>>> feature.
> >>>>>>>
> >>>>>>> With the RFC I posted all commands in m25p80 are working for both the 
> >>>>>>> case 1
> >>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as 
> >>>>>>> GQSPI).
> >>>>>>> Because of this, I, with all respect, will have to disagree that this 
> >>>>>>> is buggy.
> >>>>>>
> >>>>>> Well, the existing m25p80 implementation that uses dummy cycle
> >>>>>> accuracy for those flashes prevents all SPI controllers that use tx
> >>>>>> fifo to work with those flashes. Hence it is buggy.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and 
> >>>>>>>>>>>>> else we should
> >>>>>>>>>>>>> probably let the maintainers know about it). Most likely the 
> >>>>>>>>>>>>> lack of support
> >>>>>>>>>>>>
> >>>>>>>>>>>> I called it "seriously broken" because current implementation 
> >>>>>>>>>>>> only
> >>>>>>>>>>>> considered one type of SPI controllers while completely ignoring 
> >>>>>>>>>>>> the
> >>>>>>>>>>>> other type.
> >>>>>>>>>>>
> >>>>>>>>>>> If we change view and see this from the perspective of m25p80, it 
> >>>>>>>>>>> models the
> >>>>>>>>>>> commands a certain way and provides an API that the SPI 
> >>>>>>>>>>> controllers need to
> >>>>>>>>>>> implement for interacting with it. It is true that there are SPI 
> >>>>>>>>>>> controllers
> >>>>>>>>>>> referred to above that do not support the portion of that API 
> >>>>>>>>>>> that corresponds
> >>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true 
> >>>>>>>>>>> that this is
> >>>>>>>>>>> broken since there is also one SPI controller that has a working 
> >>>>>>>>>>> implementation
> >>>>>>>>>>> of m25p80's full API also when transfering through a tx fifo (use 
> >>>>>>>>>>> case 1). But
> >>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to 
> >>>>>>>>>>> m25p80's API
> >>>>>>>>>>> and allow for toggling the accuracy from dummy clock cycles to 
> >>>>>>>>>>> dummy bytes [1]
> >>>>>>>>>>> will still be honored as in the same time making it possible to 
> >>>>>>>>>>> have full
> >>>>>>>>>>> support for the API in the SPI controllers that currently do not 
> >>>>>>>>>>> (please reread
> >>>>>>>>>>> the proposal in my previous reply that attempts to do this). I 
> >>>>>>>>>>> myself see this
> >>>>>>>>>>> as win/win situation, also because no controller should need 
> >>>>>>>>>>> modifications.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
> >>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate 
> >>>>>>>>>> dummy
> >>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify 
> >>>>>>>>>> as
> >>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled 
> >>>>>>>>>> to
> >>>>>>>>>> how the SPI controller works.
> >>>>>>>>>
> >>>>>>>>> I agree on above. I decided though that instead of posting sample 
> >>>>>>>>> code in here
> >>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. 
> >>>>>>>>> About below,
> >>>>>>>>> Xilinx ZynqMP GQSPI should not need any modication in a first step.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Wait, (see below)
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports 
> >>>>>>>>>> both
> >>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
> >>>>>>>>>> generated by the controller automatically. Please read the example
> >>>>>>>>>> given in:
> >>>>>>>>>>
> >>>>>>>>>>     table 24‐22, an example of Generic FIFO Contents for Quad I/O 
> >>>>>>>>>> Read
> >>>>>>>>>> Command (EBh)
> >>>>>>>>>>
> >>>>>>>>>> in 
> >>>>>>>>>> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> >>>>>>>>>>
> >>>>>>>>>> If you choose to set the m25p80 device property 
> >>>>>>>>>> 'model_dummy_bytes' to
> >>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound 
> >>>>>>>>>> to
> >>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy 
> >>>>>>>>>> cycles,
> >>>>>>>>>> and this is wrong.
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
> >>>>>>>> your proposal cannot support the complicated controller like Xilinx
> >>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
> >>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy 
> >>>>>>>> cycle
> >>>>>>>> generation, which is wrong.
> >>>>>>>>
> >>>>>>>
> >>>>>>> First, thank you very much for looking into the RFC series, very much
> >>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers 
> >>>>>>> from 2
> >>>>>>> locations in the file, in 1 location the transfer referred to above 
> >>>>>>> is done, in
> >>>>>>> another location the transfer through the txfifo is done. The 
> >>>>>>> location where
> >>>>>>> transfer referred to above is done will not need any modifications 
> >>>>>>> (and will
> >>>>>>> thus work equally well as it does currently).
> >>>>>>
> >>>>>> Please explain this a little bit. How does your RFC series handle
> >>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
> >>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
> >>>>>> using hardware dummy cycle generation?
> >>>>>
> >>>>> Sorry, I missunderstod. You are right, that won't work.
> >>>>
> >>>> +Edgar E. Iglesias
> >>>>
> >>>> So it looks by far the only way to implement dummy cycles correctly to
> >>>> work with all SPI controller models is what I proposed here in this
> >>>> patch series.
> >>>>
> >>>> Maintainers are quite silent, so I would like to hear your thoughts.
> >>>>
> >>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> >>>> please share your thoughts since you are the one who reviewed the
> >>>> existing dummy implementation (based on commits history)
> >>
> >> I agree with Edgar, in that Francisco and Bin know this better than me
> >> and that modelling things in cycles is a pain.
> >
> > Hi Alistair,
> >
> >>
> >> As Bin points out it seems like currently we should be modelling bytes
> >> (from the variable name) so it makes sense to keep it in bytes. I
> >> would be in favour of this series in that case. Do we know what use
> >> cases this will break? I know it's hard to answer but I don't think
> >> there are too many SSI users in QEMU so it might not be too hard to
> >> test most of the possible use cases.
> >
> > The use case I'm aware of is regression testing of drivers. Ex: if a
> > driver is using 10 dummy clock cycles with the commands and a patch
> > accidentaly changes the driver to use 11 dummy clock cycles QEMU currently
> > finds the problem, that won't be possible with this series. It's difficult
> > to say but it is not impossible there are other use cases also.
>
>
> It was breaking the Aspeed machines :
>
>   
> https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/

Yes, as I mentioned in the series the modification was based on a pure
guess from existing QEMU codes as I could not find a datasheet of the
Aspeed SPI controller on the internet. Do you know if this is publicly
available?

>
> QEMU 6.1 should have acceptance tests that will help in detecting
> regressions in this area.
>

Regards,
Bin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]