qemu-arm

Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1


From: Richard Henderson
Subject: Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Date: Tue, 5 Jul 2022 16:51:44 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 7/5/22 16:18, Peter Maydell wrote:
> Ah yes, I see how this works. I wonder if there's some way we
> can abstract out this sort of index calculation into a macro
> or function so that we can comment what it's doing there and
> then all the use-sites are more "obviously correct". Perhaps:
>
> /*
>  * When considering the ZA storage as an array of elements of
>  * type T, the index within that array of the Nth element of
>  * a vertical slice of a tile can be calculated like this,
>  * regardless of the size of type T. This is because the tiles
>  * are interleaved, so if type T is size N bytes then row 1 of
>  * the tile is N rows away from row 0. The division by N to
>  * convert a byte offset into an array index and the multiplication
>  * by N to convert from vslice-index-within-the-tile to
>  * the index within the ZA storage cancel out.
>  */
> #define tile_vslice_index(i) ((i) * sizeof(ARMVectorReg))
>
> /*
>  * When doing byte arithmetic on the ZA storage, the element
>  * byteoff bytes away in a tile vertical slice is always this
>  * many bytes away in the ZA storage, regardless of the
>  * size of the tile element, assuming that byteoff is a multiple
>  * of the element size. Again this is because of the interleaving
>  * of the tiles. For instance if we have 1 byte per element then
>  * each row of the ZA storage has one byte of the vslice data,
>  * and (counting from 0) byte 8 goes in row 8 of the storage
>  * at offset (8 * row-size-in-bytes).
>  * If we have 8 bytes per element then each row of the ZA storage
>  * has 8 bytes of the data, but there are 8 interleaved tiles and
>  * so byte 8 of the data goes into row 1 of the tile,
>  * which is again row 8 of the storage, so the offset is still
>  * (8 * row-size-in-bytes). Similarly for other element sizes.
>  */
> #define tile_vslice_offset(byteoff) ((byteoff) * sizeof(ARMVectorReg))
>
> (or use functions if you like. Maybe we want versions that
> take (row,col) arguments too.)
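
For illustration only, and not part of the patch: a minimal sketch that checks the cancellation argument in the two comments above by computing the offsets long-hand and comparing them with the macros. ExampleVectorReg, vslice_byte_offset_longhand() and count_vslice_mismatches() are hypothetical names introduced here, and the 256-byte row size is an assumption standing in for sizeof(ARMVectorReg).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ARMVectorReg; one ZA row is ASSUMED to be 256 bytes. */
typedef struct {
    uint64_t d[32];
} ExampleVectorReg;

/* Same shape as the proposed macros, but using the stand-in type. */
#define tile_vslice_index(i)        ((i) * sizeof(ExampleVectorReg))
#define tile_vslice_offset(byteoff) ((byteoff) * sizeof(ExampleVectorReg))

/*
 * Long-hand byte distance from element 0 to element n of a vertical
 * slice with esz-byte elements. Because esz tiles are interleaved,
 * tile row n sits n * esz rows further down the ZA storage.
 */
static size_t vslice_byte_offset_longhand(size_t esz, size_t n)
{
    size_t storage_rows = n * esz;
    return storage_rows * sizeof(ExampleVectorReg);
}

/* Count the (esz, n) pairs for which a macro disagrees with the long-hand form. */
static int count_vslice_mismatches(void)
{
    int bad = 0;

    for (size_t esz = 1; esz <= 16; esz *= 2) {
        for (size_t n = 0; n < sizeof(ExampleVectorReg) / esz; n++) {
            size_t bytes = vslice_byte_offset_longhand(esz, n);

            /* Dividing by esz turns the byte offset into an index of T. */
            if (bytes / esz != tile_vslice_index(n)) {
                bad++;
            }
            /* byteoff is n * esz, a multiple of the element size. */
            if (bytes != tile_vslice_offset(n * esz)) {
                bad++;
            }
        }
    }
    return bad;
}
```

Under these assumptions count_vslice_mismatches() should return 0 for every element size from 1 to 16 bytes, which is the "regardless of the size of type T" claim in the comments.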

That seems reasonable.  I'll work this into v5.


r~


