| From: | Richard Henderson |
| Subject: | Re: [PATCH v1 03/15] tcg: Fix register allocation constraints |
| Date: | Wed, 14 Aug 2024 13:08:53 +1000 |
| User-agent: | Mozilla Thunderbird |
On 8/14/24 12:27, LIU Zhiwei wrote:
> On 2024/8/14 10:04, Richard Henderson wrote:
>> On 8/14/24 10:58, LIU Zhiwei wrote:
>>> Thus if we want to use all registers of vectors, we have to add a dynamic constraint on register allocation based on IR types.
>>
>> My comment vs patch 4 is that you can't do that, at least not without large changes to TCG.
>>
>> In addition, I said that the register pressure on vector regs is not high enough to justify such changes. There is, so far, little benefit in having more than 4 or 5 vector registers, much less 32. Thus 7 (lmul 4, omitting v0) is sufficient.
>
> At least on QEMU, SVE can support 2048 bit vector length with 'sve-default-vector-length=256'. Software optimized with SVE, such as X264 can benefit with long SVE length in less dynamic A64 instructions.
>
> We want to expose all host vector ability. Thus the largest TCG_TYPE_V256 is not enough, as 128-bit RVV can give 8*128=1024 width operation. We have expand TCG_TYPE_V512/1024/2048 types(not in this patch set, but intend to upstream later). With large TCG_TYPE_V1024/2048, we get better performance on RISC-V board with much less translated RISC-V vector instructions. We can give a more detailed experiment result if needed.
>
> However, we will only have 3 vector register when support TCG_TYPE_V1024. And even less for TCG_TYPE_V2048. Current approach will give more vectors TCG_TYPE_V128 even with support TCG_TYPE_V1024, which will relax some guest NEON register pressure.

Then you will have to teach TCG about one operand consuming and clobbering N hard registers, so that you get the spills and fills done correctly.
But you haven't done that in this patch set, so will currently generate incorrect code.

I think you should make longer vector operations a longer term project, and start with something simpler.
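
To illustrate the point about one operand occupying N hard registers, here is a minimal sketch in C. It is not TCG code. It assumes RISC-V style grouping, in which an operand with LMUL = N occupies N consecutive, N-aligned vector registers out of 32, and that v0 is left out of allocation as in the discussion above. The names group_mask, usable_groups and must_spill are hypothetical and do not exist in TCG.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VREGS 32u

/*
 * Hypothetical helper: bitmask of the hard registers covered by a group
 * of `lmul` consecutive registers starting at `base` (bit i == register vi).
 */
static uint32_t group_mask(unsigned base, unsigned lmul)
{
    assert(lmul >= 1 && base + lmul <= NUM_VREGS);
    /* Avoid an undefined 32-bit shift when the group spans every register. */
    uint32_t width = (lmul == NUM_VREGS) ? UINT32_MAX : ((1u << lmul) - 1u);
    return width << base;
}

/*
 * Number of aligned groups of size `lmul` that do not contain v0.
 * Assumption: groups start at multiples of `lmul`.
 */
static unsigned usable_groups(unsigned lmul)
{
    unsigned count = 0;
    for (unsigned base = 0; base + lmul <= NUM_VREGS; base += lmul) {
        if ((group_mask(base, lmul) & 1u) == 0) {
            count++;
        }
    }
    return count;
}

/*
 * Given a bitmask of hard registers currently holding live values,
 * return the ones that would have to be spilled before an operand
 * occupying the group [base, base + lmul) is written.
 */
static uint32_t must_spill(uint32_t live, unsigned base, unsigned lmul)
{
    return live & group_mask(base, lmul);
}
```

Under these assumptions, usable_groups(4) yields 7 and usable_groups(8) yields 3, which matches the counts mentioned in the thread. must_spill shows why an allocator that tracks only a single register per operand is not enough: a live value in v9 has to be saved before a group starting at v8 with LMUL = 8 is written, even though v9 is never named as an operand.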
r~