From: | Mark Cave-Ayland |
Subject: | Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros |
Date: | Sun, 27 Jan 2019 17:45:54 +0000 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 |
On 27/01/2019 17:26, Richard Henderson wrote:

> On 1/27/19 7:19 AM, Mark Cave-Ayland wrote:
>> Could this make the loop slower? I certainly haven't noticed any obvious
>> performance difference during testing (OS X uses merge quite a bit for
>> display rendering), and I'd hope that with a good compiler and modern branch
>> prediction then any effect here would be negligible.
>
> I would expect the i < n/2 loop to be faster, because the assignments are
> unconditional. FWIW.

Do you have any idea as to how much faster? Is it something that would show up as
significant within the context of QEMU?

As well as eliminating the HI_IDX/LO_IDX constants I do find the updated version much
easier to read, so I would prefer to keep it if possible. What about unrolling the loop
into 2 separate ones e.g.

    for (i = 0; i < ARRAY_SIZE(r->element); i+=2) {
        result.access(i) = a->access(i >> 1);
    }

    for (i = 1; i < ARRAY_SIZE(r->element); i+=2) {
        result.access(i) = b->access(i >> 1);
    }

Would you expect this to perform better than the version proposed in the patchset?


ATB,

Mark.
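
For comparison, the loop shapes mentioned in this thread can be written out as standalone C. This is only a minimal sketch under stated assumptions: plain 16-element uint8_t arrays stand in for the vector registers, the names N, merge_branch, merge_half, merge_split and merge_forms_agree are hypothetical, and the per-element branch in merge_branch is a guess at the kind of conditional assignment being discussed rather than the code from the patchset. It ignores the host-endian indexing that the Vsr* macros take care of, and it shows only that the forms compute the same result, not which one is faster.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 16 /* hypothetical element count: 16 bytes in a 128-bit vector */

/* Single pass over every result element, picking the source with a branch
 * (assumed shape of a conditional-assignment loop). */
void merge_branch(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < N; i++) {
        if (i & 1) {
            r[i] = b[i >> 1];
        } else {
            r[i] = a[i >> 1];
        }
    }
}

/* "i < n/2" form: half as many iterations, two unconditional stores each. */
void merge_half(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < N / 2; i++) {
        r[2 * i] = a[i];
        r[2 * i + 1] = b[i];
    }
}

/* Two separate loops, as proposed above: even slots from a, odd slots from b. */
void merge_split(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < N; i += 2) {
        r[i] = a[i >> 1];
    }
    for (int i = 1; i < N; i += 2) {
        r[i] = b[i >> 1];
    }
}

/* Returns 1 if all three forms produce identical output on sample data. */
int merge_forms_agree(void)
{
    uint8_t a[N], b[N], r1[N], r2[N], r3[N];

    for (int i = 0; i < N; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(100 + i);
    }
    merge_branch(r1, a, b);
    merge_half(r2, a, b);
    merge_split(r3, a, b);

    return memcmp(r1, r2, N) == 0 && memcmp(r1, r3, N) == 0;
}
```

Any real performance difference between these forms would need to be measured in the helper itself with the compiler and flags QEMU is actually built with; the sketch makes no claim either way.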