From: Richard Henderson
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH v3 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros
Date: Sun, 27 Jan 2019 10:07:12 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>> I would expect the i < n/2 loop to be faster, because the assignments are
>> unconditional. FWIW.
>
> Do you have any idea as to how much faster? Is it something that would show
> up as significant within the context of QEMU?

I don't have any numbers on that, no.

> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
> version much easier to read, so I would prefer to keep it if possible.
> What about unrolling the loop into 2 separate ones...

I doubt that would be helpful.

I would think that

    #define VMRG_DO(name, access, ofs)
    ...
        int i, half = ARRAY_SIZE(r->access(0)) / 2;
    ...
        for (i = 0; i < half; i++) {
            result.access(2 * i + 0) = a->access(i + ofs);
            result.access(2 * i + 1) = b->access(i + ofs);
        }

where OFS = 0 for HI and half for LO is best. I find it quite readable, and it
avoids duplicating code between LO and HI as you're currently doing.

r~
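
For illustration, the loop suggested above can be fleshed out into a
standalone sketch. This is not the QEMU code: the vec128 union, the
merge_* function names, and the plain array indexing (standing in for the
Vsr* accessor macros, with element 0 treated as the start of the "high"
half) are all assumptions made so that the example compiles on its own.
The one thing taken from the message is the shape of the loop: a single
macro body, i < half, two unconditional stores per iteration, and ofs
chosen as 0 for HI or half for LO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for a 128-bit vector register. */
typedef union {
    uint8_t  u8[16];
    uint16_t u16[8];
    uint32_t u32[4];
} vec128;

/*
 * Generate one merge helper per (element type, half) pair.
 * `ofs` is expanded inside the function body, so the caller may pass
 * either 0 (merge high) or the token `half` (merge low).
 * Plain array indexing is an assumption; the real code would go through
 * accessor macros that hide host endianness.
 */
#define VMRG_DO(name, field, ofs)                                   \
void name(vec128 *r, const vec128 *a, const vec128 *b)              \
{                                                                   \
    vec128 result;                                                  \
    size_t i, half = ARRAY_SIZE(r->field) / 2;                      \
                                                                    \
    assert(r && a && b);                                            \
    /* Two unconditional stores per iteration, no branches. */     \
    for (i = 0; i < half; i++) {                                    \
        result.field[2 * i + 0] = a->field[i + (ofs)];              \
        result.field[2 * i + 1] = b->field[i + (ofs)];              \
    }                                                               \
    /* Write back via a temporary so r may alias a or b. */         \
    *r = result;                                                    \
}

VMRG_DO(merge_hi_u8,  u8,  0)
VMRG_DO(merge_lo_u8,  u8,  half)
VMRG_DO(merge_hi_u32, u32, 0)
VMRG_DO(merge_lo_u32, u32, half)
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13} as u32 elements,
merge_hi_u32 produces {0, 10, 1, 11} and merge_lo_u32 produces
{2, 12, 3, 13}. Both variants share one loop body and differ only in the
starting offset.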