From: Richard Henderson
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros
Date: Sun, 27 Jan 2019 10:07:12 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>> I would expect the i < n/2 loop to be faster, because the assignments are
>> unconditional.  FWIW.
>
> Do you have any idea as to how much faster? Is it something that would show
> up as significant within the context of QEMU?

I don't have any numbers on that, no.

> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
> version much easier to read, so I would prefer to keep it if possible.
> What about unrolling the loop into 2 separate ones...

I doubt that would be helpful.

I would think that

    #define VMRG_DO(name, access, ofs)
    ...
        int i, half = ARRAY_SIZE(r->access(0)) / 2;
    ...
        for (i = 0; i < half; i++) {
            result.access(2 * i + 0) = a->access(i + ofs);
            result.access(2 * i + 1) = b->access(i + ofs);
        }

where OFS = 0 for HI and half for LO is best.  I find it quite readable, and it
avoids duplicating code between LO and HI as you're currently doing.


r~
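
For reference, the loop above can be tried outside QEMU with a minimal standalone sketch. Everything named here is an assumption made for illustration: `vec128` is a hypothetical stand-in for the real register type, plain array indexing replaces the `access` macro (so element 0 is simply treated as the "high" end), and `vmrg_u8`, `vmrghb_sketch` and `vmrglb_sketch` are invented names, not the helpers from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for a 128-bit vector register viewed as 16 bytes.
 * Assumption: index 0 is the "high" end; the real code would go through
 * an accessor macro instead of indexing the array directly. */
typedef struct {
    uint8_t u8[16];
} vec128;

/* Interleave one half of a with the same half of b.
 * ofs == 0 selects the high half, ofs == half selects the low half. */
static void vmrg_u8(vec128 *r, const vec128 *a, const vec128 *b, size_t ofs)
{
    vec128 result;
    size_t i, half = ARRAY_SIZE(r->u8) / 2;

    assert(ofs == 0 || ofs == half);

    /* Both assignments are unconditional; only the source offset differs
     * between the high and low variants. */
    for (i = 0; i < half; i++) {
        result.u8[2 * i + 0] = a->u8[i + ofs];
        result.u8[2 * i + 1] = b->u8[i + ofs];
    }

    /* Writing through a temporary keeps this correct when r aliases a or b. */
    *r = result;
}

/* Merge-high: the offset is 0. */
static void vmrghb_sketch(vec128 *r, const vec128 *a, const vec128 *b)
{
    vmrg_u8(r, a, b, 0);
}

/* Merge-low: the offset is half the element count. */
static void vmrglb_sketch(vec128 *r, const vec128 *a, const vec128 *b)
{
    vmrg_u8(r, a, b, ARRAY_SIZE(r->u8) / 2);
}
```

With a = {0x00 .. 0x0f} and b = {0x10 .. 0x1f}, the high variant starts 0x00, 0x10, 0x01, 0x11 and the low variant starts 0x08, 0x18, 0x09, 0x19. The only difference between the two is the offset argument, which is what lets a single macro body cover both LO and HI.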