Subject: Re: [Qemu-devel] [PATCH] x86_64: optimise muldiv64 for x86_64 architecture
From: Frediano Ziglio
Date: Fri, 9 Jan 2015 12:07:56 +0000
2015-01-09 11:24 GMT+00:00 Paolo Bonzini <address@hidden>:
>
>
> On 09/01/2015 12:04, Frediano Ziglio wrote:
>> 2015-01-09 10:35 GMT+00:00 Paolo Bonzini <address@hidden>:
>>>
>>>
>>> On 09/01/2015 11:27, Frediano Ziglio wrote:
>>>>
>>>> Signed-off-by: Frediano Ziglio <address@hidden>
>>>> ---
>>>> include/qemu-common.h | 13 +++++++++++++
>>>> 1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>>> index f862214..5366220 100644
>>>> --- a/include/qemu-common.h
>>>> +++ b/include/qemu-common.h
>>>> @@ -370,6 +370,7 @@ static inline uint8_t from_bcd(uint8_t val)
>>>> }
>>>>
>>>> /* compute with 96 bit intermediate result: (a*b)/c */
>>>> +#ifndef __x86_64__
>>>> static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>> {
>>>> union {
>>>> @@ -392,6 +393,18 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t
>>>> b, uint32_t c)
>>>> res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
>>>> return res.ll;
>>>> }
>>>> +#else
>>>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>> +{
>>>> + uint64_t res;
>>>> +
>>>> + asm ("mulq %2\n\tdivq %3"
>>>> + : "=a"(res)
>>>> + : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>>>> + : "rdx", "cc");
>>>> + return res;
>>>> +}
>>>> +#endif
>>>>
>>>
>>> Good idea. However, if you have __int128, you can just do
>>>
>>> return (__int128)a * b / c
>>>
>>> and the compiler should generate the right code. Conveniently, there is
>>> already CONFIG_INT128 that you can use.
>>
>> Well, it works, but in our case b <= c, so a * b / c is always <
>> 2^64.
>
> This is not necessarily the case. Quick grep:
>
> hw/timer/hpet.c: return (muldiv64(value, HPET_CLK_PERIOD, FS_PER_NS));
> hw/timer/hpet.c: return (muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD));
>
> One of the two must disprove your assertion. :)
>
Unless FS_PER_NS == HPET_CLK_PERIOD :)
> But it's true that we expect no overflow.
>
This is enough!
>> This leads to no integer overflow in the final division. However,
>> the compiler does not know this, so it performs the entire (a*b) / c
>> division, which mainly consists of two integer divisions instead of
>> one (not counting that it is implemented using a helper
>> function).
>>
>> I think I'll write two patches: one implementing it with __int128
>> as you suggested (which is much easier to read than both the current
>> one and the assembly one), and another for the x86_64 optimization.
>
> Right, that's even better.
>
> Out of curiosity, have you seen it in some profiles?
>
> Paolo
No, I just looked at the complicated source and the generated code and
thought "why use dozens of instructions when two are enough?" :)
Frediano