Re: [Qemu-devel] [PATCH] x86_64: optimise muldiv64 for x86_64 architecture
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] x86_64: optimise muldiv64 for x86_64 architecture
Date: Fri, 09 Jan 2015 12:24:24 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
On 09/01/2015 12:04, Frediano Ziglio wrote:
> 2015-01-09 10:35 GMT+00:00 Paolo Bonzini <address@hidden>:
>>
>>
>> On 09/01/2015 11:27, Frediano Ziglio wrote:
>>>
>>> Signed-off-by: Frediano Ziglio <address@hidden>
>>> ---
>>> include/qemu-common.h | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>> index f862214..5366220 100644
>>> --- a/include/qemu-common.h
>>> +++ b/include/qemu-common.h
>>> @@ -370,6 +370,7 @@ static inline uint8_t from_bcd(uint8_t val)
>>> }
>>>
>>> /* compute with 96 bit intermediate result: (a*b)/c */
>>> +#ifndef __x86_64__
>>> static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> {
>>> union {
>>> @@ -392,6 +393,18 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
>>> return res.ll;
>>> }
>>> +#else
>>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> +{
>>> + uint64_t res;
>>> +
>>> + asm ("mulq %2\n\tdivq %3"
>>> + : "=a"(res)
>>> + : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>>> + : "rdx", "cc");
>>> + return res;
>>> +}
>>> +#endif
>>>
>>
>> Good idea. However, if you have __int128, you can just do
>>
>> return (__int128)a * b / c
>>
>> and the compiler should generate the right code. Conveniently, there is
>> already CONFIG_INT128 that you can use.
>
> Well, it works, but in our case b <= c, so a * b / c is always <
> 2^64.
This is not necessarily the case. Quick grep:

hw/timer/hpet.c: return (muldiv64(value, HPET_CLK_PERIOD, FS_PER_NS));
hw/timer/hpet.c: return (muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD));

One of the two must disprove your assertion. :)

But it's true that we expect no overflow.
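
To make the point concrete, here is a minimal sketch (not QEMU code) that checks whether (a * b) / c still fits in 64 bits. The helper name muldiv_fits_u64 is hypothetical, and the sketch assumes a compiler that provides unsigned __int128 (GCC or Clang on a 64-bit target). With the arguments swapped, as in the two hpet.c calls, one direction necessarily has b > c, and for large enough a the quotient no longer fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: report whether (a * b) / c
 * fits in 64 bits.  Assumes the compiler supports unsigned __int128.
 * The product of a 64-bit and a 32-bit value needs at most 96 bits,
 * so the 128-bit intermediate cannot overflow.
 */
static bool muldiv_fits_u64(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    unsigned __int128 q = (unsigned __int128)a * b / c;
    return q <= UINT64_MAX;
}
```

For example, muldiv_fits_u64(UINT64_MAX, 1, 10) holds because b <= c, while muldiv_fits_u64(UINT64_MAX, 10, 1) does not. This matters for the asm version because divq raises a divide error when the quotient does not fit in 64 bits.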
> This leads to no integer overflow in the last division. However,
> the compiler does not know this, so it performs the entire (a*b) / c
> division, which mainly consists of two integer divisions instead of
> one (not taking into account that it is implemented using a helper
> function).
>
> I think that I'll write two patches: one implementing it using int128
> as you suggested (which is much easier to read than the current one and
> the assembly one), and another for the x86_64 optimization.
Right, that's even better.
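
For reference, the int128 variant could look roughly like the sketch below. It is an illustration, not the eventual patch: the name muldiv64_int128 is made up to avoid clashing with the existing function, it assumes the compiler provides unsigned __int128 (which is what CONFIG_INT128 is meant to signal), and like the existing code it assumes the caller guarantees that the result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: compute (a * b) / c with a wide intermediate result.
 * Assumes unsigned __int128 is available.  If the true quotient does
 * not fit in 64 bits, the cast silently truncates it.
 */
static inline uint64_t muldiv64_int128(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    return (uint64_t)((unsigned __int128)a * b / c);
}
```

The unsigned type is used here only to keep the arithmetic obviously non-negative; since the product needs at most 96 bits, the signed (__int128) cast suggested earlier gives the same result.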
Out of curiosity, have you seen it in some profiles?
Paolo