From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] x86_64: optimise muldiv64 for x86_64 architecture
Date: Fri, 09 Jan 2015 12:24:24 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0


On 09/01/2015 12:04, Frediano Ziglio wrote:
> 2015-01-09 10:35 GMT+00:00 Paolo Bonzini <address@hidden>:
>>
>>
>> On 09/01/2015 11:27, Frediano Ziglio wrote:
>>>
>>> Signed-off-by: Frediano Ziglio <address@hidden>
>>> ---
>>>  include/qemu-common.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>> index f862214..5366220 100644
>>> --- a/include/qemu-common.h
>>> +++ b/include/qemu-common.h
>>> @@ -370,6 +370,7 @@ static inline uint8_t from_bcd(uint8_t val)
>>>  }
>>>
>>>  /* compute with 96 bit intermediate result: (a*b)/c */
>>> +#ifndef __x86_64__
>>>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>  {
>>>      union {
>>> @@ -392,6 +393,18 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>      res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
>>>      return res.ll;
>>>  }
>>> +#else
>>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> +{
>>> +    uint64_t res;
>>> +
>>> +    asm ("mulq %2\n\tdivq %3"
>>> +         : "=a"(res)
>>> +         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>>> +         : "rdx", "cc");
>>> +    return res;
>>> +}
>>> +#endif
>>>
>>
>> Good idea.  However, if you have __int128, you can just do
>>
>>    return (__int128)a * b / c
>>
>> and the compiler should generate the right code.  Conveniently, there is
>> already CONFIG_INT128 that you can use.
> 
> Well, it works, but in our case b <= c, so a * b / c is always
> < 2^64.

This is not necessarily the case.  Quick grep:

hw/timer/hpet.c:    return (muldiv64(value, HPET_CLK_PERIOD, FS_PER_NS));
hw/timer/hpet.c:    return (muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD));

One of the two must disprove your assertion. :)

But it's true that we expect no overflow.

> This leads to no integer overflow in the last division. However,
> the compiler does not know this, so it performs the full (a*b) / c
> division, which mainly consists of two integer divisions instead of
> one (not counting that it is implemented via a helper function).
> 
> I think I'll write two patches: one implementing it with __int128
> as you suggested (which is much easier to read than both the current
> version and the assembly one), and another with the x86_64 optimization.

Right, that's even better.

Out of curiosity, have you seen it in some profiles?

Paolo
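
For reference, here is a minimal sketch of the __int128 variant discussed above. The names muldiv64 and CONFIG_INT128 come from the thread itself; how the guard is arranged around the existing fallback is only an assumption about what the follow-up patch might look like.

#ifdef CONFIG_INT128
/* Sketch only: let the compiler widen the product to 128 bits so the
 * intermediate result cannot overflow; the final result still has to
 * fit in 64 bits, which the callers are expected to guarantee. */
static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    return (__int128)a * b / c;
}
#else
/* ... keep the existing 96-bit union-based implementation ... */
#endif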


