Re: bug in xtime.h
From: Bruno Haible
Subject: Re: bug in xtime.h
Date: Mon, 23 Dec 2019 07:18:03 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-166-generic; KDE/5.18.0; x86_64; ; )
Hi Paul,
> >  xtime_sec (xtime_t t)
> >  {
> >    return (t < 0
> > -          ? (t + XTIME_PRECISION - 1) / XTIME_PRECISION - 1
> > +          ? (t + 1) / XTIME_PRECISION - 1
> >            : xtime_nonnegative_sec (t));
>
> Thanks for pointing out the bug. We can simplify the fix further (and speed it
> up a bit on typical hosts).
While I like the code you installed (it is simpler than the one I proposed),
I must point out that it is hard to predict what speed characteristics
"typical hosts" will show. When I compile this file with gcc-9.2.0 -O2 -S
(or similarly with clang):
================================================================
long long sec1 (long long t)
{ return (t < 0 ? (t + 1) / 1000000000 - 1 : t / 1000000000); }
long long sec2 (long long t)
{ return t / 1000000000 - (t % 1000000000 < 0); }
================================================================
I get this assembly code:
sec1:
        testq   %rdi, %rdi
        js      .L5
        movabsq $1237940039285380275, %rdx
        movq    %rdi, %rax
        sarq    $63, %rdi
        imulq   %rdx
        movq    %rdx, %rax
        sarq    $26, %rax
        subq    %rdi, %rax
        ret
.L5:
        movabsq $1237940039285380275, %rdx
        addq    $1, %rdi
        movq    %rdi, %rax
        sarq    $63, %rdi
        imulq   %rdx
        sarq    $26, %rdx
        subq    %rdi, %rdx
        leaq    -1(%rdx), %rax
        ret
sec2:
        movabsq $1237940039285380275, %rdx
        movq    %rdi, %rax
        imulq   %rdx
        movq    %rdx, %rax
        movq    %rdi, %rdx
        sarq    $63, %rdx
        sarq    $26, %rax
        subq    %rdx, %rax
        imulq   $1000000000, %rax, %rdx
        subq    %rdx, %rdi
        shrq    $63, %rdi
        subq    %rdi, %rax
        ret
Similarly with clang 9:
sec1:
        movq    %rdi, %rax
        testq   %rdi, %rdi
        js      .LBB0_1
        shrq    $9, %rax
        movabsq $19342813113834067, %rcx
        mulq    %rcx
        movq    %rdx, %rax
        shrq    $11, %rax
        retq
.LBB0_1:
        addq    $1, %rax
        movabsq $1237940039285380275, %rcx
        imulq   %rcx
        movq    %rdx, %rax
        shrq    $63, %rax
        sarq    $26, %rdx
        addq    %rdx, %rax
        addq    $-1, %rax
        retq
sec2:
        movabsq $1237940039285380275, %rcx
        movq    %rdi, %rax
        imulq   %rcx
        movq    %rdx, %rax
        shrq    $63, %rax
        sarq    $26, %rdx
        addq    %rax, %rdx
        imulq   $1000000000, %rdx, %rax
        subq    %rax, %rdi
        sarq    $63, %rdi
        leaq    (%rdi,%rdx), %rax
        retq
So, sec1 has one more conditional jump, whereas sec2 has one more 64-bit
multiplication instruction on its path. How well will the branch
prediction unit be able to optimize the conditional jump?
=================================================================
#include <stdlib.h>

static inline long long sec1 (long long t)
{ return (t < 0 ? (t + 1) / 1000000000 - 1 : t / 1000000000); }

static inline long long sec2 (long long t)
{ return t / 1000000000 - (t % 1000000000 < 0); }

volatile long long t = 1576800000000000000LL;
volatile long long x;

int
main (int argc, char *argv[])
{
  int repeat = atoi (argv[1]);
  int i;
  for (i = repeat; i > 0; i--)
    x = sec1 (t); // or sec2 (t)
}
=================================================================
Results (compiled each with -O2, run with argument 1000000000,
on an Intel Core m3 CPU):

          gcc       clang
  sec1    1.28 ns   1.04 ns
  sec2    1.78 ns   1.78 ns

And on sparc64:

          gcc
  sec1    7.79 ns
  sec2    8.06 ns

And on aarch64:

          gcc
  sec1    27.5 ns
  sec2    55.0 ns
Hmm...
Again: I'm not asking to optimize this particular function. Simply,
from time to time, I like to question the assumptions we make about
the compiler and about "typical hosts".
Bruno