Re: Q: What is default architecture for code produced by native compiler
From: Arthur Miller
Subject: Re: Q: What is default architecture for code produced by native compiler?
Date: Sat, 11 Sep 2021 03:21:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Arthur Miller <arthur.miller@live.com>
>> Cc: martin rudalics <rudalics@gmx.at>, emacs-devel@gnu.org, akrl@sdf.org
>> Date: Thu, 26 Aug 2021 18:02:55 +0200
>>
>> I have learned that most of your personal suggestions are very sane, so I
>> would be interested to learn more. If you can recommend some good benchmark
>> I would be interested to do comparisons. I am not sure how to test it myself.
>
> Well, there is the scroll-through-xdisp.c benchmark we frequently see
> on this list, so maybe try that. Assuming that scrolling speed
> matters to you, that is.
>
> Another possibility is to time byte-compilation of a large enough
> file.
About compiling with -O2 vs -O3: I took some time tonight and ran some tests.
I didn't go looking for that scroll benchmark, but since I like to extract data
from text, I improvised a few of my own: I counted words in all of Plato's
dialogues, counted names and their frequencies, and found all occurrences of one
word. I also benchmarked the time to extract symbols from Emacs Lisp source code.
I am not sure if that is actually a good test, but anyway, for the most part you
seem to be correct about -O2 vs -O3. I am not sure if -O3 was slightly slower,
but it certainly wasn't any faster. I think that I measured mostly fluctuations
in my system. I did see a slight performance increase in one case: with -Ofast
with unrolled loops and vectorization, but I don't think that the slight
performance increase justifies all those unsafe optimizations.
1. (benchmark-run 5 (byte-compile-file "loaddefs.el"))
2. (benchmark-run 10 (count-words-in-dialogues))
3. (benchmark-run 10 (count-names-freq))
4. (benchmark-run 10 (count-socrates))
5. (benchmark-run (ff-build-emacs-db))
6. (benchmark-run (ff-build-package-db))
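For reading the numbers below: `benchmark-run' returns a list of three values, the total elapsed time in seconds over all repetitions, the number of garbage collections that ran, and the time spent in garbage collection. A minimal sketch of how one might split such a result into per-iteration figures follows; `my/bench-summary' is a hypothetical helper name, not something from the attached benchmark.el.

```elisp
;; Sketch only: `my/bench-summary' is a hypothetical helper, not part of
;; the attached benchmark.el.
;; RESULT is the list (ELAPSED GC-COUNT GC-ELAPSED) returned by
;; `benchmark-run'; REPS is the repetition count that was passed to it.
(defun my/bench-summary (result reps)
  "Return a plist of per-iteration timings for RESULT over REPS runs."
  (let ((elapsed (nth 0 result))
        (gc-count (nth 1 result))
        (gc-time (nth 2 result)))
    (list :per-iter (/ elapsed (float reps))
          ;; Time per iteration with garbage collection subtracted.
          :per-iter-no-gc (/ (- elapsed gc-time) (float reps))
          :gc-count gc-count)))

;; Usage with the -O2 measurement for benchmark 3 (10 repetitions):
(my/bench-summary '(0.995782973 29 0.286681078) 10)
```

Subtracting the GC time is useful here because benchmarks 3, 5 and 6 spend a noticeable share of their time in garbage collection, which can hide or exaggerate differences between builds.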
| | 1. | 2. | 3. | 4. | 5. | 6. |
| -O2 | (0.07173132 0 0.0) | (3.638477067 0 0.0) | (0.995782973 29 0.286681078) | (0.6793675309999999 0 0.0) | (3.516161214 52 1.530445056) | (1.49021885 19 0.6228081910000007) |
| -O2 -march=native -mtune=native | (0.07093016799999999 0 0.0) | (3.6305292319999998 0 0.0) | (0.986554758 29 0.249624919) | (0.7151563839999999 0 0.0) | (3.3660707039999997 54 1.500936318) | (1.3437769549999998 16 0.5610343889999996) |
| -O3 -march=native -mtune=native | (0.07113512100000001 0 0.0) | (3.6112633769999998 0 0.0) | (0.981865426 29 0.24965302400000006) | (0.7124635739999999 0 0.0) | (3.378908317 53 1.4800354379999998) | (1.36308648 16 0.5572504489999996) |
| -Ofast ... | (0.07093933799999999 0 0.0) | (3.570912783 0 0.0) | (0.951396288 29 0.2507440930000002) | (0.67694857 0 0.0) | (3.3648884779999997 53 1.4933518069999998) | (1.3443351749999999 16 0.5532435859999989) |
'CFLAGS=-Ofast -ftree-vectorize -mavx -march=native -mcpu=native -mtune=native
-fopt-info-vec-optimized -flto -funroll-loops
-funsafe-math-optimizations
-fno-trapping-math -fno-finite-math-only -fopenmp'
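To compare two builds more directly than by eyeballing the raw results, the elapsed times can be turned into percentage changes against the -O2 baseline. The snippet below is only a sketch: `my/relative-change' is a hypothetical helper name, and the two lists are the elapsed-time fields (the first element of each result) copied from the -O2 and -Ofast measurements above.

```elisp
(require 'cl-lib)

;; Sketch only: `my/relative-change' is a hypothetical helper name.
(defun my/relative-change (baseline candidate)
  "Return the change of CANDIDATE relative to BASELINE, in percent.
A negative value means CANDIDATE took less time than BASELINE."
  (* 100.0 (/ (- candidate baseline) baseline)))

;; Elapsed times for benchmarks 1 to 6, copied from the measurements.
(let ((o2    '(0.07173132 3.638477067 0.995782973
               0.6793675309999999 3.516161214 1.49021885))
      (ofast '(0.07093933799999999 3.570912783 0.951396288
               0.67694857 3.3648884779999997 1.3443351749999999)))
  ;; One percentage per benchmark, in the same order as the lists.
  (cl-mapcar #'my/relative-change o2 ofast))
```

Whether a given percentage means anything still depends on how much the timings fluctuate between runs of the same build.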
Summa summarum, it seems that -O2 with CPU-specific flags is just as fast as
-Ofast with all the unsafe options, so -O2 it is.
I did a clean and a make bootstrap for each version, and I tested with emacs -Q.
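Since the differences may be mostly system fluctuations, one way to check is to time the same function several times separately and look at the spread within a single build before comparing builds. This is a rough sketch, not what the attached benchmark.el does; `my/bench-samples' and `my/bench-spread' are hypothetical helper names.

```elisp
(require 'benchmark)

;; Sketch only: both helpers below are hypothetical names.
(defun my/bench-samples (fn samples)
  "Call FN SAMPLES times, timing each call separately.
Return the list of elapsed times in seconds, in call order."
  (let (times)
    (dotimes (_ samples)
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED);
      ;; keep only the elapsed time of this single call.
      (push (car (benchmark-run 1 (funcall fn))) times))
    (nreverse times)))

(defun my/bench-spread (times)
  "Return (MIN MEAN MAX) for the list of numbers TIMES."
  (list (apply #'min times)
        (/ (apply #'+ times) (float (length times)))
        (apply #'max times)))

;; Usage, assuming `count-socrates' from benchmark.el is loaded:
;; (my/bench-spread (my/bench-samples #'count-socrates 10))
```

If the gap between MIN and MAX within one build is larger than the gap between two builds, the builds cannot really be told apart by that benchmark.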
Attachment: benchmark.el (Text document)
Attachment: Plato.org (Lotus Organizer)