Re: Q: What is default architecture for code produced by native compiler


From: Arthur Miller
Subject: Re: Q: What is default architecture for code produced by native compiler?
Date: Sat, 11 Sep 2021 03:21:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Arthur Miller <arthur.miller@live.com>
>> Cc: martin rudalics <rudalics@gmx.at>,  emacs-devel@gnu.org,  akrl@sdf.org
>> Date: Thu, 26 Aug 2021 18:02:55 +0200
>> 
>> I have learned that most of your personal suggestions are very sane, so I would
>> be interested to learn more. If you can recommend some good benchmark I would be
>> interested to do comparisons. I am not sure how to test it myself.
>
> Well, there is the scroll-through-xdisp.c benchmark we frequently see
> on this list, so maybe try that.  Assuming that scrolling speed
> matters to you, that is.
>
> Another possibility is to time byte-compilation of a large enough
> file.

About compiling with -O2 vs -O3: I took some time tonight and tested a bit. I
wasn't looking for that scroll benchmark, but since I like to extract data from
text, I tried to improvise some tests: I counted words in all of Plato's
dialogues, counted names and their frequencies, and found all occurrences of one
word. I also benchmarked the time to extract symbols from Emacs Lisp source code.

I am not sure if that is actually a good test, but anyway, for the most part you
seem to be correct about -O2 vs -O3. I am not sure whether -O3 was slightly
slower, but it certainly wasn't any faster. I think I mostly measured
fluctuations in my system. I did see a slight performance increase in one case,
-Ofast with unrolled loops and vectorization, but I don't think that slight
increase justifies all those unsafe optimizations.

1. (benchmark-run 5 (byte-compile-file "loaddefs.el"))
2. (benchmark-run 10 (count-words-in-dialogues))
3. (benchmark-run 10 (count-names-freq))
4. (benchmark-run 10 (count-socrates))
5. (benchmark-run (ff-build-emacs-db))
6. (benchmark-run (ff-build-package-db))
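
For readers without the attached benchmark.el, here is a minimal sketch of
what a test like number 4 could look like. The function name my/count-word
and the sample text are hypothetical and are not taken from the attachment.
Note that benchmark-run returns a list of the form (ELAPSED GC-COUNT
GC-ELAPSED), which is the shape of every cell in the results table.

```elisp
(require 'benchmark)

;; Hypothetical helper, not the code from the attached benchmark.el.
(defun my/count-word (word)
  "Return the number of whole-word occurrences of WORD in the current buffer.
The search is case-sensitive."
  (let ((case-fold-search nil)
        (regexp (concat "\\<" (regexp-quote word) "\\>"))
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (setq count (1+ count))))
    count))

;; Time 10 repetitions on a small sample text.  The result is a list
;; (ELAPSED GC-COUNT GC-ELAPSED), with times in seconds.
(with-temp-buffer
  (insert "Socrates asked. Then Socrates answered.")
  (benchmark-run 10 (my/count-word "Socrates")))
```

With a real corpus one would insert the file contents into the buffer first,
for example with insert-file-contents.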

|                                 | 1.                          | 2.                         | 3.                                   | 4.                         | 5.                                         | 6.                                         |
| -O2                             | (0.07173132 0 0.0)          | (3.638477067 0 0.0)        | (0.995782973 29 0.286681078)         | (0.6793675309999999 0 0.0) | (3.516161214 52 1.530445056)               | (1.49021885 19 0.6228081910000007)         |
| -O2 -march=native -mtune=native | (0.07093016799999999 0 0.0) | (3.6305292319999998 0 0.0) | (0.986554758 29 0.249624919)         | (0.7151563839999999 0 0.0) | (3.3660707039999997 54 1.500936318)        | (1.3437769549999998 16 0.5610343889999996) |
| -O3 -march=native -mtune=native | (0.07113512100000001 0 0.0) | (3.6112633769999998 0 0.0) | (0.981865426 29 0.24965302400000006) | (0.7124635739999999 0 0.0) | (3.378908317 53 1.4800354379999998)        | (1.36308648 16 0.5572504489999996)         |
| -Ofast ...                      | (0.07093933799999999 0 0.0) | (3.570912783 0 0.0)        | (0.951396288 29 0.2507440930000002)  | (0.67694857 0 0.0)         | (3.3648884779999997 53 1.4933518069999998) | (1.3443351749999999 16 0.5532435859999989) |
   
'CFLAGS=-Ofast -ftree-vectorize -mavx -march=native -mcpu=native -mtune=native
        -fopt-info-vec-optimized -flto -funroll-loops -funsafe-math-optimizations
        -fno-trapping-math -fno-finite-math-only -fopenmp'
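
To put the table in perspective, the elapsed times can be compared as a
percentage change against the -O2 baseline. The sketch below does that for
benchmark 5, using the -O2 and -Ofast values copied from the table. The names
my/relative-change, my/o2-result and my/ofast-result are hypothetical, and the
comparison looks only at the first element (total elapsed time) of each
benchmark-run result.

```elisp
;; Results for benchmark 5, copied from the table above.
;; Each is (ELAPSED GC-COUNT GC-ELAPSED) as returned by `benchmark-run'.
(defconst my/o2-result '(3.516161214 52 1.530445056))
(defconst my/ofast-result '(3.3648884779999997 53 1.4933518069999998))

;; Hypothetical helper for comparing two `benchmark-run' results.
(defun my/relative-change (baseline candidate)
  "Return the percent change in elapsed time of CANDIDATE relative to BASELINE.
A negative value means CANDIDATE was faster."
  (let ((base (car baseline))
        (cand (car candidate)))
    (* 100.0 (/ (- cand base) base))))

;; Evaluates to a value of roughly -4.3, i.e. -Ofast was about 4% faster here.
(my/relative-change my/o2-result my/ofast-result)
```

A difference of a few percent on a single run is within what system noise can
produce, which is consistent with the reading that mostly fluctuations were
measured.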


Summa summarum, it seems that -O2 with CPU-specific flags is just as fast as
-Ofast with all the unsafe options, so -O2 it is.

I did a clean and make bootstrap for each version, and I tested with emacs -Q.

Attachment: benchmark.el
Description: Text document

Attachment: Plato.org
Description: Lotus Organizer

