avr-gcc-list

Re: [avr-gcc-list] RFC: Speeding up small ISRs: PR20296


From: Erik Christiansen
Subject: Re: [avr-gcc-list] RFC: Speeding up small ISRs: PR20296
Date: Sat, 17 Jun 2017 21:51:16 +1000
User-agent: Mutt/1.8.0 (2017-02-23)

Reply-To: address@hidden

On 15.06.17 14:43, Georg-Johann Lay wrote:
> https://gcc.gnu.org/PR20296
> 
> is about speeding up "small" ISRs, and has been open for 12 years now...
>
> Anyone familiar with avr-gcc knows that a fix would be high effort and risk,
> and that's the major reason why PR20296 is still open (and even
> classified "suspended").
> 
> In some forum discussion (again!) on that issue, there was the following
> proposal to approach that PR:

From reading the PR, I infer that the proposed fix for gcc's
unmanageable optimisation complexity in this use case is to move that
code generation out of gcc and into gas.

> 1) Let GCC emit directives / pseudo-instructions in non-naked ISR prologue /
> epilogue

If gcc emits only existing directives and macro invocations, then there
is no need to modify gas itself. I.e. if:

  .maybe_isr_prologue 123       ; were instead:
   maybe_isr_prologue 123

then a gas macro could generate the required code without unnecessary
complexity. If the desired code to be generated from the parameters
supplied can be described, then I'll write the macro(s). After some
iterations, we should have some good results for some useful use cases.

> 2) Let GAS scan the code and replace the directives with code as needed.

That is what gas already does with directives and macros. But avoiding
modification of gas to add new directives is a very worthwhile design
goal, not least to avoid being told on the binutils list to do simple
things with the directives already provided. (And code generation is
simple, if a macro invocation with parameter(s) is supplied.)

> Currently,
> 
> #include <avr/io.h>
> #include <avr/interrupt.h>
> 
> ISR (INT0_vect)
> {
>     __asm ("; Code");
> }
> 
> emit something like:
> 
> 
> __vector_1:
>       push r1
>       push r0
>       in r0,__SREG__
>       push r0
>       clr __zero_reg__
> .L__stack_usage = 3
> 
>       ; Code
> 
>       pop r0
>       out __SREG__,r0
>       pop r0
>       pop r1
>       reti
> 
> 
> which would change to:
> 
> 
> __vector_1:
>       .maybe_isr_prologue 123
>       ;; Rest of prologue
> 
>       ; Code
> 
>       ;; Rest of epilogue
>       .maybe_isr_epilogue 123
>       reti
> 
> GAS would then scan the code associated with the function and replace the
> .maybe directives by the appropriate sequence to save / init / restore
> tmp-reg, zero-reg and SREG.

That sets things up handily for finishing by simple macro. Let us say
that gcc emits "_maybe_isr_prologue 1 2 3"; then the first parameter
could be the switch for save, the second for init, and the third for
restore, if desired. Gas macros readily handle omission of the last
parameter (it then takes an internally defined default value), which
can be useful if gas knows the default and gcc doesn't. Lumping it all
into a single parameter would need 8 distinct parameter values just to
cover 3 binary switches, IIUC the use case.
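
For comparison, here is a small C model of the two encodings. It is not gas code. It contrasts three separate binary switches with one packed parameter, which must then take one of 2^3 = 8 values. The struct and function names are hypothetical and are for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the three binary switches of one prologue macro call. */
struct isr_switches {
    bool save;    /* first parameter  */
    bool init;    /* second parameter */
    bool restore; /* third parameter  */
};

/* Pack the three switches into one parameter; the result is in 0..7. */
static unsigned pack_switches(struct isr_switches s)
{
    return (s.save ? 1u : 0u) | (s.init ? 2u : 0u) | (s.restore ? 4u : 0u);
}

/* Recover the three switches from one packed parameter value. */
static struct isr_switches unpack_switches(unsigned v)
{
    assert(v < 8u); /* 3 binary switches give exactly 2^3 = 8 values */
    struct isr_switches s = { (v & 1u) != 0, (v & 2u) != 0, (v & 4u) != 0 };
    return s;
}
```

In gas itself, a default for an omitted trailing parameter is declared in the macro header, e.g. ".macro _maybe_isr_prologue save, init, restore=1". The parameter names there are again only illustrative.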

> Other registers like R24 are handled by GCC as usual.  For example, if
> the scan reveals that tmp-reg is not needed but zero-reg is (which
> will imply SREG due to the CLR) the replacement code would be:

> 
> __vector_1:
>       push r1
>       in r1,__SREG__
>       push r1
>       clr __zero_reg__
> 
>       ; Code
> 
>       pop r1
>       out __SREG__,r1
>       pop r1
>       reti
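
To pin down the cases, here is a rough C model of the choice the prologue and epilogue macros would make. It is not gas code, and the flag and function names are hypothetical. It encodes the rule stated above that needing zero-reg implies saving SREG because of the CLR. It adds one assumption of its own: when only SREG must be saved, r0 is pushed to carry it. The prologue function returns the number of bytes pushed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flags that a scan of the ISR body might report. */
enum {
    NEED_TMP  = 1 << 0, /* body clobbers __tmp_reg__ (r0)           */
    NEED_ZERO = 1 << 1, /* body relies on __zero_reg__ (r1) being 0 */
    NEED_SREG = 1 << 2  /* body changes SREG                        */
};

/* Append formatted text to buf, truncating safely if it is full. */
static void emitf(char *buf, size_t size, const char *fmt, ...)
{
    size_t used = strlen(buf);
    va_list ap;

    if (used + 1 >= size)
        return;
    va_start(ap, fmt);
    vsnprintf(buf + used, size - used, fmt, ap);
    va_end(ap);
}

/* Apply the implications between the flags. */
static unsigned isr_normalise(unsigned need)
{
    if (need & NEED_ZERO)
        need |= NEED_SREG; /* CLR __zero_reg__ changes the SREG flags */
    if ((need & NEED_SREG) && !(need & NEED_ZERO))
        need |= NEED_TMP;  /* assumption: push r0 to carry SREG */
    return need;
}

/* Register that carries SREG: r0 if it is saved anyway, else r1. */
static const char *sreg_carrier(unsigned need)
{
    return (need & NEED_TMP) ? "r0" : "r1";
}

/* Write the prologue into buf; return the number of bytes pushed. */
static int isr_prologue(unsigned need, char *buf, size_t size)
{
    int pushed = 0;

    assert(size > 0);
    buf[0] = '\0';
    need = isr_normalise(need);
    if (need & NEED_ZERO) {
        emitf(buf, size, "push r1\n");
        pushed++;
    }
    if (need & NEED_TMP) {
        emitf(buf, size, "push r0\n");
        pushed++;
    }
    if (need & NEED_SREG) {
        const char *r = sreg_carrier(need);
        emitf(buf, size, "in %s,__SREG__\npush %s\n", r, r);
        pushed++;
    }
    if (need & NEED_ZERO)
        emitf(buf, size, "clr __zero_reg__\n");
    return pushed;
}

/* Write the matching epilogue (without the final reti) into buf. */
static void isr_epilogue(unsigned need, char *buf, size_t size)
{
    assert(size > 0);
    buf[0] = '\0';
    need = isr_normalise(need);
    if (need & NEED_SREG) {
        const char *r = sreg_carrier(need);
        emitf(buf, size, "pop %s\nout __SREG__,%s\n", r, r);
    }
    if (need & NEED_TMP)
        emitf(buf, size, "pop r0\n");
    if (need & NEED_ZERO)
        emitf(buf, size, "pop r1\n");
}
```

With all three flags set, it reproduces the current prologue and epilogue quoted earlier, with 3 bytes pushed. With only NEED_ZERO set, it reproduces the r1-only variant, with 2 bytes pushed.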

An epilogue macro can be made to know whether its matching prologue
saved tmp-reg, even if that is stretching assembler macros slightly.
That would not require any additional code scan. So long as nesting ISRs
is illegal, it would not even clutter the gas symbol table perceptibly.
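
Here is a C model of that bookkeeping. In gas, the slot would be a symbol set with .set in the prologue macro and tested with .if in the epilogue macro. The symbol and function names here are hypothetical. Because ISRs are not nested, a single slot that each prologue overwrites is enough. A function with several epilogues simply reads the same value each time.

```c
#include <assert.h>
#include <stdbool.h>

/* Models one assembler symbol (hypothetical name) that every ISR's
   prologue macro sets and its epilogue macro(s) read. */
static struct {
    bool valid;     /* has any prologue been seen yet?  */
    bool saved_tmp; /* did the latest prologue push r0? */
} isr_state;

/* Prologue side: overwrite the single slot, like re-running .set. */
static void prologue_record(bool saved_tmp)
{
    isr_state.valid = true;
    isr_state.saved_tmp = saved_tmp;
}

/* Epilogue side: must __tmp_reg__ be popped? No code scan is needed. */
static bool epilogue_must_pop_tmp(void)
{
    assert(isr_state.valid); /* an epilogue always follows its prologue */
    return isr_state.saved_tmp;
}
```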

> Maybe someone is interested in implementing the GAS part, and if that is the
> case and the proposal is feasible, I would take care of the GCC part.

I propose that we minimise toolchain modification by choosing an elegant
implementation, based on existing gas capabilities, if feasible. Thus
far, I have not seen any proposed code generation which could not be
achieved that way.

> Caveats:
> 
> a) .L__stack_usage can no longer be computed by GCC

It is no effort for gas to emit lines like: .L__stack_usage = 3
As we know exactly how many bytes we are adding to the stack frame, we
can effortlessly dimension and emit that line. Because the symbol name
begins with '.L', gas treats it as a local symbol and keeps it out of
the object file's symbol table. Whatever code later uses the
.L__stack_usage values should continue to work as before.
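
Here is a C sketch of deriving that value from the prologue text. It assumes, as in every example in this thread, that the prologue reserves stack only with PUSH, one byte each on AVR. A frame set up by adjusting SP would need extra handling. The function names are hypothetical. A real macro would more likely keep a running count as it emits each push.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count PUSH instructions in prologue text, one instruction per line.
   Each AVR push stores one byte on the stack. */
static int count_pushes(const char *prologue)
{
    int n = 0;
    const char *line = prologue;

    while (line && *line) {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++; /* skip indentation */
        if (strncmp(p, "push ", 5) == 0 || strncmp(p, "push\t", 5) == 0)
            n++;
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return n;
}

/* Format the line the macro would emit, e.g. ".L__stack_usage = 3". */
static void stack_usage_line(const char *prologue, char *out, size_t size)
{
    assert(size > 0);
    snprintf(out, size, ".L__stack_usage = %d", count_pushes(prologue));
}
```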

> b) It's hard to find the end of the relevant code.  We might have
> interleaved sections (like with dispatch tables), there might be code that
> is emitted after the epilogue, there might be more than 1 epilogue,

If gcc doesn't know what it is doing, then gas can't fix that part. ;-)
If gcc emits 15 "_maybe_maybe" macros, then gas will make 15 expansions.
If gcc does know what it is doing, then perhaps 14 of them will expand
to no code, or part of a prologue/epilogue, in a useful sequence. Gas
cannot know what gcc was doing when it ran, except through what gcc
writes into its output.

> dunno if GAS can infer whether JMP is local or non-local.

It can, if the target is one of its recognised local labels. Macro-local
symbols may be defined by a "LOCAL" directive (in .altmacro mode), and
locally scoped symbols by use of '\@' or by concatenating a parameter
suffix to a partial label. There are also the numbered local labels. It
knows and uses local symbols beginning with '.L', and omits them from
the symbol table, IIRC.

If the JMP destination is external, then ld will handle the linking.
That's well outside the remit of gas. If any relaxation is hoped for,
then that too will be provided by ld, if available.

> We could add a new GCC pass that filters out situations that are pointless
> to scan like code with dispatch tables or function calls, and fall back to
> classical prologue / epilogue in such cases.

Is this alluding to some sort of demultiplexer in an ISR?
It seems odd that such specificity would be suited to general treatment
by the toolchain. It sounds intriguing.

> The .maybe directives get function-unique identifiers (123 in the example)
> so that GAS knows which epilogue belongs to which prologue, if that's helpful.

That greatly simplifies the work to be done in gas: linking prologue
with epilogue becomes effortless. But as an ISR will be contained in one
compile unit, and no other ISR will sanely be nested, is that required?
(I may have some use-case catching up to do here.)

> I am not familiar with Binutils / GAS though and don't know if it's easy to
> add the 2 new passes: One to scan and one to replace the .maybe with
> appropriate code.

Neither appears necessary. Gas, as it stands, already makes the substitutions.

> IIUC GAS only works on sections,

Sorry, no, it only works on compile units, i.e. a source file and its
includes. It knows little else. Its relationship to sections is that it
understands .section directives to the extent that it will put code into
whichever section is named in one. It even has a section stack, so that
it can pop back to the prior section after an excursion into another.
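
On ELF targets, gas exposes that stack through .pushsection and .popsection. (.previous instead swaps back to the previously used section.) Here is a toy C model of the bookkeeping, with hypothetical names and an arbitrary fixed depth limit.

```c
#include <assert.h>

#define MAX_DEPTH 16

/* Toy model of the assembler's section stack: the current section is
   on top; a push switches to a new one, and a pop returns to the prior. */
struct section_stack {
    const char *names[MAX_DEPTH];
    int depth;
};

static void sec_init(struct section_stack *s, const char *initial)
{
    s->names[0] = initial;
    s->depth = 1;
}

static const char *sec_current(const struct section_stack *s)
{
    return s->names[s->depth - 1];
}

/* Like .pushsection NAME: remember the current section, switch to NAME. */
static void sec_push(struct section_stack *s, const char *name)
{
    assert(s->depth < MAX_DEPTH);
    s->names[s->depth++] = name;
}

/* Like .popsection: return to the section that was current before. */
static void sec_pop(struct section_stack *s)
{
    assert(s->depth > 1); /* never pop the initial section */
    s->depth--;
}
```

With this bookkeeping, an excursion into another section, for a table say, does not lose track of where the ISR's code continues.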

> and the scan would be on BFD internal representation (like relaxing)
> after the parser read in the asm sources?

Gas takes assembler source code text as input, and generates an ELF
relocatable object file. On AVR, relaxation (e.g. shortening CALL/JMP
to RCALL/RJMP) is performed by ld, when it is invoked with --relax.

> The GCC change would add a new option, configure test whether GAS supports
> this, let ISR prologue and epilogue emit new unspec_volatile pseudo insns

Hmmm, "unspec_volatile" doesn't appear in the described use case. If it
is desired to communicate some state or value to gas, it is only
necessary to issue:

   _maybe_isr_prologue 123 _here_be_elephants 42

and the _maybe_isr_prologue macro can:

   .set have_elephants_123, 1
   .set elephant_count_123, 42

and make decisions based on that for the remainder of the compile unit.
The only caveat is that _here_be_elephants will have to have a defined
value, most easily provided in an always-included header, perhaps.
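
Here is a C model of that per-ISR bookkeeping. A toy symbol table stands in for gas's, and names are built by suffixing the ISR's ID. All names (the table, the helper functions, the symbol names) are hypothetical. Inside a gas macro whose first parameter is named id, the suffix would presumably be written as have_elephants_\id.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 32
#define NAME_LEN 48

/* Toy stand-in for the assembler's symbol table. */
static struct {
    char name[NAME_LEN];
    long value;
} syms[MAX_SYMS];
static int nsyms;

/* Like ".set NAME, VALUE": define a symbol, or redefine it if it exists. */
static void sym_set(const char *name, long value)
{
    for (int i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            syms[i].value = value;
            return;
        }
    }
    assert(nsyms < MAX_SYMS);
    snprintf(syms[nsyms].name, NAME_LEN, "%s", name);
    syms[nsyms].value = value;
    nsyms++;
}

/* Return 1 and store the value if NAME is defined, else return 0. */
static int sym_get(const char *name, long *value)
{
    for (int i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *value = syms[i].value;
            return 1;
        }
    }
    return 0;
}

/* What the prologue macro might do on seeing the elephant marker for the
   ISR with this ID: record facts under ID-suffixed symbol names. */
static void record_elephants(int id, long count)
{
    char name[NAME_LEN];

    snprintf(name, sizeof name, "have_elephants_%d", id);
    sym_set(name, 1);
    snprintf(name, sizeof name, "elephant_count_%d", id);
    sym_set(name, count);
}
```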

> and add a scan pass to detect situations where the scan is pointless and
> GCC should fall back to the old code, like when dispatch tables, calls or a
> non-local goto are seen.

The examples provided are within existing gas capability. Let's go with
that to begin with. Complexity is like Murphy, and arrives soon enough,
without being invited.

Erik



