[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Tail-call elimination in byte-compiled code.

From: Stefan Monnier
Subject: Re: [PATCH] Tail-call elimination in byte-compiled code.
Date: Thu, 20 Sep 2012 12:31:22 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2.50 (gnu/linux)

> Hi all, and thanks for the great work being put into Emacs!

You're welcome.

> Now that wonderful lexical scope has been added, it is not very
> difficult adding tail-call elimination to byte-compiled code.  So I've
> tried to do, just that.


> ; -*- lexical-binding: t
> (require 'benchmark)
> (defun f (x accum)
>        (if (> x 0) (f (1- x) (+ x accum)) accum))
> (defun g (x accum)
>        (while  (> x 0) (setq accum (+ x accum)
>                             x (1- x)))
>        accum)
> (mapc #'byte-compile (list #'f #'g))
> (benchmark-run-compiled 10 (f 1000000 0))
> (benchmark-run-compiled 10 (g 1000000 0))
> will on my setup even make f some 8% faster than g!

I'd be interested to see the comparison of all 4 numbers:
f-old, g-old, f-new, g-new.

Yes, I know that f-old will fail for lack of stack, so we'd need
a different benchmark, but I think it's important to have such
a measure.  Also comparing g-new to g-old is so as to make sure the
patch doesn't end up introducing an overall slowdown.

> Self tail recursive functions has the following little problem:
> (f 1000000 0)
> (let ((y (symbol-function 'f)))
>      (fset 'f (lambda (_a _b) -1))
>      (funcall y 1000000 1))
> Where the interpreted behaviour give -1, but the byte-compiled wil be
> 500000500001.
> I don't think that is ultimately very serious though.

Its only significant impact is for defadvice.  It's hard to tell whether
that would/will be a real problem.

Of course, there are further consequences:
trace-function/debug-on-entry/elp won't see the recursive calls
any more.  In some cases this will actually be beneficial.

Another issue is the backtrace printed by `debug', which won't show those
recursive self-tail-calls any more.  Again, it's likely to be a virtue
in many cases.

OTOH, while non-self tail-calls won't suffer from the change w.r.t
defadvice (and elp/trace/d-o-e), they will also fail to show up in
debuggers's backtraces and I think this is much more problematic.

Maybe we can fix that be changing byte-tail-call to adjust
backtrace_list (with alloca).  That would make those "tail-calls" eat up
some C stack space, which is bad in the even/odd mutual-recursion case,
but I such cases should be a lot less frequent than the "plain
tail-call" case where recursion may not even be present.
We could leave the backtrace_list untouched in the case where the called
function has no name (is not a symbol by a bytecode object), so that
cases such as letrec/cl-labels mutual recursion won't eat up stack space.

> Another problem is that there is a little bit more difference
> in characteristic between interpreted and byte-compiled code,
> as interpreted code will quickly reach max-eval-depth.  I don't
> see any easy way out of that now tho, and tail-recursion is a very
> desirable thing for me and likely many others.

Yes, that's a concern.  It will prevent use of self-recursive
definitions in some places where they'd be used before they are
compiled.  Some of those problems can be solved by adding more
dependencies in lisp/Makefile, but others might be harder (for the files
we need before we byte-compile the byte-compiler).

OTOH we're slowly moving towards a mode where all the code is always
byte-compiled.  Example of recent changes in that direction are the
eager macro-expansion, and the byte-compilation of defsubsts before
inlining them (in those cases where we wouldn't know how to inline
their source code).

Further thoughts:
- I'd like self-tail calls to use something closer to `goto'.
  I some cases, they can really just use `goto 0' (in those cases where
  the memcpy is a nop).  Your self-tail-call is really a "discard-under
  + goto-0", so maybe all we need is to generalize it to go to other
  labels than 0.
  Of course, that would slow down self-tail calls and the only benefit
  would be when inlining such functions (and currently such recursive
  functions are never inlined).  So maybe it's not worth the trouble.
- What happens with the "unbind" bytecodes that would be run after the
  `call' when that call is turned into a `(self-)tail-call', e.g. when
  the self-tail-call is inside an unwind-protect or a let-bind of
  a dynamically-scoped var?
  I'm not sure the current handling is correct, yet I'm not sure it's
  incorrect either.
- Should we be able to do tail-calls without a new `tail-call'
  instruction by just checking whether the next instruction is `return'?
> The patch as is now, will also remove BYTE_CODE_SAFE and
> BYTE_CODE_METER. They made it more difficult for me to understand what
> actually went on in bytecode.c. If someone needs them I will gladly
> add them back.

I think your code and theirs doesn't interact, so you can put it back.

> I did try to put up a heap-allocated byte-code stack so to optimize
> non-tail-recursive inter-byte-code calls avoiding copying the
> arguments on the byte-stack. This unfortunately gave a small but
> significant performance reduction, maybe due to the C-stack
> consistently being in the processor's cache.

That's *very* interesting.
Have you tried allocating it with alloca instead?

> Also I'm not very fond of byte-compile--tail-position variable, and
> would rather add it as an argument to the 'byte-compile handlers.
> I have a patch that does just that along with disbanding
> byte-compile--for-effect and adding that as an argument along.

byte-compile--for-effect is pretty nasty, yes.  I'm actually surprised
we don't seem to have too many bugs because of it.

> The only problem is that some backward-compatibility may be broken,
> but I don't know how much external code is really adding their own
> 'byte-compile handlers.  Would there be any problems with such a patch?

I think breaking this compatibility would indeed be problematic.
As much as I dislike it, several external packages hook into the
byte-compiler (often for bad reasons, but that's another discussion).

> @@ -2503,15 +2515,16 @@ (defun byte-compile (form)
>          (when (symbolp form)
>            (unless (memq (car-safe fun) '(closure lambda))
>              (error "Don't know how to compile %S" fun))
> -          (setq fun (byte-compile--reify-function fun))
> -          (setq lexical-binding (eq (car fun) 'closure)))
> +          (setq lexical-binding (eq (car fun) 'closure))
> +          (setq fun (byte-compile--reify-function fun)))
>          (unless (eq (car-safe fun) 'lambda)
>            (error "Don't know how to compile %S" fun))
>          ;; Expand macros.
>          (setq fun (byte-compile-preprocess fun))

Yes, good catch, thank you.
Some more comments and nitpicks below.


>  (defvar byte-compile--for-effect)
> +(defvar byte-compile--tail-position nil)
> +

Please avoid adding whitespace ;-)

> -            (mapc 'byte-compile-output-file-form (cdr form)))
> +            (mapc #'byte-compile-output-file-form (cdr form)))

While I'm perfectly OK with such a change (as evidenced by some of my
recent commits), try and avoid including them in a patch that's already
large and meant only for review.

> -                      (symbol-value this-kind))))
> -          )
> +                      (symbol-value this-kind)))))
> -      (let* ((code (byte-compile-lambda (cons arglist body) t)))
> +      (let ((code (byte-compile-lambda (cons arglist body) t nil name)))

Same here.

> @@ -2462,6 +2471,9 @@
>    (pcase-let* (((or (and `(lambda ,args . ,body) (let env nil))
>                      `(closure ,env ,args . ,body)) fun)
>                 (renv ()))
> +    ;; Remove docstring if it exists
> +    (when (and (cdr body) (stringp body)) 
> +      (setq body (cdr body)))
>      ;; Turn the function's closed vars (if any) into local let bindings.
>      (dolist (binding env)
>        (cond

The comment lacks punctuation.  The change is going in the right
direction, but the docstring shouldn't be just removed.
This said, this change is also irrelevant for tail-calls, AFAICT.

> @@ -2653,7 +2668,7 @@
>                                     ;; closed by now).
>                                     (and lexical-binding
>                                          (byte-compile-make-lambda-lexenv 
> fun))
> -                                   reserved-csts)))
> +                                   reserved-csts t)))

You could use "'tail" instead of "t" to make its meaning self-evident.

> +  (cond
> +   ((and (eq (car form) byte-compile-current-lambda-name)
> +         lexical-binding byte-compile--tail-position
> +         (let ((sig (byte-compile-arglist-signature 
> +                     byte-compile-current-lambda-arglist))
> +               (nargs (length (cdr form))))
> +           (and (>= (car sig) nargs)
> +                (or (not (cdr sig)) 
> +                    (<= nargs (cdr sig))))))
> +    (setq form (cdr form))
> +    (let* ((rest (memq '&rest byte-compile-current-lambda-arglist))
> +           (nnormal-args (- (length byte-compile-current-lambda-arglist)
> +                            (if rest 2 0)
> +                            (if (memq '&optional 
> +                                      byte-compile-current-lambda-arglist)
> +                                1 0)))
> +           (nargs 0))

Isn't `rest' the same as (null (cdr sig))?

This nnormal-args computation is a bit ugly here.  We should at least move
it into its own function (right next to byte-compile-arglist-signature),
or maybe extend byte-compile-arglist-signature to provide the needed value.

> +      ;; FIXME: 
> +      ;; Pad to make stack match expectations.
> +      (byte-compile-constant nil)))

We should probably teach the stack-depth logic that byte-self-tail-call
is a bit like a return.

> +       ;; If fun is builtin now, it probably will be later too, so don't 
> +       ;; tail-eliminate.
> +       (if (and (not (subrp fun)) byte-compile--tail-position)
> +           'byte-tail-call 'byte-call)
> +       (length (cdr form)))))))

Is byte-tail-call slower than byte-call when invoked on a subroutine?
> +       ((eq (car op) 'byte-self-tail-call)
> +        (error "Can not inline self-recursive function"))

If we replace byte-self-tail-call with a byte-goto, this problem
will disappear.

>  (defun byte-compile-stack-adjustment (op operand)
>    "Return the amount by which an operation adjusts the stack.
>  OP and OPERAND are as passed to `byte-compile-out'."
> -  (if (memq op '(byte-call byte-discardN byte-discardN-preserve-tos))
> +  (if (memq op '(byte-call byte-self-tail-call byte-tail-call
> +                           byte-discardN byte-discardN-preserve-tos))

Here we have a problem since these new byte-codes do not adjust the
stack depth in the same way as byte-call.

> -/* Push x onto the execution stack.  This used to be #define PUSH(x)
> -   (*++stackp = (x)) This oddity is necessary because Alliant can't be
> -   bothered to compile the preincrement operator properly, as of 4/91.
> -   -JimB */
> -
> -#define PUSH(x) (top++, *top = (x))
> +#define PUSH(x) (*++top = (x))

Are you sure 21 years is old enough?

> +/* Interpret a numerical args template TEMPLATE, used for lexically-scoped 
> +   byte-compiled functions.  Put the data in MIN_ARGS, MAX_ARGS and REST. 
> +   Also validates that nargs is in the range.  */
> +static void
> +resolve_args_template (ptrdiff_t template, ptrdiff_t *min_args, 
> +                       ptrdiff_t *max_args, bool *rest)
> +{
> +  *max_args = template >> 8;
> +  *min_args = template & 0x7F;
> +  *rest = (template & 0x80) != 0;
> +}

The last sentence in the comment seems out-of-date/place (it applies to the
next function rather than to this one).

> +        CASE (Btailcall):

This case is large enough to merit moving to its own function.

> +                /* The stack is good to go, now for the rest.  */
> +                stack.byte_string = AREF (fun, COMPILED_BYTECODE);
> +                /* Backward compatibility with emacs pre-20.2
> +                   inclusively.  (See below in exec_byte_code).  */
> +                if (STRING_MULTIBYTE (stack.byte_string))
> +                  stack.byte_string = Fstring_as_unibyte (bytestr);

This is a call from a tail-recursive function (i.e. Emacs>24.1) to
a lexically-scoped function (i.e. Emacsā‰„24.1), so Emacs<21 issues can
be safely ignored.

> +          /* Byte compiler should have set all arguments up right. Its
> +             just for us to remove the cruft on the stack.  */

Please use 2 spaces after a full stop.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]