[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Tail-call elimination in byte-compiled code.
From: |
Stefan Monnier |
Subject: |
Re: [PATCH] Tail-call elimination in byte-compiled code. |
Date: |
Thu, 20 Sep 2012 12:31:22 -0400 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.2.50 (gnu/linux) |
> Hi all, and thanks for the great work being put into Emacs!
You're welcome.
> Now that wonderful lexical scope has been added, it is not very
> difficult adding tail-call elimination to byte-compiled code. So I've
> tried to do, just that.
Great!
> ; -*- lexical-binding: t
> (require 'benchmark)
> (defun f (x accum)
> (if (> x 0) (f (1- x) (+ x accum)) accum))
> (defun g (x accum)
> (while (> x 0) (setq accum (+ x accum)
> x (1- x)))
> accum)
> (mapc #'byte-compile (list #'f #'g))
> (benchmark-run-compiled 10 (f 1000000 0))
> (benchmark-run-compiled 10 (g 1000000 0))
> will on my setup even make f some 8% faster than g!
I'd be interested to see the comparison of all 4 numbers:
f-old, g-old, f-new, g-new.
Yes, I know that f-old will fail for lack of stack, so we'd need
a different benchmark, but I think it's important to have such
a measure. Also comparing g-new to g-old is so as to make sure the
patch doesn't end up introducing an overall slowdown.
> Self tail recursive functions has the following little problem:
> (f 1000000 0)
> (let ((y (symbol-function 'f)))
> (fset 'f (lambda (_a _b) -1))
> (funcall y 1000000 1))
> Where the interpreted behaviour give -1, but the byte-compiled wil be
> 500000500001.
> I don't think that is ultimately very serious though.
Its only significant impact is for defadvice. It's hard to tell whether
that would/will be a real problem.
Of course, there are further consequences:
trace-function/debug-on-entry/elp won't see the recursive calls
any more. In some cases this will actually be beneficial.
Another issue is the backtrace printed by `debug', which won't show those
recursive self-tail-calls any more. Again, it's likely to be a virtue
in many cases.
OTOH, while non-self tail-calls won't suffer from the change w.r.t
defadvice (and elp/trace/d-o-e), they will also fail to show up in
debuggers's backtraces and I think this is much more problematic.
Maybe we can fix that be changing byte-tail-call to adjust
backtrace_list (with alloca). That would make those "tail-calls" eat up
some C stack space, which is bad in the even/odd mutual-recursion case,
but I such cases should be a lot less frequent than the "plain
tail-call" case where recursion may not even be present.
We could leave the backtrace_list untouched in the case where the called
function has no name (is not a symbol by a bytecode object), so that
cases such as letrec/cl-labels mutual recursion won't eat up stack space.
> Another problem is that there is a little bit more difference
> in characteristic between interpreted and byte-compiled code,
> as interpreted code will quickly reach max-eval-depth. I don't
> see any easy way out of that now tho, and tail-recursion is a very
> desirable thing for me and likely many others.
Yes, that's a concern. It will prevent use of self-recursive
definitions in some places where they'd be used before they are
compiled. Some of those problems can be solved by adding more
dependencies in lisp/Makefile, but others might be harder (for the files
we need before we byte-compile the byte-compiler).
OTOH we're slowly moving towards a mode where all the code is always
byte-compiled. Example of recent changes in that direction are the
eager macro-expansion, and the byte-compilation of defsubsts before
inlining them (in those cases where we wouldn't know how to inline
their source code).
Further thoughts:
- I'd like self-tail calls to use something closer to `goto'.
I some cases, they can really just use `goto 0' (in those cases where
the memcpy is a nop). Your self-tail-call is really a "discard-under
+ goto-0", so maybe all we need is to generalize it to go to other
labels than 0.
Of course, that would slow down self-tail calls and the only benefit
would be when inlining such functions (and currently such recursive
functions are never inlined). So maybe it's not worth the trouble.
- What happens with the "unbind" bytecodes that would be run after the
`call' when that call is turned into a `(self-)tail-call', e.g. when
the self-tail-call is inside an unwind-protect or a let-bind of
a dynamically-scoped var?
I'm not sure the current handling is correct, yet I'm not sure it's
incorrect either.
- Should we be able to do tail-calls without a new `tail-call'
instruction by just checking whether the next instruction is `return'?
> The patch as is now, will also remove BYTE_CODE_SAFE and
> BYTE_CODE_METER. They made it more difficult for me to understand what
> actually went on in bytecode.c. If someone needs them I will gladly
> add them back.
I think your code and theirs doesn't interact, so you can put it back.
> I did try to put up a heap-allocated byte-code stack so to optimize
> non-tail-recursive inter-byte-code calls avoiding copying the
> arguments on the byte-stack. This unfortunately gave a small but
> significant performance reduction, maybe due to the C-stack
> consistently being in the processor's cache.
That's *very* interesting.
Have you tried allocating it with alloca instead?
> Also I'm not very fond of byte-compile--tail-position variable, and
> would rather add it as an argument to the 'byte-compile handlers.
> I have a patch that does just that along with disbanding
> byte-compile--for-effect and adding that as an argument along.
byte-compile--for-effect is pretty nasty, yes. I'm actually surprised
we don't seem to have too many bugs because of it.
> The only problem is that some backward-compatibility may be broken,
> but I don't know how much external code is really adding their own
> 'byte-compile handlers. Would there be any problems with such a patch?
I think breaking this compatibility would indeed be problematic.
As much as I dislike it, several external packages hook into the
byte-compiler (often for bad reasons, but that's another discussion).
> @@ -2503,15 +2515,16 @@ (defun byte-compile (form)
> (when (symbolp form)
> (unless (memq (car-safe fun) '(closure lambda))
> (error "Don't know how to compile %S" fun))
> - (setq fun (byte-compile--reify-function fun))
> - (setq lexical-binding (eq (car fun) 'closure)))
> + (setq lexical-binding (eq (car fun) 'closure))
> + (setq fun (byte-compile--reify-function fun)))
> (unless (eq (car-safe fun) 'lambda)
> (error "Don't know how to compile %S" fun))
> ;; Expand macros.
> (setq fun (byte-compile-preprocess fun))
Yes, good catch, thank you.
Some more comments and nitpicks below.
Stefan
> (defvar byte-compile--for-effect)
> +(defvar byte-compile--tail-position nil)
> +
Please avoid adding whitespace ;-)
> - (mapc 'byte-compile-output-file-form (cdr form)))
> + (mapc #'byte-compile-output-file-form (cdr form)))
While I'm perfectly OK with such a change (as evidenced by some of my
recent commits), try and avoid including them in a patch that's already
large and meant only for review.
> - (symbol-value this-kind))))
> - )
> + (symbol-value this-kind)))))
[...]
> - (let* ((code (byte-compile-lambda (cons arglist body) t)))
> + (let ((code (byte-compile-lambda (cons arglist body) t nil name)))
Same here.
> @@ -2462,6 +2471,9 @@
> (pcase-let* (((or (and `(lambda ,args . ,body) (let env nil))
> `(closure ,env ,args . ,body)) fun)
> (renv ()))
> + ;; Remove docstring if it exists
> + (when (and (cdr body) (stringp body))
> + (setq body (cdr body)))
> ;; Turn the function's closed vars (if any) into local let bindings.
> (dolist (binding env)
> (cond
The comment lacks punctuation. The change is going in the right
direction, but the docstring shouldn't be just removed.
This said, this change is also irrelevant for tail-calls, AFAICT.
> @@ -2653,7 +2668,7 @@
> ;; closed by now).
> (and lexical-binding
> (byte-compile-make-lambda-lexenv
> fun))
> - reserved-csts)))
> + reserved-csts t)))
You could use "'tail" instead of "t" to make its meaning self-evident.
> + (cond
> + ((and (eq (car form) byte-compile-current-lambda-name)
> + lexical-binding byte-compile--tail-position
> + (let ((sig (byte-compile-arglist-signature
> + byte-compile-current-lambda-arglist))
> + (nargs (length (cdr form))))
> + (and (>= (car sig) nargs)
> + (or (not (cdr sig))
> + (<= nargs (cdr sig))))))
> + (setq form (cdr form))
> + (let* ((rest (memq '&rest byte-compile-current-lambda-arglist))
> + (nnormal-args (- (length byte-compile-current-lambda-arglist)
> + (if rest 2 0)
> + (if (memq '&optional
> + byte-compile-current-lambda-arglist)
> + 1 0)))
> + (nargs 0))
Isn't `rest' the same as (null (cdr sig))?
This nnormal-args computation is a bit ugly here. We should at least move
it into its own function (right next to byte-compile-arglist-signature),
or maybe extend byte-compile-arglist-signature to provide the needed value.
> + ;; FIXME:
> + ;; Pad to make stack match expectations.
> + (byte-compile-constant nil)))
We should probably teach the stack-depth logic that byte-self-tail-call
is a bit like a return.
> + ;; If fun is builtin now, it probably will be later too, so don't
> + ;; tail-eliminate.
> + (if (and (not (subrp fun)) byte-compile--tail-position)
> + 'byte-tail-call 'byte-call)
> + (length (cdr form)))))))
Is byte-tail-call slower than byte-call when invoked on a subroutine?
> + ((eq (car op) 'byte-self-tail-call)
> + (error "Can not inline self-recursive function"))
If we replace byte-self-tail-call with a byte-goto, this problem
will disappear.
> (defun byte-compile-stack-adjustment (op operand)
> "Return the amount by which an operation adjusts the stack.
> OP and OPERAND are as passed to `byte-compile-out'."
> - (if (memq op '(byte-call byte-discardN byte-discardN-preserve-tos))
> + (if (memq op '(byte-call byte-self-tail-call byte-tail-call
> + byte-discardN byte-discardN-preserve-tos))
Here we have a problem since these new byte-codes do not adjust the
stack depth in the same way as byte-call.
> -/* Push x onto the execution stack. This used to be #define PUSH(x)
> - (*++stackp = (x)) This oddity is necessary because Alliant can't be
> - bothered to compile the preincrement operator properly, as of 4/91.
> - -JimB */
> -
> -#define PUSH(x) (top++, *top = (x))
> +#define PUSH(x) (*++top = (x))
Are you sure 21 years is old enough?
> +/* Interpret a numerical args template TEMPLATE, used for lexically-scoped
> + byte-compiled functions. Put the data in MIN_ARGS, MAX_ARGS and REST.
> + Also validates that nargs is in the range. */
> +static void
> +resolve_args_template (ptrdiff_t template, ptrdiff_t *min_args,
> + ptrdiff_t *max_args, bool *rest)
> +{
> + *max_args = template >> 8;
> + *min_args = template & 0x7F;
> + *rest = (template & 0x80) != 0;
> +}
The last sentence in the comment seems out-of-date/place (it applies to the
next function rather than to this one).
> + CASE (Btailcall):
This case is large enough to merit moving to its own function.
> + /* The stack is good to go, now for the rest. */
> + stack.byte_string = AREF (fun, COMPILED_BYTECODE);
> + /* Backward compatibility with emacs pre-20.2
> + inclusively. (See below in exec_byte_code). */
> + if (STRING_MULTIBYTE (stack.byte_string))
> + stack.byte_string = Fstring_as_unibyte (bytestr);
This is a call from a tail-recursive function (i.e. Emacs>24.1) to
a lexically-scoped function (i.e. Emacsā„24.1), so Emacs<21 issues can
be safely ignored.
> + /* Byte compiler should have set all arguments up right. Its
> + just for us to remove the cruft on the stack. */
Please use 2 spaces after a full stop.