[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Tail-call elimination in byte-compiled code.
Re: [PATCH] Tail-call elimination in byte-compiled code.
Thu, 20 Sep 2012 12:31:22 -0400
Gnus/5.13 (Gnus v5.13) Emacs/24.2.50 (gnu/linux)
> Hi all, and thanks for the great work being put into Emacs!
> Now that wonderful lexical scope has been added, it is not very
> difficult adding tail-call elimination to byte-compiled code. So I've
> tried to do, just that.
> ; -*- lexical-binding: t
> (require 'benchmark)
> (defun f (x accum)
> (if (> x 0) (f (1- x) (+ x accum)) accum))
> (defun g (x accum)
> (while (> x 0) (setq accum (+ x accum)
> x (1- x)))
> (mapc #'byte-compile (list #'f #'g))
> (benchmark-run-compiled 10 (f 1000000 0))
> (benchmark-run-compiled 10 (g 1000000 0))
> will on my setup even make f some 8% faster than g!
I'd be interested to see the comparison of all 4 numbers:
f-old, g-old, f-new, g-new.
Yes, I know that f-old will fail for lack of stack, so we'd need
a different benchmark, but I think it's important to have such
a measure. Also comparing g-new to g-old is so as to make sure the
patch doesn't end up introducing an overall slowdown.
> Self tail recursive functions has the following little problem:
> (f 1000000 0)
> (let ((y (symbol-function 'f)))
> (fset 'f (lambda (_a _b) -1))
> (funcall y 1000000 1))
> Where the interpreted behaviour give -1, but the byte-compiled wil be
> I don't think that is ultimately very serious though.
Its only significant impact is for defadvice. It's hard to tell whether
that would/will be a real problem.
Of course, there are further consequences:
trace-function/debug-on-entry/elp won't see the recursive calls
any more. In some cases this will actually be beneficial.
Another issue is the backtrace printed by `debug', which won't show those
recursive self-tail-calls any more. Again, it's likely to be a virtue
in many cases.
OTOH, while non-self tail-calls won't suffer from the change w.r.t
defadvice (and elp/trace/d-o-e), they will also fail to show up in
debuggers's backtraces and I think this is much more problematic.
Maybe we can fix that be changing byte-tail-call to adjust
backtrace_list (with alloca). That would make those "tail-calls" eat up
some C stack space, which is bad in the even/odd mutual-recursion case,
but I such cases should be a lot less frequent than the "plain
tail-call" case where recursion may not even be present.
We could leave the backtrace_list untouched in the case where the called
function has no name (is not a symbol by a bytecode object), so that
cases such as letrec/cl-labels mutual recursion won't eat up stack space.
> Another problem is that there is a little bit more difference
> in characteristic between interpreted and byte-compiled code,
> as interpreted code will quickly reach max-eval-depth. I don't
> see any easy way out of that now tho, and tail-recursion is a very
> desirable thing for me and likely many others.
Yes, that's a concern. It will prevent use of self-recursive
definitions in some places where they'd be used before they are
compiled. Some of those problems can be solved by adding more
dependencies in lisp/Makefile, but others might be harder (for the files
we need before we byte-compile the byte-compiler).
OTOH we're slowly moving towards a mode where all the code is always
byte-compiled. Example of recent changes in that direction are the
eager macro-expansion, and the byte-compilation of defsubsts before
inlining them (in those cases where we wouldn't know how to inline
their source code).
- I'd like self-tail calls to use something closer to `goto'.
I some cases, they can really just use `goto 0' (in those cases where
the memcpy is a nop). Your self-tail-call is really a "discard-under
+ goto-0", so maybe all we need is to generalize it to go to other
labels than 0.
Of course, that would slow down self-tail calls and the only benefit
would be when inlining such functions (and currently such recursive
functions are never inlined). So maybe it's not worth the trouble.
- What happens with the "unbind" bytecodes that would be run after the
`call' when that call is turned into a `(self-)tail-call', e.g. when
the self-tail-call is inside an unwind-protect or a let-bind of
a dynamically-scoped var?
I'm not sure the current handling is correct, yet I'm not sure it's
- Should we be able to do tail-calls without a new `tail-call'
instruction by just checking whether the next instruction is `return'?
> The patch as is now, will also remove BYTE_CODE_SAFE and
> BYTE_CODE_METER. They made it more difficult for me to understand what
> actually went on in bytecode.c. If someone needs them I will gladly
> add them back.
I think your code and theirs doesn't interact, so you can put it back.
> I did try to put up a heap-allocated byte-code stack so to optimize
> non-tail-recursive inter-byte-code calls avoiding copying the
> arguments on the byte-stack. This unfortunately gave a small but
> significant performance reduction, maybe due to the C-stack
> consistently being in the processor's cache.
That's *very* interesting.
Have you tried allocating it with alloca instead?
> Also I'm not very fond of byte-compile--tail-position variable, and
> would rather add it as an argument to the 'byte-compile handlers.
> I have a patch that does just that along with disbanding
> byte-compile--for-effect and adding that as an argument along.
byte-compile--for-effect is pretty nasty, yes. I'm actually surprised
we don't seem to have too many bugs because of it.
> The only problem is that some backward-compatibility may be broken,
> but I don't know how much external code is really adding their own
> 'byte-compile handlers. Would there be any problems with such a patch?
I think breaking this compatibility would indeed be problematic.
As much as I dislike it, several external packages hook into the
byte-compiler (often for bad reasons, but that's another discussion).
> @@ -2503,15 +2515,16 @@ (defun byte-compile (form)
> (when (symbolp form)
> (unless (memq (car-safe fun) '(closure lambda))
> (error "Don't know how to compile %S" fun))
> - (setq fun (byte-compile--reify-function fun))
> - (setq lexical-binding (eq (car fun) 'closure)))
> + (setq lexical-binding (eq (car fun) 'closure))
> + (setq fun (byte-compile--reify-function fun)))
> (unless (eq (car-safe fun) 'lambda)
> (error "Don't know how to compile %S" fun))
> ;; Expand macros.
> (setq fun (byte-compile-preprocess fun))
Yes, good catch, thank you.
Some more comments and nitpicks below.
> (defvar byte-compile--for-effect)
> +(defvar byte-compile--tail-position nil)
Please avoid adding whitespace ;-)
> - (mapc 'byte-compile-output-file-form (cdr form)))
> + (mapc #'byte-compile-output-file-form (cdr form)))
While I'm perfectly OK with such a change (as evidenced by some of my
recent commits), try and avoid including them in a patch that's already
large and meant only for review.
> - (symbol-value this-kind))))
> - )
> + (symbol-value this-kind)))))
> - (let* ((code (byte-compile-lambda (cons arglist body) t)))
> + (let ((code (byte-compile-lambda (cons arglist body) t nil name)))
> @@ -2462,6 +2471,9 @@
> (pcase-let* (((or (and `(lambda ,args . ,body) (let env nil))
> `(closure ,env ,args . ,body)) fun)
> (renv ()))
> + ;; Remove docstring if it exists
> + (when (and (cdr body) (stringp body))
> + (setq body (cdr body)))
> ;; Turn the function's closed vars (if any) into local let bindings.
> (dolist (binding env)
The comment lacks punctuation. The change is going in the right
direction, but the docstring shouldn't be just removed.
This said, this change is also irrelevant for tail-calls, AFAICT.
> @@ -2653,7 +2668,7 @@
> ;; closed by now).
> (and lexical-binding
> - reserved-csts)))
> + reserved-csts t)))
You could use "'tail" instead of "t" to make its meaning self-evident.
> + (cond
> + ((and (eq (car form) byte-compile-current-lambda-name)
> + lexical-binding byte-compile--tail-position
> + (let ((sig (byte-compile-arglist-signature
> + byte-compile-current-lambda-arglist))
> + (nargs (length (cdr form))))
> + (and (>= (car sig) nargs)
> + (or (not (cdr sig))
> + (<= nargs (cdr sig))))))
> + (setq form (cdr form))
> + (let* ((rest (memq '&rest byte-compile-current-lambda-arglist))
> + (nnormal-args (- (length byte-compile-current-lambda-arglist)
> + (if rest 2 0)
> + (if (memq '&optional
> + byte-compile-current-lambda-arglist)
> + 1 0)))
> + (nargs 0))
Isn't `rest' the same as (null (cdr sig))?
This nnormal-args computation is a bit ugly here. We should at least move
it into its own function (right next to byte-compile-arglist-signature),
or maybe extend byte-compile-arglist-signature to provide the needed value.
> + ;; FIXME:
> + ;; Pad to make stack match expectations.
> + (byte-compile-constant nil)))
We should probably teach the stack-depth logic that byte-self-tail-call
is a bit like a return.
> + ;; If fun is builtin now, it probably will be later too, so don't
> + ;; tail-eliminate.
> + (if (and (not (subrp fun)) byte-compile--tail-position)
> + 'byte-tail-call 'byte-call)
> + (length (cdr form)))))))
Is byte-tail-call slower than byte-call when invoked on a subroutine?
> + ((eq (car op) 'byte-self-tail-call)
> + (error "Can not inline self-recursive function"))
If we replace byte-self-tail-call with a byte-goto, this problem
> (defun byte-compile-stack-adjustment (op operand)
> "Return the amount by which an operation adjusts the stack.
> OP and OPERAND are as passed to `byte-compile-out'."
> - (if (memq op '(byte-call byte-discardN byte-discardN-preserve-tos))
> + (if (memq op '(byte-call byte-self-tail-call byte-tail-call
> + byte-discardN byte-discardN-preserve-tos))
Here we have a problem since these new byte-codes do not adjust the
stack depth in the same way as byte-call.
> -/* Push x onto the execution stack. This used to be #define PUSH(x)
> - (*++stackp = (x)) This oddity is necessary because Alliant can't be
> - bothered to compile the preincrement operator properly, as of 4/91.
> - -JimB */
> -#define PUSH(x) (top++, *top = (x))
> +#define PUSH(x) (*++top = (x))
Are you sure 21 years is old enough?
> +/* Interpret a numerical args template TEMPLATE, used for lexically-scoped
> + byte-compiled functions. Put the data in MIN_ARGS, MAX_ARGS and REST.
> + Also validates that nargs is in the range. */
> +static void
> +resolve_args_template (ptrdiff_t template, ptrdiff_t *min_args,
> + ptrdiff_t *max_args, bool *rest)
> + *max_args = template >> 8;
> + *min_args = template & 0x7F;
> + *rest = (template & 0x80) != 0;
The last sentence in the comment seems out-of-date/place (it applies to the
next function rather than to this one).
> + CASE (Btailcall):
This case is large enough to merit moving to its own function.
> + /* The stack is good to go, now for the rest. */
> + stack.byte_string = AREF (fun, COMPILED_BYTECODE);
> + /* Backward compatibility with emacs pre-20.2
> + inclusively. (See below in exec_byte_code). */
> + if (STRING_MULTIBYTE (stack.byte_string))
> + stack.byte_string = Fstring_as_unibyte (bytestr);
This is a call from a tail-recursive function (i.e. Emacs>24.1) to
a lexically-scoped function (i.e. Emacs≥24.1), so Emacs<21 issues can
be safely ignored.
> + /* Byte compiler should have set all arguments up right. Its
> + just for us to remove the cruft on the stack. */
Please use 2 spaces after a full stop.