emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH, RFC] Macros expansion changes, robust symbol macros, and con


From: Daniel Colascione
Subject: Re: [PATCH, RFC] Macros expansion changes, robust symbol macros, and constant propagation
Date: Tue, 17 Sep 2013 22:34:37 -0700
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 9/17/13 8:06 PM, Stefan Monnier wrote:
>>>> - removes Fmacroexpand and implements macroexpand-1 and macroexpand in lisp
>>> Sounds good (indeed for gv.el we want macroexpand-1).  Do you have some
>>> measurement of the performance impact?
>> On my machine, macroexpand is about 30% slower than the old version,
>> depending on environment size.
> 
> I think a good measurement is "load a whole bunch of *.el files", which
> will spend time in eager macro-expansion (as well as a bit of `read' and
> a bit of `eval', plus a bit of coding-system decode which could be
> optimized away, ideally, but isn't yet).  30% slowdown for such
> a test-case would be definitely considered "too costly".

Let's see how bad it is in the real world. I'm confident I can make the
thing perform as well as the old version in the common case. Worse comes
to worst, we can keep the C macroexpand implementation and make
macroexpand-1 a separate code path.

Don't forget that constant propagation might provide significant
offsetting speedups.

>>> BTW, I see you sometimes strip the macroexpand-already-expanded tag.
>>> Can't this lead again to repeated expansion?
>> Yes, but only in a harmless way.  See above.
> 
> I still don't understand what guarantees that it's always harmless.

I meant "harmless" in the sense that a macroexpand on a symbol-macrolet
form always produces a fully macroexpanded form, and that this expansion
incorporates all the expansions of its children. Repeatedly
macroexpanding this form is safe. What isn't safe is macroexpand-alling
the result of calling macroexpand-all in a non-null symbol macro
environment. In this case, yes, you can produce an incorrect result,
just as the default macro expansion strategy does.

But you'd have to do that on purpose: a macro is either going to rely on
the default macro expansion strategy, which works fine, or it's going to
return something like (cons :dontexpand (macroexpand-all (some-transform
macro-arguments) (append something macro-environment)), which is also
safe. You'd have to try in order to make repeated expansion a problem.
What would be the point of having macroexpand-all return a form with the
:dontexpand tag still attached when its caller is either going to be
another macro, which will add its own tag, or a top-level caller, which
doesn't want to see the tag?

>>>> - adds symbol-macros to the core, in macroexp
>>> I don't think I like this: CL's symbol-macro is *very* rarely used, so
>>> for now I'd rather keep it in cl-macs.el.
>> Ah, but we use symbol macros to do constant propagation, and we want
>> constant propagation to be in the core!
> 
> I'm not convinced symbol-macro is a good way to perform
> constant propagation.  See my comment below.

Why not? symbol-macrolet does precisely what we need. Would you be
happier if we added the symbol-macrolet's rewrite engine and just didn't
call it symbol-macrolet? Why write another tree walker when macro
expansion serves?

>> We need this analysis for optimizing let bindings.  cconv's analysis is
>> similar in some respects, and I looked into using it, but cconv's
>> analysis does both more (closure candidates) and less (reference
>> analysis) than what we need
> 
> "More" is not a problem, I think.  As for "less", it might be easy to
> address it.

There's another problem: cconv analysis happens _once_, in
byte-compile-preprocess.  We need to run constant propagation
repeatedly. Each round of constant propagation might enable further
optimizations that, in turn, can cause more constants bindings to
appear. Constant propagation is fine as either a byte-optimizer or
compiler-macro, but cconv is the wrong place for it.

Think about what happens when a defsubst is inserted into a function
using a let-binding for its arguments.

The point of this project is to generate better bytecode by letting
Emacs make more decisions at compile time. I'm willing to pay for
runtime speedup with an increase in compilation time.

>> --- besides, it has a very inconvenient interface.
> 
> That is true.  But there are fairly good performance reasons for it.

Sure --- but we're talking about running constant propagation only
during byte compilation.  cconv has to run all the time.

>>> It's only used for cl--expr-contains, so it should be in cl-macs.el, not
>>> in macroexp.el.  And I'm really not sure this complexity is warranted
>>> for those uses: does it fix actual bugs?
>> We'll be using it for other tasks too.  Also, any unsafe code
>> transformation is a bug.
> 
> Yes, but I'm not convinced it's really unsafe.  It is sometimes less
> aggressive than it could, but that's not a problem.

It depends on the application. The function's definition doesn't say
anything about it being a conservative analysis.

>> If we have a robust analysis engine, why not use it everywhere?
> 
> Because that can be very costly.

But if it isn't, there's no problem. Let's see what the performance
impact actually is before deciding that getting a precise answer is too
expensive.

>>>> - changes cl-defsubst so that it works like regular defsubst
>>> Doesn't this break setf on defstruct slots?
>> Yes, but I fixed it by restoring the old explicit-accessor code.  Having
>> gv depend on compiler macros is scary.
> 
> But it's efficient.  A single symbol property provides both features at
> the same time.

Symbol properties aren't particularly expensive.

>> A compiler macro should be pure optimization.  Code shouldn't depend
>> on compiler macros for correctness.
> Emacs is not Common-Lisp.

"Common Lisp does X, so we should too" is a good argument, and you can't
dismiss it by just saying that Emacs is not Common Lisp, which is true,
but misleading: Common Lisp is elisp's closest relative. A lot of
thought went into Common Lisp's design, and the language has proven
workable for a long time.

> And it's not like the code will misbehave: the compilation will simply
> fail if the compiler-macro is absent, so it's still semantically
> pretty clean.
> 
>>>> +(defun byte-optimize-do-constant-propagation (let-form)
> [...]
>>>> +           (macroexpand-all
>>>> +            `(symbol-macrolet-shadowable
>>>> +                 ,(nreverse const-bindings)
>>>> +               ,@body)))))))
>>> How does this handle the case where the var is first bound to a constant
>>> and later setq'd?
>> Oops. That code didn't make it into the patch. The intent is to use
>> macroexp-analyze-free-variables to make mutated variables ineligible for
>> this optimization.
> 
> Yuck!  The above is already pretty bad, and adding
> macroexp-analyze-free-variables will make it even worse: optimizing
> a let according to the above code takes time O(size(exp)).  Plus another
> O(size(exp)) for macroexp-analyze-free-variables (actually, this one can
> be even O(size(exp)^2) or so because at each step it can manipulate
> data-structures that can grow with size(exp)).  And this for every
> nesting level, so you get a serious O(N^2) behavior overall, sliding
> towards O(N^3), just to optimize away some const bindings.

Yes, we need to walk each form to figure out which bindings we can
safely eliminate, and yes, we need to walk the code again to do the
replacements. We need to do that for each let form. But your analysis is
too conservative: the data structure manipulation is going to be cheap
in practice because most forms aren't made chiefly out of free variable
references. It might also be possible to memoize the results so that
total running time becomes linear.

Time isn't a problem in practice.  Even the naively-implemented,
un-optimized version of macroexp-analyze-free-variables  processes the
macro-expanded version of cconv-analyze-form (a very large function) in
1.6ms. This running time is perfectly acceptable, especially if we
perform constant propagation bottom-up and re-use the intermediate
results. Even if we naively re-ran the analysis for each of the 15 let
bindings in this function that require analysis because we can't
disqualify constant substitution using local knowledge, we're only
talking about adding 24ms to the compilation time.

For contrast, byte-compiling cconv-analyze-form on a stock recentish
Emacs takes over 100ms, and that's with the GC disabled!

> Not good.

The point of compilation is to spend some CPU time now so that you spend
less CPU time later. Why does compilation need to be fast?

> The "immutable" info is already computed by cconv, so you could instead
> change cconv to output code where the immutable let bindings
> are annotated.

And rerun cconv after each optimization pass?

>>>> (push (list temp) cl--loop-bindings)
>>>> (setq form `(if (setq ,temp ,cond)
>>>> -                                ,@(cl-subst temp 'it form))))
>>>> +                                (symbol-macrolet-shadowable ((it ,temp))
>>>> +                                  ,@form))))
>>> Why not (setq form `(let ((it ,cond)) (when it ,@form)))?
>> In dynamically bound code, we want the `it' symbol to be lexically
>> scoped.
> 
> I don't think anyone really cares about that.

I don't understand why you're opposed to rewriting `it' here.  It's not
as if the `let' solution is much smaller.

>> I suppose it doesn't matter all that much, but if we have
>> robust symbol-macro machinery, why not use it?
> 
> Because the (let ((it ,cond)) (when it ,@form)) is simpler and cleaner?

I don't think it's either simpler or cleaner. Using symbol-macrolet is
about as simple as using let, doesn't pollute the dynamic environment,
and doesn't produce spurious warnings.

> Also it will be more efficient for the lexical-binding code (as
> discussed recently, in lexical-binding, `let' is cheaper than `setq').

Who said anything about setq?

> OK, yes, it sucks in one respect: you'll get spurious warnings about
> `it' being unused.  But that's a general problem we need to solve in
> any case.
> 
>>>> +(defalias 'cl-letf 'letf)
>>>> +(defalias 'cl-letf* 'letf*)
>>> I don't think I want to have letf in the core
> 
>> I really don't like letf.  That said, we need it in the core if we have
>> symbol macros in the core and we want to preserve existing
>> symbol-macrolet behavior.
> 
> That's OK, since I still don't want symbol-macro in the core.

I still don't understand why you're opposed to adding symbol-macrolet.

>>> (but I'd like to change cl-letf so that in the case of
>>> simple-variable bindings, those should be dynamically scoped, even in
>>> lexical-binding code).
>> I'm not sure this change makes sense.  It'd break existing code and
>> break optimizations that depend on letf boiling down to regular let in
>> simple cases.
> 
> Not sure what optimizations you're thinking of.  And I'm not talking
> about not using `let', I'm just talking about marking those `let' as
> dynamically-scoped.

I think special variable bindings need to be done with special forms.
Marking a symbol as a special variable just because some deeply-nested
piece of code happens to bind it with letf strikes me as a dangerous idea.

>> I'm not sure what problem it would solve either.
> 
> For non-variables, the semantics of `letf' is consistent with the
> general semantics of dynamic scoping, but not that of lexical scoping.
> E.g. in lexical scoping
> 
>   (funcall (let ((x 4)) (lambda () x)))
> 
> will always return 4.  But 
> 
>   (funcall (letf ((<PLACE> 4)) (lambda () <PLACE>)))
> 
> will usually not return 4 if <PLACE> is, say, `(car l)'.
> The semantics of `letf' on PLACE match the semantic of
> dynamically-scoped `let' on a variable.

I see your point. I'd agree with it if we were writing letf from scratch
for some reason. But there is existing code out there, and I don't want
to change its semantics. Anyone who needs a dynamic variable should be
declaring it in advance anyway.

>> If users need to do something like this, they can use letf on
>> (symbol-value variable)
> 
> Much less efficient

It doesn't have to be. There's no reason that calling letf on
(symbol-value variable) can't emit the usual variable-binding bytecode
instructions.  If symbol-value isn't the right generic place, then we
can come up with a different name.

>, and comes with all kinds of warts when the variable
> is buffer-local.  Better use a straightforward dynamically scoped `let'.

Sure.

>>>> +    (`(lambda ,arglist . ,body)
>>>> +     (macroexp-analyze-free-variables
>>> This form should never appear in macroexpanded code.
>> It can appear in macroexpanded subtrees.  Try (macroexpand-all '(setq
>> foo (lambda (x) (1+ x)))).
> 
> Then you have a bug somewhere: `lambda' is a macro, so a macroexpanded
> expression should never have the shape (lambda ...).

ELISP> (macroexpand-all '(defun foo (x) (lambda (y) (+ x y))))
(defalias 'foo
  #'(lambda
      (x)
      #'(lambda
          (y)
          (+ x y))))

ELISP> lexical-binding
t

Yes, it's actually (function (lambda ...)). So what? Why not handle both
cases?

>>>> +    (`(closure ,cells ,arglist . ,body)
>>> And this one shouldn't either.

Fine.  Since closure is an opaque list anyway, omitting this clause is
more reasonable than omitting the corresponding one for lambda.

>> Hrm --- what if somebody substitutes a function by value into a form,
>> and that function's value happens to be a closure form?
> 
> That's a bug in that somebody's code: (fboundp 'closure) => nil.
> 
>>> (defvar macroexp--sm-environment-tag
>>> (make-symbol "--macroexp--sm-environment-tag--")
>> Because if you hit C-M-x with point inside (say, to update the
>> docstring), you define macroexp--sm-environment-tag to a new value and
>> wreak havoc with any stored macro environments.
> 
> I prefer to say "don't do that" than to uglify the code the way you did.
> 
>> Using a keyword symbol instead of an uninterned one, since we'll never
>> have a function with a keyword-symbol name --- right?
> 
> Yup.
> 
>>>> +    (`(function (lambda . ,_))
>>>> +     ;; macroexpand-all has special logic to detect lambda in function
>>>> +     ;; position, so we need a special case here too.
>>> I don't understand the comment.
>> Without this clause,
> 
> This clause is very much needed (it is *the* clause that deals with
> `lambda', i.e. it is the normal case, not the "special case"), indeed.
> I just don't understand what the comment is trying to say.
> 
>>>> +(defconst macroexpand-already-expanded
>>>> +  (if (boundp 'macroexpand-already-expanded)
>>>> +      (symbol-value 'macroexpand-already-expanded)
>>>> +    (make-symbol "--macroexpand-already-expanded--"))
>>> Again: why not a defvar?  Also, we could define it as an interned
>>> well-known symbol.  E.g. macroexp--already-expanded.
>> In this case, I think an interned well-known symbol will work.  I'd
>> prefer making it a keyword symbol, though, in order to highlight its
>> special effects.
> 
> I'd rather stay away from using keywords as valid functions.

It needs to be a valid function in order ƒor the interpreter to work.

>> The macro environment should hold everything you need to expand a form
>> in a given lexical environment.  By making the correct interpretation of
>> the macro environment depend on a dynamic variable outside the
>> environment itself, you introduce the possibility of subtle error ---
>> you effectively give macro environments dynamic extent!
> 
> We already do that: macros like symbol-macrolet rely on
> the dynamically scoped macroexpand-all-environment.

If you _save_ that value and then apply it to macroexpand later, it
should work.  macroexpand-all-environment's dynamic scope is independent
of the extent of that variable's value.

> What other uses (besides symbol-macrolet) do you have for that
> macroexpand-environment-hook-tag?


The labels code looks scary. I wonder whether it might benefit as well.

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]