[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Fix m4_join
From: |
Eric Blake |
Subject: |
Re: Fix m4_join |
Date: |
Mon, 15 Oct 2007 19:45:10 +0000 (UTC) |
User-agent: |
Loom/3.14 (http://gmane.org/) |
Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:
> Now it would be even better if m4_join were
> efficient in that it would not scale quadratically:
>
> m4_join([:
> ], m4_for([i], 1, 1000, [], [i,]))
Yes, some improvements from cubic to quadratic were due to bug fixes in
m4sugar - the rule of thumb here for optimal behavior is that when writing a
recursive algorithm, you should limit the parsing of $@ per macro invocation,
and you should completely output all text related to the first element of the
list before resuming recursion on the rest of the list. Although I don't see
evidence of m4_join being cubic, I do know for certain that m4_foreach was
cubic in autoconf 2.52, fixed 2002-03-04 for 2.53. There, prior to the fix,
each recursion of _m4_foreach resulted in more text around the previous
invocation, saving the final expansion until the recursion completed, and with
each iteration requiring one more m4_shift than the previous round. After the
fix, the first list element is completely processed before moving on to the
recursion for the rest of the list, with a constant number of m4_shift's per
round.
But to go from quadratic to linear, that's a different story. Here's where
m4sugar can no longer fix things, but where a fundamental improvement needs to
be made in M4 itself. It may be rather invasive, but the payoffs would be
tremendous for any use of address@hidden
The problem lies in the current m4 implementation of $@ - namely, it spends
time constructing a string consisting of every quoted argument, and pushes it
on the obstack, only to then reparse the obstack and turn it back into a list
of arguments for the next macro in the recursion, even though only the first 2
or 3 elements of the list even factor into the current iteration. In other
words, since iterating through n items results in n obstack growth/parse
cycles, you have quadratic behavior. Because of the rules of m4, there are
some cases where this MUST be done (for example, if the user does a changequote
in between, or if some of the arguments contain unbalanced quotes). But in the
common case, it would be much nicer if m4 would recognize $@ in a macro
expansion, and simply store a placeholder on the obstack pointing to the
previous list of parsed arguments. Most macros simply use accessor methods
that would expand the placeholder as it is encountered. But the ifelse, ifdef,
and shift builtins should be taught how to recognize the placeholder directly,
where the conditionals avoid the overhead of expanding the placeholder if it
occurred in an untaken branch, and where the shift merely adjusts where the
placeholder points to.
Unrelated, but also inherent in m4, is the fact that m4_append is quadratic -
every invocation concatenates the old definition with the new text, and mallocs
a new definition for the macro. That would be another nice idiom to optimize,
by making the defn builtin expand to a special placeholder. Most macros simply
use accessor methods that would expand the placeholder as it is encountered.
But the define and popdef builtins should recognize a placeholder followed by
new text as an append operation, then grow the malloc area in a geometric
fashion (if it isn't large enough already) without wasting time revisiting all
of the former definition (similar to the speedups possible when bash added the
+= variable assignment operator).
--
Eric Blake