[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] [PATCH][5] numbers integration

From: Alex Shinn
Subject: Re: [Chicken-hackers] [PATCH][5] numbers integration
Date: Fri, 13 Feb 2015 10:16:05 +0900

On Fri, Feb 13, 2015 at 7:39 AM, Peter Bex <address@hidden> wrote:
On Wed, Feb 11, 2015 at 02:14:52PM +0100, Peter Bex wrote:
> [long story about way too complex performance hacks]
> I'd have to seriously think about this and whether it's feasible at all.
> It may not be worth it, due to the disadvantages I mentioned.

Good news, everyone!

I've taught the toaster to feel love! :)

I had another look at the compiled C output for some of the worst
offenders in the chicken-benchmarks set.  I noticed that the
scrutinizer misses a few opportunities to rewrite + into ##sys#+-2
due to fact that the rewrite was  ((number number) (##sys#+-2 #(1) #(2))).
That meant if the scrutinizer was unable to deduce that the arguments
were both numbers, it would not rewrite at all, and fall back to the
vararg version of "+" for which it needs to cons up a rest list and so on.

But ##sys#+-2 is a generic procedure which needs to analyze its arguments
anyway, and it signals a condition when it receives non-numeric arguments.
So the rewrite can be changed to ((* *) (##sys#+-2 #(1) #(2))) without
this changing anything.

It turns out that the overhead of the generic vararg numeric operators is
so much bigger that the total runtime of all the benchmarks with this
small change in rewrites is only 30% slower than the non-numbers version
of CHICKEN 5 master, instead of 100% slower!  See the attached performance
difference between chicken-5 master and the current numbers integration
version.  In fact, a few benchmarks are even slightly faster with the new
code (but I can't really explain that)!

4x slower for nestedloop sounds about right.  All of the time
is taken in the innermost loop which is basically:

(let loop ((i 1) (result result))
  (if (> i n)
      (loop (+ i 1) (+ result 1))))

This which will change from a simple goto to a general
procedure call, so that's the worst-case slowdown.

I know to use fx ops in these cases, and with high-level
loops like foof-loop we can automatically use these for
string/vector indices, but I suspect many people will want
smarter default optimizations.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]