guile-devel


From: Ken Raeburn
Subject: Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
Date: Wed, 5 Aug 2009 10:06:46 -0400

On Aug 5, 2009, at 05:06, Andy Wingo wrote:
Guile's VM does not yet have opcodes for +1 and -1. It probably
should, though. Still, this is a fairly artificial test.

Yeah, and it turned out to be useless for what I had originally wanted to test, but I was still curious to see how it would work out. Based on your analysis of the "while" issue, it ended up being interesting after all. :-)

So, you're also timing Guile startup here, which takes about 20 ms.

Yes; except for the smallest cases, startup seems to be in the noise.

Poking at it a couple of times with Apple's process sampling tool, it
does seem to be spending a bit of time in garbage collection -- but
scm_leave_guile (called from scm_async_click via
scm_pthread_mutex_lock) and getrusage (called from
scm_c_get_internal_run_time via mytime) also seem to show up on the
stack a significant fraction of the time, and each of those appears to
be a comparable if not bigger time sink.

This could be simply that Mac OS has different performance
characteristics than Linux. If locking uncontended mutexen or getting
the resource usage of a process are expensive, that doesn't sound very
good...

From my experience with profiling code on another project, mutexes definitely appear more expensive on Mac OS X than on Linux/glibc when there's contention. I just ran a quick test on a couple of x86 machines, and it appears that an uncontended pthread_mutex_lock+pthread_mutex_unlock pair takes half again as many cycles on Mac OS X as on GNU/Linux. (A test with 10000000 iterations of getrusage(RUSAGE_SELF) shows a more drastic difference between OSes -- in terms of cycle counts, more than 3.5x as expensive on Mac OS X as on GNU/Linux.)
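For reference, a minimal sketch of the kind of test I mean (the rdtsc-based timing loop and the harness below are just an illustration, not the exact program I ran):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

/* Read the x86 time-stamp counter.  */
static inline uint64_t
rdtsc (void)
{
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
}

int
main (void)
{
  enum { N = 10000000 };
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  struct rusage ru;
  uint64_t t0, t1;
  int i;

  /* Uncontended lock/unlock pairs.  */
  t0 = rdtsc ();
  for (i = 0; i < N; i++)
    {
      pthread_mutex_lock (&m);
      pthread_mutex_unlock (&m);
    }
  t1 = rdtsc ();
  printf ("lock+unlock: %.1f cycles/iter\n", (double) (t1 - t0) / N);

  /* getrusage(RUSAGE_SELF) calls.  */
  t0 = rdtsc ();
  for (i = 0; i < N; i++)
    getrusage (RUSAGE_SELF, &ru);
  t1 = rdtsc ();
  printf ("getrusage:   %.1f cycles/iter\n", (double) (t1 - t0) / N);

  return 0;
}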

But it's not pthread_mutex_lock that's showing up most often; it's sigaltstack and sigprocmask, called by scm_leave_guile. Some thoughts come to mind, but I don't have much time to work on them right now; maybe later:

(1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so that we don't block the thread while in guile mode. But we could try pthread_mutex_trylock first and avoid the cost scm_leave_guile seems to incur on the Mac. If we can't acquire the lock, trylock returns immediately, and then we can fall back to the expensive, blocking version. A quick hack version of this changed my run time for A(3,8) from 17.5s to 14.5s, saving about 17%; sigaltstack and sigprocmask are still in the picture, because they're also called from scm_catch_with_pre_unwind_handler. I'll work up a nicer patch later.
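Roughly what I have in mind (the zero-argument shape of the leave/enter-guile calls below is schematic; treat this as a sketch of the idea, not the actual patch):

#include <pthread.h>

/* Sketch only: scm_leave_guile / scm_enter_guile are the existing
   internals in threads.c; their real signatures may differ from the
   zero-argument shape assumed here.  */
void scm_leave_guile (void);
void scm_enter_guile (void);

int
scm_pthread_mutex_lock (pthread_mutex_t *mutex)
{
  int res;

  /* Fast path: if the mutex is uncontended we get it immediately and
     never leave guile mode, so no sigaltstack/sigprocmask round trip.  */
  if (pthread_mutex_trylock (mutex) == 0)
    return 0;

  /* Slow path: someone else holds the lock, so do the existing
     leave-guile / blocking-lock / enter-guile dance.  */
  scm_leave_guile ();
  res = pthread_mutex_lock (mutex);
  scm_enter_guile ();
  return res;
}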

(2) We're locking a global mutex to check if there's an async item pending for the current thread. First, why not a per-thread mutex so we don't have to potentially stop *all* threads just to see if the current one has async work to do? But second, it looks to me like by far the common case will be checking the async queue when it's empty -- not changing it, and not checking it when it's non-empty. If that's the case, can we use some other mechanism, perhaps atomic queue manipulation operations (like Apple's OSAtomic{Enqueue,Dequeue}) or atomic compare-and-swap, to speed up the common case? It introduces some portability problems, though, and if pthread_mutex_trylock helps enough maybe it doesn't matter.
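Something like this is the fast path I'm picturing, using GCC's __sync builtins; the per-thread flag and the function names below are made up for illustration (the real per-thread state lives in scm_i_thread, and the locked slow path would still have to drain the queue):

/* Hypothetical per-thread flag; in the real code this would live in
   the thread's scm_i_thread structure rather than a struct of its own.  */
struct thread_async_state
{
  volatile int pending_asyncs;
  /* ... the actual async list, still protected by its mutex ... */
};

/* Producer: after enqueueing an async for thread T (under the existing
   lock, or with an atomic enqueue like OSAtomicEnqueue), raise the flag
   so the target thread notices on its next check.  */
static void
mark_asyncs_pending (struct thread_async_state *t)
{
  __sync_lock_test_and_set (&t->pending_asyncs, 1);
}

/* Consumer, run on every async check: the overwhelmingly common empty
   case is one read and a branch -- no mutex, and no leave/enter-guile
   around a mutex.  Only when the flag is set do we atomically claim it
   and fall back to the locked path that actually runs the asyncs.  */
static int
asyncs_pending_p (struct thread_async_state *t)
{
  if (!t->pending_asyncs)
    return 0;
  return __sync_bool_compare_and_swap (&t->pending_asyncs, 1, 0);
}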

(3) My four-year-old comments on scm_enter/leave_guile, recorded in threads.c around line 300, still stand.... Those functions really ought to go away. At least they're confined to one file now. Some of it looks a little messy, but I can probably get rid of some of the uses....

(4) Yeah, yeah, I should run gprof already. :-)

Ken



