Did so, and the results speak for themselves. The binaries tested were
built with all P4 bells and whistles enabled using the Intel 8.1 C
Compiler. The machine is an 865PE-based Pentium 4C.
Gnubg-GUI-P4-NoHack-GTK261 : 21800 eval/s
Gnubg-GUI-P4-fXHack-GTK261 : 31500 eval/s
Gnubg-NOGUI-P4-NoHack-GTK261 : 27900 eval/s
Gnubg-GUI-386-StandardDistribution-GTK13 : 21400 eval/s
I also had the regular distribution binary and a P4 build both evaluate the
same 7pt game with 400+ moves in 2ply/prune. The standard distribution
finished first: 7:30 min vs 8:20 min. That was without the fX hack, which
should not matter for real crunching anyway.
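
To put the numbers above side by side, here is a small sketch that
normalizes the eval/s figures against the standard distribution and
converts the two run times to seconds. The function names are mine and
purely illustrative; only the figures come from the measurements above.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a measured rate to a baseline rate (both in eval/s). */
double relative_speed(double rate, double baseline)
{
    assert(baseline > 0.0);
    return rate / baseline;
}

/* Convert a "minutes:seconds" wall-clock time to seconds. */
int to_seconds(int minutes, int seconds)
{
    assert(minutes >= 0 && seconds >= 0 && seconds < 60);
    return minutes * 60 + seconds;
}

/* Print the measured figures relative to the standard distribution. */
void print_comparison(void)
{
    /* Gnubg-GUI-386-StandardDistribution-GTK13 */
    const double standard = 21400.0;

    printf("P4 GUI, no hack : %.2fx\n", relative_speed(21800.0, standard));
    printf("P4 GUI, fX hack : %.2fx\n", relative_speed(31500.0, standard));
    printf("P4 no GUI       : %.2fx\n", relative_speed(27900.0, standard));

    /* 8:20 (P4 build) against 7:30 (standard distribution). */
    printf("P4 run took %.2fx as long as the standard run\n",
           (double)to_seconds(8, 20) / to_seconds(7, 30));
}
```

Calling print_comparison() from a main() shows that the P4 GUI build
without the hack is only about 2% ahead in raw eval/s, while the same build
needed roughly 11% longer (500 s vs 450 s) for the real analysis.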
I can imagine two possible reasons for the P4 build being slower:
1. SSE2
SSE2 instructions fly if you manage to feed the floating point pipeline
steadily. Whenever you call a subroutine for any reason, the cache and the
pipeline are disrupted and the speed gain is lost.
2. GTK 2.6
Whatever the differences between the event models of 1.3 and 2.6, GTK 2.6
is clearly slower here. In general, however, it seems faster to me (e.g.
startup times). So what options exist to remove the GTK hooks from the
actual evaluation loops? It would be a shame to give up 2.6 because of this.
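
One option I could imagine, sketched below, is to call the GUI only once
every N evaluations instead of on each pass, or not at all when no GUI is
attached. This is not gnubg's actual code: evaluate_batch, ui_poll_fn,
evaluate_position and count_polls are hypothetical names, and the dummy
evaluation just stands in for the real work. In a GTK 2 build the callback
could drain pending events with gtk_events_pending() and
gtk_main_iteration(), but that wiring is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; a GTK build could pump pending events here. */
typedef void (*ui_poll_fn)(void *user_data);

/* Hypothetical stand-in for one position evaluation. */
static double evaluate_position(size_t i)
{
    return (double)(i % 7) * 0.5;
}

/*
 * Evaluate n positions, calling poll only once every `interval`
 * evaluations. Passing poll == NULL removes the hook from the loop.
 */
double evaluate_batch(size_t n, size_t interval, ui_poll_fn poll,
                      void *user_data)
{
    double sum = 0.0;

    assert(interval > 0);
    for (size_t i = 0; i < n; i++) {
        sum += evaluate_position(i);
        if (poll != NULL && (i + 1) % interval == 0)
            poll(user_data);
    }
    return sum;
}

/* Example callback that only counts how often it was invoked. */
void count_polls(void *user_data)
{
    size_t *counter = user_data;
    (*counter)++;
}
```

With an interval of 100, a batch of 1000 evaluations reaches the callback
10 times rather than 1000, so the GUI still gets serviced while most of the
loop runs without leaving the hot path.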