I initially wrote the multithreading code, but unfortunately I've forgotten a lot of the minor details :)
Most of this email thread is correct. In particular, multi-threading the rollouts was what I did first, and that does scale well. Note that the number of cores needs to be set to the logical core count (often double the physical core count, as most modern cores are hyper-threaded) to max out the CPU. This will likely only improve things by 20-30% over using the physical core count, though, as a hyper-threaded core isn't twice as quick.
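As a quick illustration of the logical core count mentioned above, the sketch below asks the operating system for it from Python. This is not gnubg code, and `suggested_thread_count` is a hypothetical helper name used only for this example.

```python
import os


def suggested_thread_count() -> int:
    """Return the logical CPU count reported by the OS, falling back to 1."""
    # os.cpu_count() counts logical CPUs (hyper-threads included) and
    # may return None when the count cannot be determined.
    return os.cpu_count() or 1


print(f"logical cores: {suggested_thread_count()}")
```

On a machine like the 6-core hyper-threaded laptop described here, this would normally report 12 rather than 6.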
The core count applies to all the multi-threading areas, e.g. the evaluation speed test (my laptop has 6 hyper-threaded cores):
1 thread: 177,000 (cpu 16%)
6 threads: 765,000 (cpu 40%)
12 threads: 1,173,000 (cpu 70%)
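To put those speed-test figures in terms of scaling, here is a small worked calculation in Python. It only does arithmetic on the three numbers quoted above; the `scaling` helper and its output layout are my own illustrative choices.

```python
# Speed-test figures quoted above, keyed by thread count.
results = {1: 177_000, 6: 765_000, 12: 1_173_000}


def scaling(figures):
    """Map thread count -> (speedup over 1 thread, efficiency per thread)."""
    base = figures[1]
    return {n: (v / base, v / base / n) for n, v in figures.items()}


for threads, (speedup, efficiency) in scaling(results).items():
    print(f"{threads:>2} threads: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} per-thread efficiency")
```

With these figures the speedup comes out at roughly 4.3x on 6 threads and 6.6x on 12, so the per-thread efficiency drops noticeably once the hyper-threaded cores are in use.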
The cache size does impact multi-threading as the cache causes most of the thread contention (as well as generally improving evaluation performance). Here's a rough rollout example:
zero cache
1 thread: ~645 seconds (cpu 8%)
6 threads: ~100 seconds (cpu 50%)
12 threads: ~80 seconds (cpu 100%)
max cache
1 thread: ~480 seconds (cpu 8%)
6 threads: ~90 seconds (cpu 50%)
12 threads: ~66 seconds (cpu 100%)
The cache contention would likely show up more if you had a very high number of cores. In general, though, I doubt there is much difference between a small-ish cache and a larger one.
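To make the contention point concrete, here is a minimal sketch of a shared evaluation cache guarded by a single lock. It is an illustration in Python under my own assumptions, not the actual gnubg cache: `SharedCache`, `evaluate` and the integer position keys are all hypothetical. Every lookup and store takes the same lock, so the more threads there are, the more often they end up queuing on it.

```python
import threading


class SharedCache:
    """Hypothetical evaluation cache shared by all worker threads."""

    def __init__(self, max_entries):
        self._lock = threading.Lock()
        self._data = {}
        self._max = max_entries  # 0 means nothing is ever stored

    def get_or_compute(self, key, compute):
        with self._lock:  # every thread queues here on lookup
            if key in self._data:
                return self._data[key]
        value = compute(key)  # expensive work is done outside the lock
        with self._lock:  # and queues again to store the result
            if len(self._data) < self._max:
                self._data[key] = value
        return value


def evaluate(position):
    """Stand-in for a costly position evaluation."""
    return position * position


cache = SharedCache(max_entries=1000)


def worker(positions):
    for p in positions:
        cache.get_or_compute(p, evaluate)


threads = [threading.Thread(target=worker, args=(range(100),)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(cache._data))  # 100 distinct positions cached
```

The trade-off is the usual one: a cache hit saves an evaluation, but each access costs a trip through the shared lock.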
I did add some more general multi-threading code to enable other areas to be multi-threaded, and analysing games/matches does multi-thread. I think I stopped there, as most other things are quick and/or difficult to split into tasks.
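For anyone wondering what "splitting into tasks" looks like, here is a rough Python sketch where each game in a match is analysed as an independent task on a thread pool. It is not how gnubg does it internally: `analyse_game`, `analyse_match` and the sample data are hypothetical stand-ins.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def analyse_game(moves):
    """Stand-in for analysing one game: here it just counts the moves."""
    return len(moves)


def analyse_match(games, threads=None):
    """Analyse each game as an independent task on a thread pool."""
    threads = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the input games.
        return list(pool.map(analyse_game, games))


# Hypothetical match: three games with placeholder move labels.
match = [["m1", "m2"], ["m1", "m2", "m3"], ["m1"]]
print(analyse_match(match))  # [2, 3, 1]
```

This works because the games don't depend on each other. Note that in standard CPython the global interpreter lock stops pure-Python CPU-bound threads from running in parallel, so the sketch shows the task structure rather than a real speedup.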
Jon