Re: [gnugo-devel] engine/influence.c (and DFA)
From: Arend Bayer
Date: Thu, 5 Sep 2002 20:41:28 +0200 (CEST)
On 5 Sep 2002, Dave Denholm wrote:
> Gunnar Farneback <address@hidden> writes:
>
> > Dave wrote:
> > > I was just having a look at influence.c, since that is where the
> > > profile shows the most time being spent.
> >
> > On what platform and what exactly did you run? I just did a profile on
> > an AMD Athlon running neurogo.tst and then the influence code doesn't
> > take much time. There the pattern matching is in the top.
> >
> > % cumulative self self total
> > time seconds seconds calls s/call s/call name
> > 16.05 37.36 37.36 142878688 0.00 0.00 scan_for_patterns
> > 6.27 51.96 14.60 104763600 0.00 0.00 check_pattern_light
> > 5.08 63.79 11.83 179092 0.00 0.00 compute_primary_domains
> > 3.85 72.76 8.97 64742873 0.00 0.00 fastlib
> > 3.05 79.87 7.11 30039637 0.00 0.00 do_play_move
> > 2.94 86.72 6.85 97175 0.00 0.00 do_push_owl
(...)
> > 1.46 128.20 3.39 157757 0.00 0.00 accumulate_influence
> > [...]
> >
> > What we see here is probably that the neurogo test suite is rather owl
> > intensive. For a more representative profiling one should probably
> > replay a couple of entire games against varied opponents (self play
> > just doesn't suffice). The most straightforward way to do that would
> > be to have a GTP file simulating replay of a selection of games.
We also see that memory is indeed becoming more of a bottleneck (not
a surprise, of course). Also on AMD, but a K6-2 at 400 MHz instead of
Gunnar's Athlon (1.6 GHz?):
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
10.88 111.74 111.74 136937536 0.00 0.00 scan_for_patterns
5.53 168.53 56.79 100769412 0.00 0.00 check_pattern_light
4.65 216.32 47.79 174590 0.00 0.00 compute_primary_domains
3.44 251.63 35.31 94773 0.00 0.00 do_push_owl
3.18 284.33 32.70 18600234 0.00 0.00 order_moves
3.17 316.87 32.54 63926889 0.00 0.00 fastlib
2.47 342.19 25.32 29608725 0.00 0.00 do_play_move
2.28 365.56 23.37 2783429 0.00 0.00 search_persistent_reading_cache
2.23 388.43 22.87 156910 0.00 0.00 accumulate_influence
2.18 410.77 22.34 17404893 0.00 0.00 assimilate_string
(...)
That even makes me wonder whether the DFA is still a big speedup at all
on CPUs as fast as Gunnar's.
> But one new thought... do_push_owl() just pushes state ? But looking
> at the structure, there is a lot of state to remember.
>
> The other owl functions don't appear anywhere significant in the profile.
Well, compute_primary_domains is an owl function.
> So what I'm thinking is : maybe the balance between saving state
> and recalculating state is too far towards saving state. Maybe
> if we saved less state, or spent more effort saving only what we need,
> or made the owl stuff work slightly harder to get hold of its data,
> then we'd get an overall speedup by making do_push_owl faster ?
I recently realized that pushing the eye state (which is the biggest
part of sizeof(local_owl_data)) is unnecessary, since it is completely
recalculated from scratch. I just haven't gotten around to doing
anything about it yet.
However, I am not sure that this would be a big speedup. Note that
we cannot overwrite the current state (we still need the information for
selecting the next move to try when we return from deeper recursion
levels). A side effect of do_push_owl might be that the memory segment
to which we copy the data is now in the CPU cache, where we need it
anyway.
(Sometime around 3.1.2x I killed pop_owl, which unnecessarily did the
same amount of copying as do_push_owl is doing now. It turned out to be
only a <1% speedup, although pop_owl had some 3-4% CPU time before.)
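
As a rough illustration of "push everything except the eye state", here is
a minimal sketch. It is not the actual local_owl_data layout: the struct,
its fields, the array sizes and both function names are hypothetical, and
it assumes the recomputed eye data can be placed last in the struct so that
one memcpy covers the part that has to survive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BOARD_CELLS 421 /* hypothetical board array size */
#define SKETCH_MAX_EYES 64     /* hypothetical limit on tracked eyes */

/* Hypothetical per-eye record; stands in for the real eye data. */
struct eye_sketch {
  int color;
  int size;
};

/* Hypothetical stand-in for the per-level owl state. */
struct owl_state_sketch {
  /* Must be carried over to the next recursion level. */
  char goal[SKETCH_BOARD_CELLS];
  char boundary[SKETCH_BOARD_CELLS];
  /* Recomputed from scratch at every node, so kept last and never copied. */
  struct eye_sketch eyes[SKETCH_MAX_EYES];
};

/* Number of leading bytes that actually need to be saved. */
size_t persistent_bytes(void)
{
  return offsetof(struct owl_state_sketch, eyes);
}

/* Copy only the persistent prefix into the next stack slot. The current
 * slot is left untouched, because it is still needed for move selection
 * when the deeper recursion returns. */
void push_state_partial(struct owl_state_sketch *next,
                        const struct owl_state_sketch *current)
{
  assert(next != current);
  memcpy(next, current, persistent_bytes());
}
```

Whether this pays off in practice is exactly the open question above: the
copy gets cheaper, but the cache-warming side effect on the skipped part is
lost as well.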
Instead, one could substantially reduce sizeof(local_owl_data) (and thus
both speed up do_push_owl and gain overall cache benefits) by
- maintaining eyes in a separate list, i.e. not keeping the eye
information copied at every
- further compressing the eye data (e.g. the 3-bit information of the
eye color doesn't need a whole "int", etc.)
Just a matter of doing it.
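
To make the compression point concrete, here is a minimal sketch using C
bit-fields. The field names and widths are assumptions for illustration
only (3 bits for the color as mentioned above, 9 bits for anything that
indexes a board array of a few hundred cells); the real eye data has
different fields.

```c
#include <assert.h>

/* Hypothetical uncompressed record: one full int per field. */
struct eye_wide_sketch {
  int color;
  int esize;
  int msize;
  int origin;
};

/* Hypothetical packed record: 3 + 9 + 9 + 9 = 30 bits in total. */
struct eye_packed_sketch {
  unsigned int color  : 3; /* eye color, 3 bits of information */
  unsigned int esize  : 9; /* values 0..511 */
  unsigned int msize  : 9; /* values 0..511 */
  unsigned int origin : 9; /* board index, assumed to be below 512 */
};

/* Convert a wide record to the packed form, checking the value ranges. */
struct eye_packed_sketch pack_eye(const struct eye_wide_sketch *wide)
{
  struct eye_packed_sketch packed;

  assert(wide->color >= 0 && wide->color < 8);
  assert(wide->esize >= 0 && wide->esize < 512);
  assert(wide->msize >= 0 && wide->msize < 512);
  assert(wide->origin >= 0 && wide->origin < 512);

  packed.color = (unsigned int) wide->color;
  packed.esize = (unsigned int) wide->esize;
  packed.msize = (unsigned int) wide->msize;
  packed.origin = (unsigned int) wide->origin;
  return packed;
}
```

Bit-field layout is implementation-defined, so the exact saving depends on
the compiler, but with a 32-bit int the packed record would typically fit
in a single int instead of four. The trade-off is a few extra shift and
mask instructions on every access.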
Arend