From: Tom Lord
Subject: Re: [Gnu-arch-users] Re: situations where cached revisions are not so good
Date: Sun, 28 Sep 2003 07:44:37 -0700 (PDT)

    > From: Robert Collins <address@hidden>

    > On Sun, 2003-09-28 at 15:56, Jason McCarty wrote:

    > > More complicated algorithms could certainly put forth lots more effort
    > > to follow the graph in the most efficient way, yes. I've chosen a simple
    > > algorithm to avoid doing complicated graph searches.

    > Huh? I don't follow. SPF is trivial: you need
    > a) a priority queue
    > b) a 'visited' test for each node
    > c) a path recording structure
    > d) a cost for traversing a link.

    > Zing!
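
For reference, the textbook algorithm Robert is describing -- a priority
queue, a visited test, a path-recording structure, and a cost per link --
does fit in a few lines.  This is just the standard Dijkstra/SPF sketch
over a toy revision graph (the revision names are made up, not arch code),
which is exactly the point: the search itself is not the hard part.

```python
import heapq

def spf(graph, start, targets):
    """Textbook shortest-path-first (Dijkstra): the four ingredients
    listed above -- (a) a priority queue, (b) a 'visited' test,
    (c) a path-recording structure, (d) a cost on each link."""
    queue = [(0, start)]          # (a) priority queue of (cost, node)
    visited = set()               # (b) visited test
    prev = {start: None}         # (c) path recording (predecessor map)
    best = {start: 0}
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node in targets:
            # walk the predecessor map back to recover the path
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return cost, list(reversed(path))
        for neighbor, link_cost in graph.get(node, ()):  # (d) link costs
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return None

# Toy graph: build patch-3 either by three cheap patches back to the
# import, or by one expensive cached revision fetch.
graph = {
    "patch-3": [("patch-2", 1), ("cachedrev-3", 10)],
    "patch-2": [("patch-1", 1)],
    "patch-1": [("import-0", 1)],
}
cost, path = spf(graph, "patch-3", {"import-0", "cachedrev-3"})
```

With complete, free knowledge of the graph it correctly picks the
three-patch route (cost 3) over the cached revision (cost 10).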

I don't think it's a mystery to anyone that the build_rev problem can
be abstractly reduced to an instance of SPF.

The problem is that, in practical reality, it's an instance of SPF in which:

a) the cost of examining the structure of the graph, including the
   weights on edges, varies wildly over different parts of the graph

b) on some parts of the graph, the cost of examining the graph is
   infinite

c) the costs of examining the graph are in some places high enough
   that if we do too much of it, the optimal SPF solution is useless
   because we've spent too much time computing it

For those reasons, it's not trivially SPF: it's a heuristic search
whose goal is to approximate SPF in the face of incomplete knowledge
and with the catch that asking the wrong questions during the
heuristic makes the heuristic worse than useless.  (Perhaps there is a
meta-problem that reduces to a different instance of SPF, sure...)
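
To make that concrete, here is one hypothetical shape such a heuristic
could take (everything here is illustrative, not arch code): a best-first
search over a graph we can only see by paying for probes, which charges
itself for every question it asks and stops asking when the budget runs
out -- possibly returning a worse-than-optimal path, or none at all.

```python
import heapq

def budgeted_search(start, targets, probe, probe_cost, budget):
    """Best-first search with incomplete knowledge: probe(node) returns
    the node's outgoing (neighbor, link_cost) edges, and each call burns
    probe_cost(node) out of budget.  Returns the best complete path
    found before the budget ran out, which may not be the SPF answer."""
    queue = [(0, start, [start])]
    seen = {start: 0}
    best_path = None  # (cost, path) of the best complete path so far
    while queue and budget > 0:
        cost, node, path = heapq.heappop(queue)
        if node in targets:
            if best_path is None or cost < best_path[0]:
                best_path = (cost, path)
            continue  # keep looking while we can still afford to
        budget -= probe_cost(node)  # asking the question costs us
        if budget < 0:
            break
        for neighbor, link_cost in probe(node):
            new_cost = cost + link_cost
            if new_cost < seen.get(neighbor, float("inf")):
                seen[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return best_path
```

Note the failure mode: with too small a budget the search spends its
whole allowance probing and comes back empty-handed, which is precisely
the "asking the wrong questions makes it worse than useless" catch.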

d) we haven't gotten into it much, yes, but _representing_ the graph
   raises questions of the physical representation of archives,
   maintaining that representation in a transactionally sound manner,
   minimizing the costs of querying the graph, and extending the
   "method table" that all archives must implement.

So, there are problems well beyond just the search itself.

    > For a cost metric I suggest something like:
    > number of links * RTT *(MAGIC1) + total KB * observed|configured
    > bandwidth *(MAGIC2) + projected local tree copies *(MAGIC3)

    > where magic 1 2 and 3 are to normalise the metrics.
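
Taken literally, the proposed metric would be something like the sketch
below (all names hypothetical; the MAGIC constants are left as the
normalising placeholders Robert describes).  Note that for the second
term to come out as a time, the KB total presumably wants to be divided
by bandwidth rather than multiplied by it:

```python
def rob_metric(n_links, rtt_s, total_kb, bandwidth_kb_s, n_tree_copies,
               magic1=1.0, magic2=1.0, magic3=1.0):
    """The proposed cost metric, with the KB term read as a transfer
    time (total KB / bandwidth) so the three terms can be normalised
    against one another by the MAGIC constants."""
    latency_term = n_links * rtt_s * magic1
    transfer_term = (total_kb / bandwidth_kb_s) * magic2
    local_term = n_tree_copies * magic3
    return latency_term + transfer_term + local_term
```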

KB?  observed|configured bandwidth?

Yes, with complete a priori knowledge of the graph the problem is,
indeed, quite trivial.

It's the lack of complete a priori knowledge and the costs of
acquiring knowledge about the graph that make it non-trivial.  And
it's the implications of representing that extra data (KB,
bandwidth) that make it a larger design problem than just copying a
graph algorithm from a data structures text.

  > Add the nodes adjacent to TARGET to the queue. If a node is
  > already in the queue with a lower cost, throw away the adjacency
  > being added.  Once you've finished those adjacencies, start
  > popping the front of the queue off, and adding its adjacent
  > nodes.

  > Rinse and repeat until, after finishing a node's adjacencies, you
  > have one or more successful paths.

One way to understand the thing Miles was kvetching about is to say
that your termination criterion there is wrong.  Having found just one
path, I then have to decide whether or not it is worth continuing to
look for a second one that might be far better.
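
A hypothetical stop rule for that decision might look like the
following (the names and the cost/probe trade-off are illustrative
only): keep searching only if the cheapest unexplored candidate could
still beat the path in hand even after paying for the probes needed to
find out.

```python
def worth_continuing(best_found_cost, cheapest_frontier_cost,
                     expected_probe_cost):
    """After one complete path is in hand, decide whether to keep
    searching: continue only if the cheapest frontier entry could
    still win after we pay the estimated cost of probing it."""
    if cheapest_frontier_cost is None:   # nothing left to explore
        return False
    # Optimistic assumption: the frontier path completes at no
    # further link cost beyond what we've already seen.
    optimistic = cheapest_frontier_cost + expected_probe_cost
    return optimistic < best_found_cost
```

Of course, folding probe cost and path cost into one number is itself
the normalisation problem all over again.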

One approach here is to say: 

        * It's A.I.!  Yay!

          Keep adding stuff to archives, revlibs, pristines,
          and protocols until build_rev can be a really clever
          A.I. algorithm that does a great job 90% of the time.

          (Dumb servers may present a difficulty here, though.)

          Then, add an expert system that looks at network 
          topologies, generalizations from the user about space
          vs. time, and so forth -- and recommends an ideal
          configuration of mirrors, caching policies, and revlib
          mgt policy so that according to the Rob metric, 
          the A.I. gives an optimal answer 100% of the time.

Another is to say:

        * It's A.I.!  Boo!

          Keep the search butt simple, easy to understand,
          and easy to influence by changing the configuration
          of revlibs, mirrors, and cachedrevs.

          Make it easy for users to configure their environment
          in such a way that the butt simple search does a great
          job 90% of the time.

and then there's everything in between, of course.

(Incidentally, storing the sizes of various tar bundles in archives is
one idea that's come around before and seems plausible enough.  The
design issues of (d) have to be dealt with and, more to the point, it
would need to be shown that there's a really good reason to do it
since it wouldn't, in and of itself, instantly give a better
build_rev algorithm.)

