
From: Jason McCarty
Subject: Re: [Gnu-arch-users] Re: situations where cached revisions are not so good
Date: Sun, 28 Sep 2003 16:19:46 -0400
User-agent: Mutt/1.5.4i

Tom Lord wrote:
> Perhaps storing sizes of tar-bundles in archives starts to make a lot
> of sense.
> 
> Let's suppose that while looking backwards I find a cached-rev and
> know its size.   Let's suppose that whenever I look at a summary delta
> or ordinary commit delta, I know _its_ size.
> 
> Having found a cachedrev early in the backwards search, I now have an
> upper bound (in bytes of tar bundles) on how much additional backwards
> searching to do looking for a changeset-path to ALMOST.

I like that idea; the limit can simply be passed as an argument and
decremented as the search proceeds. If the traversal fails at some point
(say, an unreachable archive), arch_build_revision will return an error,
its caller will try the next-shortest delta, and finally the cachedrev
will be used if no working path is found. The failed attempts waste some
bandwidth on server queries, so perhaps that should be folded in as
another penalty.
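
To make that concrete, here is a minimal sketch of the budgeted backwards
search, assuming the search hands a candidate chain of deltas to a helper.
None of this is real tla code: path_within_budget, struct step and
reaches_almost are invented names, and the sizes in main are just the rough
per-bundle estimates discussed later in this mail.

    #include <stdio.h>

    enum step_kind { SIMPLE_DELTA, SUMMARY_DELTA };

    struct step
    {
      enum step_kind kind;
      long size_kb;           /* recorded tar-bundle size, or an estimate */
      int reaches_almost;     /* nonzero if this step lands on an existing ALMOST */
    };

    /* Return the number of steps in a changeset path to ALMOST that costs
       less (in bytes to fetch) than CACHEDREV_KB, or -1 meaning "just use
       the cachedrev".  */
    int
    path_within_budget (const struct step *steps, int n_steps, long cachedrev_kb)
    {
      long spent = 0;
      int i;

      for (i = 0; i < n_steps; i++)
        {
          spent += steps[i].size_kb;
          if (spent >= cachedrev_kb)
            return -1;                /* budget exhausted; the cachedrev wins */
          if (steps[i].reaches_almost)
            return i + 1;             /* cheaper changeset path found */
        }
      return -1;                      /* search ran out without reaching ALMOST */
    }

    int
    main (void)
    {
      struct step chain[] = {
        { SIMPLE_DELTA,  1, 0 },
        { SUMMARY_DELTA, 8, 0 },
        { SIMPLE_DELTA,  1, 1 },
      };

      /* 3 steps costing 10 kb beat fetching a 64 kb cachedrev.  */
      printf ("steps to apply: %d\n", path_within_budget (chain, 3, 64));
      return 0;
    }

The caller would try candidate paths in order of estimated cost and only
fall back to the cachedrev once every path has returned -1 or failed.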

> I think the =locations flag could reasonably be just "local or
> remote?" and that's already there in the URL.

Not necessarily. The public mirror of my archive is accessed via sftp
over a LAN, which I would still consider local. OTOH, the distinction
doesn't really matter there, since the transfer rate is so fast anyway,
so this could be acceptable.

> One way to do it is to say that the cost of a path is something like:
> 
>       10000 * kbs-to-transfer + n-changesets-to-apply
> 
> and pfs-fs.c (the local archive implementation) can (for this purpose)
> always report kbs-to-transfer as 0.
> 
> There's a need for a special case or two in there: finding a tag of an
> unregistered archive after having earlier found a path to an existing
> ALMOST should work, not error out; there's also a prediction to take
> advantage of: "If I've found a purely local path (but searching is
> still continuing) but the backwards search would take me to a remote
> archive -- assume that if the local path has a score < 10000, it's
> not even worth connecting to the remote archive".  Maybe that
> prediction can be generalized by just adding a penalty to the score
> whenever a new remote archive has to be connected.

Hmm, well, if my algorithm has found a local path, then it has already
taken it; it won't even bother looking down the continuation. If a
summary that crosses the continuation is really large because the tagged
version is massively different, then the archive owner should probably
store a cachedrev there instead of a summary (although either will
work). Then we can just use the cachedrev if connecting to the other
archive fails.
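
For the scoring itself, here is a rough sketch of the kind of cost function
described above. The 10000 * kb term, the rule that a pfs-fs.c (local) leg
reports 0 kb to transfer, and the per-new-remote-archive penalty come
straight from the quoted proposal; the names and the actual penalty value
are invented.

    #define REMOTE_CONNECT_PENALTY 10000   /* invented value; tune as needed */

    struct path_leg
    {
      int is_remote;          /* a pfs-fs.c (local) leg reports 0 kb below */
      int new_archive;        /* first time this path touches that archive */
      long kb_to_transfer;
      int n_changesets;
    };

    /* Score a candidate path: lower is better.
       cost = 10000 * kbs-to-transfer + n-changesets-to-apply,
       plus a flat penalty for each newly-contacted remote archive.  */
    long
    path_cost (const struct path_leg *legs, int n_legs)
    {
      long cost = 0;
      int i;

      for (i = 0; i < n_legs; i++)
        {
          long kb = legs[i].is_remote ? legs[i].kb_to_transfer : 0;

          cost += 10000 * kb + legs[i].n_changesets;
          if (legs[i].is_remote && legs[i].new_archive)
            cost += REMOTE_CONNECT_PENALTY;
        }
      return cost;
    }

With these weights a purely local path scores under 10000 as long as it
applies fewer than 10000 changesets, so it automatically beats any path
that needs even one kilobyte from a remote archive, which captures the
prediction quoted above.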

> As an initial approximation, _without_ taking the extra step of
> recording sizes in archives (and to work on archives that don't have
> them), could just assume that:
> 
>       size of a simple-changeset == 1
>       size of a summary-changeset == 8
>       size of a cachedrev == 64
> 
> or somesuch.  And then add precise sizes at leisure.

Sounds reasonable, although I might use

    size of a simple-changeset == 2
    size of a summary-changeset == number of local revisions spanned
    size of a cachedrev == 128

since summaries look to be about half the size of the revisions they
summarize.
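
In code, those fallback weights might look something like this (just a
sketch; the function and enum names are invented):

    enum bundle_kind { SIMPLE_CHANGESET, SUMMARY_CHANGESET, CACHEDREV_BUNDLE };

    /* Default size weights for archives that don't record real tar-bundle
       sizes.  A summary spanning N revisions gets weight N, i.e. about half
       the combined weight of the N simple changesets it replaces.  */
    long
    estimated_size (enum bundle_kind kind, int revisions_spanned)
    {
      switch (kind)
        {
        case SIMPLE_CHANGESET:
          return 2;
        case SUMMARY_CHANGESET:
          return revisions_spanned;
        case CACHEDREV_BUNDLE:
          return 128;
        }
      return 0;
    }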

Jason
