RE: [Gnu-arch-users] GNU Arch review - am I accurate?

From: Parker, Ron
Subject: RE: [Gnu-arch-users] GNU Arch review - am I accurate?
Date: Tue, 9 Mar 2004 12:01:44 -0600

> From: David A. Wheeler
> Sent: Thursday, March 04, 2004 11:42 AM

> When I said:
> >> * Is anyone currently working on automated caching?
> Jan Hudec replied:
> > If you mean automatic creation of cached revisions, I have not
> >heard about it, except it's a trivial shell script to set as hook.
> What I want is for the tool _defaults_ to "do the right thing".
> E.G., "every X versions or when the patches are Y% of the previous
> cache/baseline, create a new cache, and delete the Zth
> old automatic cache since it probably isn't needed".
> Manual control over caching, and controlling various performance
> parameters, are useful.  But by default, arch should take advantage
> of the information it has available to it, and do the "right thing"
> automatically to give reasonable performance.
> I shouldn't have to set up hooks, or give caching commands, in
> the normal case just to do cache management... it should only
> be necessary for special circumstances.  The Linux kernel gives
> me all sorts of control over scheduling, but normally it
> divines the "right thing to do" and I don't have to touch it.

There has been some work on determining how many patches equal a cached
revision in terms of various performance measurements.  But I don't know
what has been done beyond this.
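
To make the quoted proposal concrete, here is a rough sketch of that kind
of default rule.  It is not anything tla does today.  The function names,
the thresholds (every_x, percent_y, keep_z) and their default values are
all hypothetical, and I am reading "delete the Zth old automatic cache" as
"keep only the newest Z automatic caches", which is an assumption.

```python
def should_create_cache(revisions_since_cache, patch_bytes_since_cache,
                        baseline_bytes, every_x=20, percent_y=50.0):
    """Return True when a new cached revision looks worthwhile.

    Hypothetical rule: cache every `every_x` revisions, or sooner if the
    accumulated patches reach `percent_y` percent of the baseline size.
    """
    if revisions_since_cache >= every_x:
        return True
    if baseline_bytes > 0:
        percent = 100.0 * patch_bytes_since_cache / baseline_bytes
        if percent >= percent_y:
            return True
    return False


def caches_to_delete(automatic_caches, keep_z=3):
    """Given automatic caches ordered oldest to newest, return the ones
    that fall outside the newest `keep_z` (assumed pruning policy)."""
    surplus = len(automatic_caches) - keep_z
    if surplus <= 0:
        return []
    return automatic_caches[:surplus]
```

Even in this small form, every one of those defaults is a guess about
someone's disk, CPU and network, which is where the trouble starts.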

Unfortunately, "do the right thing" typically suffers from a lack of
definition.  I hear it all the time from users who, upon supplying
sufficient definition and allowing time for development, do not want what
they requested.  On occasion it may be a simple failure of communication,
but all too often they realize that what they asked for wasn't core to what
was bothering them and does not solve their actual problems.  A child may
want nothing but candy for lunch and supper, but they certainly do not want
the stomach ache that will follow.

You're asking for someone else to figure out what "the right thing" is, so
the end user doesn't have to.  This is understandable, but tuning a cache
of any type is a real art based upon many variables.  In this particular
case the largest variables from my perspective are local drive space and
performance, and CPU speed (despite its speed, mine is all too often
overcommitted); local network connectivity, bandwidth and available
storage; and Internet connectivity, bandwidth and available storage.  I
suspect this forms a non-linear system.

I'll use my situation as an example of why it is difficult to "just do it".
(Note:  I do not actually update any remote archives from work despite the
hypothetical comments made about doing so below.)

It's not unusual for me to have 4 machines at my desktop.  No two of them
have the same performance characteristics.  

My main development machine is a desktop with two monitors, a nearly full
drive and overcommitted memory and CPU resources.  Adding drive space or
memory to this machine is not an option because of business issues.  When
the CPU on this machine isn't busy paging out memory, the drive is usually
the limiting factor due to constant thrashing.  At least once a week, I have
to find more room on the drive.  For this machine I cannot afford to cache
our source locally:  it is several hundred MiB, and it is not unusual for me
to have < 200MiB available.  Here, I am best off having archives cached on
the local network.  I should also mention that for this machine Internet
bandwidth is not a big problem at ~1500MiB/sec, so I typically don't bother
mirroring archives from the Internet, unless they are themselves
particularly slow.  And then, I would mirror them to our local network.
This machine has a "hole" through the firewall and can be used to write to
remote repositories.

My secondary machine is similarly configured but not nearly so heavily
burdened.  It can afford to keep local archives and in fact having a
greedy-sparse revision library is ideal for this machine and improves its
performance.  I haven't really had to worry about pruning the library
either.  However, this machine does not have a hole through our
firewall, so it cannot be used to update a remote repository directly.
Instead the changes have to be mirrored to a local server and then pushed
from my primary machine.

I also have a laptop.  It doesn't have a lot of spare disk space, but I can
maintain the necessary mirrors, caches and revision libraries to allow it to
work either fully attached or in detached mode.  Here it is not uncommon for
me to kill my local revision library for additional disk space when doing
multiple builds of our software.  (At least that seems the easiest route to

Also, on occasion I have my Powerbook from home here as well.  It has
sufficient disk space that, like my secondary desktop machine, I don't worry
about pruning my revlib or anything else.  But, like the work laptop, it has
to be set up for detached usage.  It is also set up a little differently
because it may be fully detached from any network, or attached only to the
network at work with no access to my home network or attached to the network
at home with no access to the network at work.  At work, I only use it for
work and while at home I may push updates from it to a public archive on my
personal domain.

So, for me each of these machines is configured differently and I cannot see
an effective way for tla to "do the right thing" for each of them
automatically.  I have to instruct it in what to do.
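
As a rough illustration of why, here is a sketch of the simplest rule I can
imagine for just one decision, where to keep cached archives.  The class,
its fields and the function are hypothetical and are not part of tla.  Note
that none of its inputs are things tla can discover on its own:  whether a
machine must work detached is my intent, not a measurable property.

```python
from dataclasses import dataclass


@dataclass
class MachineProfile:
    # All fields are hypothetical inputs that a person would have to supply.
    name: str
    free_local_mib: int
    must_work_detached: bool


def choose_cache_location(machine, source_mib):
    """Pick where cached archives should live for one machine."""
    # A machine used off the network needs local copies, even if disk
    # space is tight and something else has to be pruned to make room.
    if machine.must_work_detached:
        return "local"
    # Not enough free space for the source: rely on the local network.
    if machine.free_local_mib < source_mib:
        return "local-network"
    return "local"
```

With made-up figures, say 500 MiB of source and 200 MiB free, the main
desktop comes out as "local-network" while a laptop with the same free
space comes out as "local".  And this ignores the firewall, bandwidth and
CPU load entirely.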

Also, I suspect that if I threw Tom's working situation into the mix, he
would use caching much differently than I do.  As I understand it, he works
on a lower-powered system with only a dial-up connection.  So his Internet
connectivity and bandwidth constraints differ greatly from my own, and I bet
he has things tuned much differently than any of my machines.  From what
I've heard, it sounds like he maintains his setup manually.
