gnu-arch-users

Re: [Gnu-arch-users] New feature at the mirror + request for help


From: Tom Lord
Subject: Re: [Gnu-arch-users] New feature at the mirror + request for help
Date: Tue, 23 Mar 2004 08:52:23 -0800 (PST)

    > From: James Blackwell <address@hidden>

    > >     > This definitely works as an archive registry. 

    > > It's going to be even sweeter when this stuff starts to get worked
    > > into `grab'.

    > My gut reaction is "Hell ya!". I think, though, that we'd have to really
    > think through the ramifications of doing this. This takes us straight to
    > "centralized authority". Do not pass go, do not collect $200.

Thank you for worrying about that issue.

Authorities are not in and of themselves a problem, in my view.
They're very convenient.

What I would very much like to preserve, however, is:

        ~ flexibility
        ~ user choice
        ~ no fixed authority game

* flexibility 

  Allow a user to refer to more than one registration authority,
  and to register archives "by hand".  Allow competing lookup 
  protocols.

  If you want to mix registrations from two authorities and add in
  some archives not known to any authority, you should be able to
  do that however you like.


* user choice

  Core arch should never involve a configuration where there is
  a per-installation, per-machine, or per-site authority.   It
  is up to each individual user to decide whether and how to 
  use each authority.

  If your sysadmin or employer or project host makes bad choices about
  where your archive registrations come from, you should be able
  to override that.


* no fixed authority game

  We might imagine some bright and glorious future in which 
  you and others have "competition" both as super-mirror and
  registration authority.

  Inevitably, in such a condition, people running such sites will
  start to work out among themselves either explicit or de facto
  peer-to-peer networks for sharing archive registrations, mirrors (and 
  perhaps other things that wind up in there such as public key info).

  DNS is designed (for good enough reasons) to give its analogous p2p
  network a tree shape with root nodes "owning" the entire namespace
  and subtree nodes being authoritative for subsets of the namespace.
  This is critical in DNS because namespace consistency is critical:
  it is a disaster for the Internet when people irreconcilably disagree
  about the binding of a given name and different name servers take
  sides in that disagreement.

  The namespace consistency problem is important for arch -- but not 
  as critically important.   It's a mild problem if somebody else 
  decides to name their public archive "address@hidden" -- but 
  it's not a disaster because users can either work around the problem
  without too much effort or simply ignore the namespace squatter.
  (As a supermirror manager, you would need to make the choice between
  the competing name claims --- acting as a proxy for your users.)

  The namespace consistency problem is also far more technically
  problematic for arch (i.e., very hard to solve) because of
  mirroring.  When is a mirror of my archive authoritative?  How do
  you know it is fully up-to-date and complete?  To solve this, we'd
  need more than a name authority -- we'd need an archive content
  authority.

  So, the namespace consistency problem isn't _essential_ to solve for
  arch.    But what would be gained or lost by solving it anyway
  under, say, a rubric of "convenience"?

  The DNS authority tree creates "territory" and, consequently, a
  fight for ownership of property (e.g., over root servers).  Because
  property rights by definition involve the right to "make productive
  use of" the property, that in turn creates a fight over what
  owners of DNS territory can do with their region of the map.

  In DNS the attendant melee spills out into the streets in the form
  of fights throughout civil society over matters ranging from the
  nature of ICANN in national and international law, the membership of
  the ICANN board, the rights and responsibilities of registrars,
  the path of IETF activities, the ownership rights and relation to
  trademark of domain names, and on and on and on.   And it's a
  pitched battle for control of an immovable hilltop -- there's no
  "just walking away from it".

  Arch can be regarded as a collection of protocols and data formats.
  Heck, in the future, that shouldn't be merely an abstraction, it
  should be literally true --- formal standards should be drafted.

  Where, among those standards, should registration authorities fit
  in?   What should be the nature of those protocols?

  Arch can and should continue to gain features for picking up archive
  registrations from some database that has conveniently assembled
  them, but it ought to do so in a way that keeps archive registration
  fundamentally anarchic and determined by end-users.   In other
  words, in these protocols, wherever an archive name is mentioned,
  the rule that binds it to an archive location is determined
  client-side, at the application level, in an unspecified way.
  The arch protocols are (by design) fragile that way.

  I don't claim "address@hidden" to the permanent exclusion of
  others -- rather, by using that name, I suggest that I "own" it and
  to the extent that users adopt my use of that name, that's the full
  extent of my "ownership" of it.   I "own" it only in the sense that
  everyone who is using it has agreed that that name is mine to use.
  There can be "fights" over the control of this name, but they won't
  escalate beyond just this name, and they'll be pretty self-limiting
  because the users can look at the fight and vote with their feet --
  instant democratic resolution.

  In contrast, a non-anarchic archive registration process would
  recapitulate the experience of DNS on a smaller scale.  It would
  immediately define a territory and the rules of a fight for control
  over that territory.  That might be fun or lucrative, someday, but
  it would add nothing at all to the practical utility of a revision
  control system.  It would just be a big distraction, a waste of
  energy, and an obstacle and added expense for people wanting to
  publish public archives.  (Right now, for example, if you have some
  FTP space or public web space, you can put up any archive you like
  there.  It would be a step backwards if, before you could do that,
  you had to ask Savannah.gnu.org or a similar project host to
  allocate an archive name for you.  And will they have to _pay_ for
  that allocation?  Will the expense be passed along to you?  Will you
  be limited in how many names you can use?  Will you have to repeat
  the process each time you cycle your archive (or will you be
  allocated a swath of names permitted to vary in suffix)?  Who wants
  to even think about those problems?  Just avoid them.)

  Don't get me wrong -- even an anarchic authority mechanism can 
  give rise to territory fights.    We might all decide that your
  super-mirror is the most important one and look to you to resolve 
  competing claims to a given name.   You might decide, at some point, 
  that your super-mirror should be self-sustaining and possibly even
  profitable and start charging for some kinds of transaction.

  But the anarchic game is still better because, should you become
  unreasonable, it's a trivial matter for someone to set up and
  compete with you.   Since the authoritativeness of an authority is
  always determined by the edge nodes and not by an intermediary, it
  won't be hard to pull the rug out from under you.   In
  contrast, setting up competing DNS root servers can wreak a little
  short-term havoc, but it will never really work as a strategy for 
  overthrowing ICANN because too much intermediate infrastructure 
  would need to make the switch simultaneously.

  (Another difference with DNS: arch users can be reasonably expected
  to understand all about archive registration; they're inherently
  qualified to "vote with their feet".  DNS users include everyone
  using the Internet at all -- most people simply have insufficient
  understanding to make an informed choice among competing DNS
  services.)
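
To make the client-side rule concrete, here is a minimal sketch (in Python, not arch code) of how an end user might bind archive names to locations. It assumes a hypothetical `resolve` helper that walks an ordered list of lookup sources; every archive name and URL in it is invented for illustration. The only point is that the ordering, and so the final say, belongs to the user.

```python
from typing import Callable, Dict, List, Optional

# A "source" maps an archive name to a location, or None if it is unknown.
Source = Callable[[str], Optional[str]]


def table_source(table: Dict[str, str]) -> Source:
    """Wrap a plain name -> location table as a lookup source."""
    return table.get


def resolve(name: str, sources: List[Source]) -> Optional[str]:
    """Return the first location offered by the user's ordered sources."""
    for lookup in sources:
        location = lookup(name)
        if location is not None:
            return location
    return None


# Hypothetical data: a by-hand registration plus two competing authorities.
by_hand = table_source({
    "alice@example.org--2004": "sftp://alice.example.org/archives/2004",
})
authority_a = table_source({
    "alice@example.org--2004": "http://mirror-a.example.net/alice--2004",
    "bob@example.org--main": "http://mirror-a.example.net/bob--main",
})
authority_b = table_source({
    "carol@example.org--dev": "http://mirror-b.example.net/carol--dev",
})

# The user, not the site or the authority, picks the order.
my_sources = [by_hand, authority_a, authority_b]
```

With this ordering, `resolve("alice@example.org--2004", my_sources)` returns the by-hand location even though `authority_a` disagrees, and dropping an authority is a matter of removing it from one user's list.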


    > Give me just a moment... (putting on a nice suit to play devil's advocate)

    > I could then dictate which archives are "grabbable" and which are not. For
    > example, let's hypothesize that somebody, even temporarily, created a fork
    > of tla that caused incompatible archives. My natural response to this
    > would be to remove the archive from the supermirror.

Well, there you go.  That's fine, imo.  You'd be acting as a proxy for
those of us using your super-mirror.  We'd still be voting with our
feet, but we'd be letting you move our feet for us.  As long as you
did a reasonable job, we'd stay with you.  If you got silly about it,
there'd be some heavy sighs and somebody else would put up a new
supermirror.   We'd just recall our proxy.


    > Maybe that's fine, and maybe that's not. What if I decide to make some
    > sort of jba (James Blackwell's arch), which can read tla archives, but
    > makes archives that tla can't read. I now certainly have the power to
    > encourage my jba fork over tla-like forks, and might even have some
    > leverage against the official tla.

Sure.   That's mostly orthogonal to the namespace authority issue.

The right fix there is to formally standardize archive protocols and
data formats.

In theory, if you have a really good reason to make jba and make it
incompatible, that's not obviously a bad thing.   Resolving that
divergence could lead to better standards.

(That said, I don't think I have or am likely to give you any reason
to do that.   There are better ways to improve arch than starting up a 
fight for power with me.)


    > Ok. Maybe that's a bit of a reach. But how sure are you that I wouldn't
    > refuse to mirror (which is how that registry is built) somebody's archive
    > just because I don't like them? Sure, I'd like to *think* that I'm not so
    > petty... but what if it were somebody really nasty. For example, what if
    > somebody nasty came along... say the infamous Darl McBride... as far as
    > anybody knows, I might consider giving him the black sheep treatment.

Word gets around.   Either your users are comfortable with your
choices and your authority stands, or we get fed up and somebody sets
up an alternative.

Personally, I think you will at some point need to draw a line between
archives you mirror for free, as a public service, and those for which
you collect a well-deserved fee from a customer who can afford to pay
it.


    > What actually makes more sense to me is me putting up a php script that
    > lets people design a grab file that is served from sc or ga.o. That way
    > they can specify configs, etc. Before we do that, probably there should be
    > some sort of discussion about the namespace.

I'd appreciate it, I think, if there's some way I can write the spec
on my own and just hand that over rather than having to fill out a 
form that's then compiled into the spec.


    > Also, I kind of promised lifeless that I'd endorse the removal
    > of grab once config-manager was ready. I think that point has
    > gotten pretty close.  He says that it's not quite ready to
    > supersede grab yet, but he's already worked out the one
    > remaining thing before grab should be killed off.

What's in a name?  I don't care whether "it" is called grab or cm or
rumplestiltskin -- I care only about what "it" does.



    > >> No, I still don't have an arch browser yet. [....]
    > >> Does anybody have a php based archive browser that requires building
    > >> neither a library nor a cache?

    > > I don't think a cache should scare you off in principle.   On the
    > > other hand, it should be a really well implemented cache both in terms
    > > of its impact on you as admin and its smoothness in integration into
    > > the code.

    > For the sake of argument, let's say that building a cache at least involves
    > finding every file in the tree. Today, that takes (cold):

    > address@hidden:/var/www/mirrors/pub$ time find . -type f | wc -l
    > 119897

    > real    1m36.914s
    > user    0m0.295s
    > sys     0m2.323s

    > By the time the supermirror is maxed out with about 2,000 archives, a find
    > operation on the tree will take about 20 minutes to run and will bog
    > down any other operation that is occurring at the same time. In all
    > fairness, I suppose a cache building operation could be done only once a
    > day....

That sounds like an analysis that illustrates how _not_ to design a
cache rather than a proof that caches won't work.

I'd expect that a useful _initial_ browser would let you see only data 
from log messages, CONTINUATION files, and changesets.   A "cache" in
this case would be some disk space where changesets are unpacked and
kept around for a little while so that if I browse around at the
changes to several files, the "unpack" step only happens approx. once.
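
As a rough illustration of that kind of cache, here is a Python sketch. It is not how any existing browser works. The `unpack` callable is a stand-in for whatever actually unpacks a changeset and returns the directory it landed in, and the ten-minute lifetime is an arbitrary assumption. The sketch only tracks entries in memory; a real version would also delete expired directories from disk.

```python
import time
from typing import Callable, Dict, Tuple


class UnpackCache:
    """Keep unpacked changesets briefly so repeat views skip the unpack."""

    def __init__(self, unpack: Callable[[str], str],
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._unpack = unpack    # hypothetical: unpack a revision, return its path
        self._ttl = ttl_seconds  # how long an unpacked changeset stays warm
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, revision: str) -> str:
        now = self._clock()
        entry = self._entries.get(revision)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                    # still warm: reuse the unpacked copy
        path = self._unpack(revision)          # cold or expired: unpack again
        self._entries[revision] = (now, path)
        return path
```

Browsing the changes to several files in one changeset then calls `get` repeatedly with the same revision name and pays for the unpack only on the first call within the lifetime.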

Whole-file browsing and arbitrary-delta browsing are more complicated,
sure, but on-demand population of a fixed-size sparse revlib should do
the trick.
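
One way to picture "on-demand population of a fixed-size" library is the Python sketch below. It is an assumption-laden model, not arch's revlib code: `build` stands in for whatever constructs a full tree for a revision, the size limit counts entries rather than bytes, and least-recently-used eviction is my choice of policy, not something arch prescribes.

```python
from collections import OrderedDict
from typing import Callable, List


class SparseRevlib:
    """Hold at most max_entries revision trees, built only when asked for."""

    def __init__(self, build: Callable[[str], str], max_entries: int) -> None:
        self._build = build      # hypothetical: build a tree, return its path
        self._max = max_entries
        self._trees: "OrderedDict[str, str]" = OrderedDict()

    def get(self, revision: str) -> str:
        if revision in self._trees:
            self._trees.move_to_end(revision)  # mark as most recently used
            return self._trees[revision]
        tree = self._build(revision)           # populate on demand
        self._trees[revision] = tree
        if len(self._trees) > self._max:
            self._trees.popitem(last=False)    # evict the least recently used
        return tree

    def cached(self) -> List[str]:
        """Revisions currently held, least recently used first."""
        return list(self._trees)
```

The library never grows past its limit, so the admin cost is bounded, and the latency cost shows up only when a browser wanders into a revision that is not currently held.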

The thing is, a browser for a 1.2G collection of archives (or whatever
you're at) is not going to be dirt cheap, ever, under any system.  And
that's just going to get worse as you add more archives.  The arch
archive format makes trade-offs that are a big, big win for many
common operations -- and just horrid for browsers.  Caching _will_ be
necessary and a browser just entering a "cold" part of the collection
is going to see some non-trivial latency while it warms up.

I think you need a revenue model to pay for all this.  A tiny little
cluster with a truck-load of diskspace and some chubby pipes is ample
to scale up very far (as a guess: << $100K first-year and < $20K
ongoing) -- but that's just a bit out of volunteer/hobbyist range,
to put it mildly.

Let's get the kernel mirror going over the next few months and perhaps
some other projects (KDE?  GCC?  Gnome?) to increase demand for the
kinds of thing you're doing and, in the back room, start to work out
a revenue model.

-t





