Re: [Gnu-arch-users] New feature at the mirror + request for help
From: Tom Lord
Subject: Re: [Gnu-arch-users] New feature at the mirror + request for help
Date: Tue, 23 Mar 2004 08:52:23 -0800 (PST)
> From: James Blackwell <address@hidden>
> > > This definitely works as an archive registry.
> > It's going to be even sweeter when this stuff starts to get worked
> > into `grab'.
> My gut reaction is "Hell ya!". I think, though, that we'd have to really
> think through the ramifications of doing this. This takes us straight to
> "centralized authority". Do not pass go, do not collect $200.
Thank you for worrying about that issue.
Authorities are not in and of themselves a problem, in my view.
They're very convenient.
What I would very much like to preserve, however, is:
~ flexibility
~ user choice
~ no fixed authority game
* flexibility
Allow a user to refer to more than one registration authority,
and to register archives "by hand". Allow competing lookup
protocols.
If you want to mix registrations from two authorities and add in
some archives not known to any authority, you should be able to
do that however you like.
* user choice
Core arch should never involve a configuration where there is
a per-installation, per-machine, or per-site authority. It
is up to each individual user to decide whether and how to
use each authority.
If your sysadmin or employer or project host makes bad choices about
where your archive registrations come from, you should be able
to override that.
* no fixed authority game
We might imagine some bright and glorious future in which
you and others have "competition" both as super-mirror and
registration authority.
Inevitably, in such a condition, people running such sites will
start to work out among themselves either explicit or de facto
peer2peer networks for sharing archive registrations, mirrors (and
perhaps other things that wind up in there such as public key info).
DNS is designed (for good enough reasons) to give its analogous p2p
network a tree shape with root nodes "owning" the entire namespace
and subtree nodes being authoritative for subsets of the namespace.
This is critical in DNS because namespace consistency is critical:
it is a disaster for the Internet when people irreconcilably disagree
about the binding of a given name and different name servers take
sides in that disagreement.
The namespace consistency problem is important for arch -- but not
as critically important. It's a mild problem if somebody else
decides to name their public archive "address@hidden" -- but
it's not a disaster because users can either work around the problem
without too much effort or simply ignore the namespace squatter.
(As a supermirror manager, you would need to make the choice between
the competing name claims --- acting as a proxy for your users.)
The namespace consistency problem is also far more technically
problematic for arch (i.e., very hard to solve) because of
mirroring. When is a mirror of my archive authoritative? How do
you know it is fully up-to-date and complete? To solve this, we'd
need more than a name authority -- we'd need an archive content
authority.
So, the namespace consistency problem isn't _essential_ to solve for
arch. But what would be gained or lost by solving it anyway
under, say, a rubric of "convenience"?
The DNS authority tree creates "territory" and, consequently, a
fight for ownership of property (e.g., over root servers). Because
property rights by definition involve the right to "make productive
use of" the property, that in turn creates a fight over what
owners of DNS territory can do with their region of the map.
In DNS the attendant melee spills out into the streets in the form
of fights throughout civil society over matters ranging from the
nature of ICANN in national and international law, the membership of
the ICANN board, the rights and responsibilities of registrars,
the path of IETF activities, the ownership rights and relation to
trademark of domain names, and on and on and on. And it's a
pitched battle for control of an immovable hilltop -- there's no
"just walking away from it".
Arch can be regarded as a collection of protocols and data formats.
Heck, in the future, that shouldn't be merely an abstraction, it
should be literally true --- formal standards should be drafted.
Where, among those standards, should registration authorities fit
in? What should be the nature of those protocols?
Arch can and should continue to gain features for picking up archive
registrations from some database that has conveniently assembled
them, but it ought to do so in a way that keeps archive registration
fundamentally anarchic and determined by end-users. In other
words, in these protocols, wherever an archive name is mentioned,
the rule that binds it to an archive location is determined
client-side, at the application level, in an unspecified way.
The arch protocols are (by design) fragile that way.
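To make "determined client-side" concrete, here is a minimal sketch in Python. It is an illustration only: the `resolve` function, the registry dictionaries, and every archive name and URL in it are made up for the example and are not part of arch or tla. The rule it uses (by-hand registrations win, then authorities in the order the user listed them) is just one rule a user might pick.

```python
from typing import Dict, List, Optional

# Hypothetical data model: an "authority" is simply a mapping the user has
# chosen to consult. This is not a real arch file format or tla interface.
Registry = Dict[str, str]  # archive name -> archive location


def resolve(name: str,
            by_hand: Registry,
            authorities: List[Registry]) -> Optional[str]:
    """Bind an archive name to a location using the user's own rule.

    The rule in this sketch: registrations made by hand always win, then
    the authorities are consulted in the order the user listed them.
    """
    if name in by_hand:
        return by_hand[name]
    for registry in authorities:
        if name in registry:
            return registry[name]
    return None  # unknown to every source this user trusts


# Two competing (hypothetical) authorities that disagree about one name.
supermirror_a = {
    "alice@example.org--2004": "http://mirror-a.example/alice",
    "bob@example.org--2004": "http://mirror-a.example/bob",
}
supermirror_b = {
    "alice@example.org--2004": "http://mirror-b.example/squatter",
    "carol@example.org--2004": "http://mirror-b.example/carol",
}
# An archive registered by hand, unknown to either authority.
by_hand = {"dave@example.org--2004": "ftp://ftp.example/dave"}

# Each user decides the order; swapping it is the whole "vote".
prefer_a = [supermirror_a, supermirror_b]
prefer_b = [supermirror_b, supermirror_a]
```

With `prefer_a` the disputed name resolves to the location supermirror_a gives; with `prefer_b` it resolves to the other one. Nothing outside the user's own configuration has to change for the user to switch sides.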
I don't claim "address@hidden" to the permanent exclusion of
others -- rather, by using that name, I suggest that I "own" it and
to the extent that users adopt my use of that name, that's the full
extent of my "ownership" of it. I "own" it only in the sense that
everyone who is using it has agreed that that name is mine to use.
There can be "fights" over the control of this name, but they won't
escalate beyond just this name, and they'll be pretty self-limiting
because the users can look at the fight and vote with their feet --
instant democratic resolution.
In contrast, a non-anarchic archive registration process would
recapitulate the experience of DNS on a smaller scale. It would
immediately define a territory and the rules of a fight for control
over that territory. That might be fun or lucrative, someday, but
it would add nothing at all to the practical utility of a revision
control system. It would just be a big distraction, a waste of
energy, and an obstacle and added expense for people wanting to
publish public archives. (Right now, for example, if you have some
FTP space or public web space, you can put up any archive you like
there. It would be a step backwards if, before you could do that,
you had to ask Savannah.gnu.org or a similar project host to
allocate an archive name for you. And will they have to _pay_ for
that allocation? Will the expense be passed along to you? Will you
be limited in how many names you can use? Will you have to repeat
the process each time you cycle your archive (or will you be
allocated a swath of names permitted to vary in suffix)? Who wants
to even think about those problems? Just avoid them.)
Don't get me wrong -- even an anarchic authority mechanism can
give rise to territory fights. We might all decide that your
super-mirror is the most important one and look to you to resolve
competing claims to a given name. You might decide, at some point,
that your super-mirror should be self-sustaining and possibly even
profitable and start charging for some kinds of transaction.
But the anarchic game is still better because, should you become
unreasonable, it's a trivial matter for someone to set up and
compete with you. Since the authoritativeness of an
authority is always determined by the edge-nodes and not by an
intermediary, it won't be hard to pull the rug out from under you. In
contrast, setting up competing DNS root servers can wreak a little
short-term havoc, but it will never really work as a strategy for
overthrowing ICANN because too much intermediate infrastructure
would need to make the switch simultaneously.
(Another difference with DNS: arch users can be reasonably expected
to understand all about archive registration; they're inherently
qualified to "vote with their feet". DNS users include everyone
using the Internet at all -- most people simply have insufficient
understanding to make an informed choice among competing DNS
services.)
> Give me just a moment... (putting on a nice suit to play devil's advocate)
> I could then dictate which archives are "grabbable" and which are not. For
> example, let's hypothesize that somebody, even temporarily, created a fork
> of tla that caused incompatible archives. My natural response to this
> would be to remove the archive from the supermirror.
Well, there you go. That's fine, imo. You'd be acting as a proxy for
those of us using your super-mirror. We'd still be voting with our
feet, but we'd be letting you move our feet for us. As long as you
did a reasonable job, we'd stay with you. If you got silly about it,
there'd be some heavy sighs and somebody else would put up a new
supermirror. We'd just recall our proxy.
> Maybe that's fine, and maybe that's not. What if I decide to make some
> sort of jba (James Blackwell's arch), which can read tla archives, but
> makes archives that tla can't read. I now certainly have the power to
> encourage my jba fork over tla-like forks, and might even have some
> leverage against the official tla.
Sure. That's mostly orthogonal to the namespace authority issue.
The right fix there is to formally standardize archive protocols and
data formats.
In theory, if you have a really good reason to make jba and make it
incompatible, that's not obviously a bad thing. Resolving that
divergence could lead to better standards.
(That said, I don't think I have or am likely to give you any reason
to do that. There are better ways to improve arch than starting up a
fight for power with me.)
> Ok. Maybe that's a bit of a reach. But how sure are you that I wouldn't
> refuse to mirror (which is how that registry is built) somebody's archive
> just because I don't like them? Sure, I'd like to *think* that I'm not so
> petty... but what if it were somebody really nasty. For example, what if
> somebody nasty came along... say the infamous Darl McBride... as far as
> anybody knows, I might consider giving him the black sheep treatment.
Word gets around. Either your users are comfortable with your
choices and your authority stands, or we get fed up and somebody sets
up an alternative.
Personally, I think you will at some point need to draw a line between
archives you mirror for free, as a public service, and those for which
you collect a well-deserved fee from a customer who can afford to pay
it.
> What actually makes more sense to me is me putting up a php script that
> lets people design a grab file that is served from sc or ga.o. That way
> they can specify configs, etc. Before we do that, probably there should be
> some sort of discussion about the namespace.
I'd appreciate it, I think, if there's some way I can write the spec
on my own and just hand that over rather than having to fill out a
form that's then compiled into the spec.
> Also, I kind of promised lifeless that I'd endorse the removal
> of grab once config-manager was ready. I think that point has
> gotten pretty close. He says that it's not quite ready to
> supersede grab yet, but he's already worked out the one
> remaining thing before grab should be killed off.
What's in a name? I don't care whether "it" is called grab or cm or
rumplestiltskin -- I care only about what "it" does.
> >> No, I still don't have an arch browser yet. [....]
> >> Does anybody have a php based archive browser that requires building
> >> neither a library nor a cache?
> > I don't think a cache should scare you off in principle. On the
> > other hand, it should be a really well implemented cache both in terms
> > of its impact on you as admin and its smoothness in integration into
> > the code.
> For the sake of argument, let's say that building a cache at least involves
> finding every file in the tree. Today, that takes (cold):
> address@hidden:/var/www/mirrors/pub$ time find . -type f | wc -l
> 119897
> real 1m36.914s
> user 0m0.295s
> sys 0m2.323s
> By the time the supermirror is maxed out with about 2,000 archives, a find
> operation on the tree will take about 20 minutes to run and will bog
> down any other operation that is occurring at the same time. In all
> fairness, I suppose a cache building operation could be done only once a
> day....
That sounds like an analysis that illustrates how _not_ to design a
cache rather than a proof that caches won't work.
I'd expect that a useful _initial_ browser would let you see only data
from log messages, CONTINUATION files, and changesets. A "cache" in
this case would be some disk space where changesets are unpacked and
kept around for a little while so that if I browse around at the
changes to several files, the "unpack" step only happens approx. once.
Whole-file browsing and arbitrary-delta browsing is more complicated,
sure, but on-demand population of a fixed-size sparse revlib should do
the trick.
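Here is a rough sketch, in Python, of the kind of cache I mean for the changeset case: nothing is scanned up front, a changeset is unpacked only when somebody asks for it, and the cache holds a fixed number of entries, dropping the least recently used one when full. Everything in it is an assumption made for illustration: the `ChangesetCache` class, the `fake_unpack` stand-in, and the revision names are hypothetical, and it keeps entries in memory where a real browser would presumably keep unpacked trees on disk.

```python
from collections import OrderedDict
from typing import Callable, Dict


class ChangesetCache:
    """Hold at most `capacity` unpacked changesets.

    Entries are created on demand and the least recently used entry is
    evicted when the cache is full. Hypothetical sketch, not arch code.
    """

    def __init__(self, capacity: int,
                 unpack: Callable[[str], Dict[str, str]]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.unpack = unpack          # the expensive step we want to avoid
        self.entries: "OrderedDict[str, Dict[str, str]]" = OrderedDict()
        self.unpack_count = 0         # how often the expensive step ran

    def get(self, revision: str) -> Dict[str, str]:
        if revision in self.entries:
            # Cache hit: mark as most recently used, no unpacking needed.
            self.entries.move_to_end(revision)
            return self.entries[revision]
        # Cache miss: pay the unpack cost once, then remember the result.
        self.unpack_count += 1
        unpacked = self.unpack(revision)
        self.entries[revision] = unpacked
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return unpacked


def fake_unpack(revision: str) -> Dict[str, str]:
    # Stand-in for unpacking a changeset: maps a file name to its diff text.
    return {"README": f"diff for {revision}"}


cache = ChangesetCache(capacity=2, unpack=fake_unpack)
for _ in range(3):
    cache.get("patch-1")   # browsing several files of the same changeset
```

After the loop, `cache.unpack_count` is 1: three lookups against the same changeset cost one unpack. The cost of entering a "cold" part of the collection shows up as a miss on the first `get` for each revision, and the `capacity` bound is what keeps the disk usage from growing with the size of the mirror.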
The thing is, a browser for a 1.2G collection of archives (or whatever
you're at) is not going to be dirt cheap, ever, under any system. And
that's just going to get worse as you add more archives. The arch
archive format makes trade-offs that are a big, big win for many
common operations -- and just horrid for browsers. Caching _will_ be
necessary and a browser just entering a "cold" part of the collection
is going to see some non-trivial latency while it warms up.
I think you need a revenue model to pay for all this. A tiny little
cluster with a truck-load of diskspace and some chubby pipes is ample
to scale up very far (as a guess: << $100k first-year and < $20K
ongoing) -- but that's just a bit out of volunteer/hobbyist range,
to put it mildly.
Let's get the kernel mirror going over the next few months and perhaps
some other projects (KDE? GCC? Gnome?) to increase demand for the
kinds of thing you're doing and, in the back room, start to work out
a revenue model.
-t