Re: [Gnu-arch-users] the state of the union

From: Tom Lord
Subject: Re: [Gnu-arch-users] the state of the union
Date: Thu, 19 Aug 2004 14:34:07 -0700 (PDT)

    > From: Greg Hudson <address@hidden>

    >> I was comparing [...]

    > Perhaps that's what you were comparing in your head, but what you wrote
    > [...]

I disagree, but it isn't important.... more interesting stuff:

    > If your real point is that Subversion took a wrong turn by using BDB, I
    > won't argue with you.  (In fact, in "undiagnosing", I very explicitly
    > agreed with you on that point.)  If your real point is that Subversion's
    > FS design is harder to implement, it's possible that you're right.  But
    > you didn't appear to be talking about implementation details.

I think that the largest points I'm trying to express are basically
political and economic ones.  We have these N-1 free software revctl
projects, they've almost all been around for a few years now and, I
don't know about you, but my sense is that *I* understand the design
space much better now than a few years ago and also that many other
people understand the design space much better now than a few years
ago.
At the very least the nature of inter-project discussions should
evolve in some way because of that (and look, here we are!).  My
preference is that it evolve in a direction characterized by the
explicit intention of participants to achieve the most efficient use
of volunteer resources (private and commercial) overall, rather than 
for the exclusive political and/or economic benefit of just one
project.   Which is easy to say, of course, and hard to practice.  For
example, in my "state of the union", I suggested reasons why, for such
cooperation, arch is the best framework;  that's not necessarily going
to be a popular idea with other projects.

    > > and avoiding local filesystem things like `flock'

    > "flock" isn't at all restricted to local filesystems, although there
    > certainly exist remote filesystems which can't swing it.  But this is
    > definitely within the realm of implementation details.

I went through about 3 or 4 words before settling on "local".  I
decided that what it means is a filesystem that has the semantics of a
native unix filesystem (to some degree that we could, in principle,
dicker over).  A virtual filesystem built over FTP could never pass
for "local", for example; if we accept NFS' weakened semantics for
rename, create, and unlink, then certain NFS setups can pass for
local.  It's not a precise term.  I'm just saying (with some nuance)
the fairly tautological: if you aren't using the filesystem in hairy
ways you don't need a particularly hairy filesystem.   Which seems to
be what you were saying, too.

    > > One thing I noticed while skimming the FSFS design document
    > > ("structure") is that some of the files in your back-end are
    > > indefinitely mutable (the one that caught my eye was something about
    > > "revision properties", I believe).

    > > Mutable files like that complicate replication, backups, and integrity
    > > checking, at least.

    > The rev-prop files are the only indefinitely mutable component.  There's
    > no requirement that rev-props be mutable for Subversion to work (in
    > fact, by default Subversion prohibits changing of rev-props after a
    > commit); if you don't allow it, you can't do things like fix mistakes in
    > a commit log after the fact.  

Why not make commit logs versioned objects?  That way there is no need
for mutability and the system overall is more purely an archiver.
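
As a rough illustration of that idea (a minimal sketch only; the
`LogHistory` class and its method names are hypothetical and belong to
neither arch nor Subversion), a log message can be kept as an
append-only sequence of versions, so that "fixing" a log never
overwrites anything:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogHistory:
    """Append-only history of log messages for one revision (hypothetical)."""

    entries: List[str] = field(default_factory=list)

    def commit(self, message: str) -> int:
        """Record a new version of the log message; return its version number."""
        self.entries.append(message)
        return len(self.entries) - 1

    def current(self) -> str:
        """Return the most recently committed log message."""
        return self.entries[-1]

    def at(self, version: int) -> str:
        """Return the log message as it stood at an earlier version."""
        return self.entries[version]
```

Correcting a mistake is then just another commit: the old text stays
retrievable by version number, and nothing stored earlier is ever
modified.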

    > Rev-prop files are tiny, so they don't really complicate
    > replication and backups.  Since arch doesn't have an equivalent
    > concept, the rev-prop files can be ignored for the purposes of
    > this discussion.

They aren't any big deal, sure.   You could just make them versioned
objects.  In the arch world, that would mean (as a strawman example)
that for every revision:


there is a related version (aka "development line" or "versioned


into which log message updates are committed (if you can see what I'm
pointing at with the name mangling in those examples).

    >> One virtue of arch's approach is that the core archive is, in essence,
    >> a (partially ordered) transaction journal and nothing more.   Each
    >> commit-like operation bundles up the parameters of its transaction,
    >> stores that bundle in the archive --- and that's it, the commit is
    >> done.

    > In FSFS as well, a commit is finalized by bundling up the transaction
    > directory into a file and storing that in the revs directory, after
    > which time the file never changes.

That's good and sounds very familiar (because, after all, there's only
so many ways to do it).
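
One of those few ways, sketched here under loud assumptions (the
function name, the flat one-file-per-revision layout, and the
hard-link trick are illustrative choices, not a description of what
arch or FSFS actually does): write the bundle to a temporary file,
then publish it under its final name in one atomic step that fails if
the revision already exists.

```python
import os
import tempfile


def commit_transaction(archive_dir: str, rev_name: str, payload: bytes) -> str:
    """Store a transaction bundle as a write-once file (hypothetical layout)."""
    final_path = os.path.join(archive_dir, rev_name)
    # Stage the bundle under a temporary name in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".txn-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # link() is atomic and raises FileExistsError if the revision
        # name is already taken, so an existing commit is never replaced.
        os.link(tmp_path, final_path)
    finally:
        # Drop the staging name whether or not the commit succeeded.
        os.unlink(tmp_path)
    # Mark the committed file read-only as a reminder that it is immutable.
    os.chmod(final_path, 0o444)
    return final_path
```

Nothing here needs `flock': the only filesystem properties relied on
are an atomic link that refuses to clobber an existing name, and an
unlink.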

I wonder if you could comment on the "meta-data structure" of a svn
repository, things like rev ids and copy ids and so forth.  How much
of that do you feel is essential to the general idea of a transactional
filesystem and how much of it do you think is ...mmm... "baggage" of
some sort or another.... perhaps just carry-overs from the table
designs that came out of BDB-FS?  Why is or isn't it a good structure
to build around?  What's your take on inventory id tags?

    >> Client-side caches and memos are a flexible solution that scales
    >> arbitrarily with the number of clients.

    > Perhaps.

    > Here's something I do often: I find a bug in one of the dozens of
    > upstream programs I maintain builds of, and I narrow it down to a
    > particular source file.  My first question is "is there an upstream
    > fix?"  So I ask the upstream CVS repository for a log (or in some cases
    > a blame annotation) of the upstream file.  I'm not going to have a
    > well-populated client-side cache for the given program.  The upstream
    > repository would probably rather not serve me the project's entire
    > version history, and I certainly would rather not have my client pore
    > through that entire history.

I think Aaron mentioned already that the meta-data you are interested
in is already factored out into separate files in arch -- your client
could just use that.

As for not wanting to be served the entire version history of these
projects, I can't imagine why not.   For the upstream servers,
updating your mirror of their archive (at least of their mainline)
gives you complete information in an economical number of bits: their
expenses are modest and predictable if most clients keep mirrors.
For you, I don't know what you mean by "pore through that entire
history".  Do you mean something like grep a few hundred files?
What's the problem here?   Are you concerned about space usage?
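
To make the "economical number of bits" point concrete, here is a
minimal sketch of an incremental mirror update.  It assumes (purely
for illustration) that an archive can be modelled as an ordered
mapping from revision name to changeset bytes; the function names are
hypothetical and are not arch commands.

```python
from typing import Dict, List


def revisions_to_fetch(upstream: Dict[str, bytes], mirror: Dict[str, bytes]) -> List[str]:
    """Return upstream revision names missing from the mirror, in upstream order."""
    have = set(mirror)
    return [rev for rev in upstream if rev not in have]


def update_mirror(upstream: Dict[str, bytes], mirror: Dict[str, bytes]) -> List[str]:
    """Copy only the missing changesets into the mirror; return what was fetched."""
    fetched = revisions_to_fetch(upstream, mirror)
    for rev in fetched:
        # Committed revisions never change, so a revision that is
        # already mirrored never needs to be fetched again.
        mirror[rev] = upstream[rev]
    return fetched
```

Because committed revisions are immutable, the cost to the upstream
server of each update is proportional to the number of new revisions,
not to the length of the whole history.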

