
Re: [Gnu-arch-users] Re: Making microbranches popular


From: Scott Bronson
Subject: Re: [Gnu-arch-users] Re: Making microbranches popular
Date: Tue, 27 Jan 2004 15:39:57 -0800

On Tue, 2004-01-27 at 02:25, David Allouche wrote:
> On Tue, Jan 27, 2004 at 08:36:49PM +1100, Robert Collins wrote:
> > No. This problem is trying to shoehorn a square block into a round hole.
> Other than that, I agree that making the Arch namespace any more messy
> would be a bad thing. I'm pretty happy with both the version part and
> the anonymous branch feature, but I see no need to add up on it. Though
> I may change my mind if someone gives me a better argument than "my
> stupid lazy users who actually do not have a damn clue about arch think
> it would be better".

OK, I'll try.  Here's my anecdotal experience.  I apologize for the
length.  Skip to SUMMARY to see my conclusions without the process, or
just read the last paragraph to see why I'm very skeptical of these
unnecessarily restrictive design decisions.


INTRO

I'm maintaining a small arm-based embedded Linux distribution.  It is
compiled from ~150 packages, but that will probably be over 300 by the
time it's ready to see the light of day.

Here's the general layout (extremely incomplete):

    /docs
    /src/common/corona
    /src/common/executor
    /src/common/initlog
    /src/common/vendor/uclibc
    /src/kernel/vendor/linux-2.4
    /src/kernel/vendor/linux-2.4-UML
    /src/kernel/vendor/linux-2.6
    /src/modules/exerciser
    /src/modules/mail-host
    /src/modules/mail-imap
    /src/modules/mail-web/vendor/omail-webmail
    /src/modules/cmdline/crueltools
    /src/modules/cmdline/vendor/busybox
    /toolchain/src/buildtools
    /toolchain/src/gcc-pyo    # heavily modified target arch, ancient
    /toolchain/src/vendor/binutils
    /toolchain/src/vendor/gcc-3.2
    /toolchain/src/vendor/gcc-3.3

This organization should be pretty clear...  The host-compiled source is
in /toolchain, the native-compiled source is in /src.  There's a lot of
native-compiled source, so it's further broken down by module with stuff
required by all modules in "common".  All vendor-supplied source goes in
a "vendor" directory so that the developer knows to keep his grubby
hands off -- if he wants to do any source changes, he needs to fork
first so he doesn't break other archs relying on that source.

It's obvious from this tree layout exactly what you need to check out. 
To work on executor, just check out /src/common/executor.  To work on
integration in the cmdline module, grab /src/common/ and
/src/modules/cmdline.  To boot the module on your workstation, grab
/src/kernel/vendor/linux-2.4-UML as well.  To make a final binary, grab
the appropriate toolchain.

Benefits to this layout...  Bringing new developers up to speed is a
snap.  Taking inventory is easy (What's in the mail-host module? ls
src/modules/mail-host!).  Developers can use each other's workstations
because the tree layout is the same.  Changes are very compartmentalized
so that merging really isn't an issue.


SUBVERSION  (parenthetical, somewhat off-topic...)

At the start of the new year, I tried moving this project from CVS to
Subversion.  I chose Subversion because it looked like it would involve
significantly less work than moving to Arch.  And I was right --
importing the entire project took all of 20 minutes of my time and a few
hours of computer time.

I became pretty unhappy pretty quickly though, mostly because of how slow
svn is on large trees, extremely poor utilization of disk space, and the
surprising difficulty of sharing the repository over the network (death
by chmod).  Their "cp makes a branch" feature was the deal killer though
-- the second time I accidentally checked out the whole tree and got 18
GB of 99% redundant data, I bailed.


ARCH

I decided to have a closer look at Arch to see if it would fix the
branch/merge, networking, and disk space issues.  Note that I don't care
at all about decentralized development -- centralized has been working
just fine for us so far.


Now, I ask myself, how can I best express this tree in Arch?  First I
tried creating different archives for the different parts of the tree:
address@hidden, address@hidden.  This was obviously a bad
call.  Arch doesn't handle multiple simultaneous archives well (weird
errors, requires scattering my-default-archive and -A everywhere).  And,
even if the technical issues were fixed, it's pointless.  Spreading your
source among multiple archives doesn't actually _solve_ anything.

So, I wiped the archives and started over.  Now I'll try to stuff everything
into the address@hidden archive.  But how?

Arch's version name just does not allow the same information content as
CVS or svn's path.  What should I name /toolchain/src/vendor/gcc-3.3? 
What part is the most important, "toolchain", "vendor", "gcc", or
"3.3"?  Well, the name of the program, right?  So, I happily started
checking in:

  executor--mainline--0.2
  gcc--vendor--3.2
  gcc--vendor--3.3
  ... and so on.

However, there are multiple copies of gcc--vendor--3.3 in the tree
(thanks to their weird idea of stability).  Arch won't allow
gcc--vendor--3.3.1 and gcc--vendor--3.3.2.  Crap.

OK, wipe and start over.  Put the version information in the branch. 
Ignore the version field entirely.  Do this with ALL packages because, in
a project of this size, consistency is key.

  executor--mainline-0.2--1.0
  gcc--vendor-3.3.1--1.0
  gcc--vendor-3.3.2--1.0

Good, right?  The problem is, I now have a totally flat source tree. 
With >50 packages, this means that everything is listed alphabetically
by package name, the most arbitrary ordering possible!  Well, this
didn't work out.
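
To illustrate what gets lost, here is a minimal Python sketch (not part
of my actual scripts).  It uses a handful of paths from the layout above
and compares a flat listing, sorted by package name, with the same
packages grouped by their parent directory.  The function names are
made up for the example.

```python
# A few paths taken from the tree layout earlier in this message.
paths = [
    "/src/common/executor",
    "/src/common/vendor/uclibc",
    "/src/modules/cmdline/vendor/busybox",
    "/toolchain/src/buildtools",
    "/toolchain/src/vendor/binutils",
]


def flat_listing(paths):
    """Package names only, sorted alphabetically (what a flat namespace shows)."""
    return sorted(p.rsplit("/", 1)[1] for p in paths)


def grouped_listing(paths):
    """Package names grouped under their parent directory."""
    groups = {}
    for p in sorted(paths):
        parent, leaf = p.rsplit("/", 1)
        groups.setdefault(parent, []).append(leaf)
    return groups


print(flat_listing(paths))
for parent, leaves in grouped_listing(paths).items():
    print(parent, leaves)
```

In the flat listing, busybox sits between buildtools and executor, and
nothing says which packages are vendor code or which belong to the
toolchain.  The grouped listing keeps that information in the path.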

So, wipe and start over.  I decided to force some hierarchy onto the
category field.  This is hard because Arch is very intolerant of
punctuation.

  toolchain-vendor--gcc-3.3.1--1.0
  src-common-vendor--linux-kernel-2.4-UML--1.0

You see what's coming, right?  What do you do about names that already
have hyphens in them?

  src-kernel-vendor--linux-2.4-mm-plus--1.0
  src-modules-mail-web--vendor-omail-webmail--1.0

I had already given up by this point (it was just too ugly to live
with), so I never tried to work around this.  Under CVS or Subversion,
of course, that would be:

  /src/modules/mail-web/vendor/omail-webmail
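
To make the hyphen problem concrete, here is a minimal Python sketch
(again, an illustration and not one of my real scripts).  It joins
directory components with single hyphens to build the category and
branch fields, as in the names above, then tries to split the category
back into directories.  The helper names flatten and recover_dirs are
hypothetical.

```python
def flatten(category_dirs, branch_parts, version="1.0"):
    """Build a category--branch--version name by hyphen-joining components."""
    category = "-".join(category_dirs)
    branch = "-".join(branch_parts)
    return "--".join([category, branch, version])


def recover_dirs(name):
    """Naively split the category field back into directory components."""
    category = name.split("--")[0]
    return category.split("-")


original = ["src", "modules", "mail-web"]
name = flatten(original, ["vendor", "omail-webmail"])
recovered = recover_dirs(name)

print(name)
print(recovered)
print(recovered == original)
```

The generated name matches the second example above, but the round trip
fails: "mail-web" comes back as two components, "mail" and "web".  Once
a hyphen serves as both separator and ordinary character, the original
hierarchy cannot be recovered from the name alone.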


In parallel to this, it was taking a long time for me to modify my build
scripts to fit Arch's restrictive idea of what a name space is supposed
to be.  I didn't actually finish this either before giving up, but I can
tell you this, it sure wasn't going well.  :)  Had I finished, I could
talk more about how flattening the hierarchy forced some surprisingly
nontrivial changes to the build process.


SUMMARY

And this is where my failed attempt to move to Arch stands now.

tla abrowse and tla rbrowse output is lengthy and pretty much
incomprehensible.  Navigating the source is nigh impossible (it's
confusing to me, and I wrote most of it!).  Any inherent relationships
between packages are totally lost.  The build process is more complex and
reliant on a number of custom scripts.  Everything requires
significantly more documentation to describe how it all fits together
(and nothing falls out of date like documentation).

The proliferation of Arch-related scripts is daunting.  I wrote scripts
to check out flat source into a hierarchy, I wrote scripts duplicating
tla build-config [2], I wrote scripts to automatically reassemble config
files after branching, I wrote scripts to import vendor trees[1] (thank
goodness for tla-update-ids), ...  Heck, I have a script to simulate
"cvs diff -r1 -r2".  :)  My setup is now so custom that I don't think an
Arch wizard could figure out how it works in a reasonable amount of
time, much less a greenhorn developer.  :)


FUTURE

After struggling with arch for a week, I'm back to CVS.  I don't think
I'm being clueless or lazy.  I hope somebody will correct me if I'm
missing anything obvious.  I'll admit that it took me a while to move
from CVS/SVN/Perforce-style thinking to Arch-style, and it's very
possible that I'm still fighting Arch instead of embracing it somehow.

I figure I'll check out svn again in a year or two when they've fixed
the branching and disk space issues (they've acknowledged both as
problems and are working on them).

I'm using Arch for my small hacks right now and I like it a lot.  But, I
don't think it will ever be useful for mid-sized development projects
like mine, or for projects that already have a decent amount of source
code in a traditional SCM.  The benefits of switching don't appear to
outweigh the grinding effort required.

So, that's what I did last week.  :)

    - Scott


[1] I wrote (well, started -- others have added good stuff to it) a wiki
page that describes importing vendor trees ('cvs import') in Arch:

http://wiki.sourcecontrol.net/moin.cgi/Tracking_20a_20project_20that_20doesn_27t_20use_20Arch


[2] Configs

I can't use configs because:
- configs are only 1-deep.  I can identify at least 3 levels in my
source tree.
- if you run tla build-config, and some sources are already checked out,
tla starts scattering ,,dupes around.  You can't keep multiple
simultaneous configs in flight.

I still hadn't given up on Arch so last week I wrote my own configs
command that overcomes these shortcomings.  And it didn't get me very
far.  The config files are large and require a lot of maintenance.  If I
want to branch some sources, I need to ensure that *all* affected
config files are updated.  Otherwise, I need to track down why binaries
on one architecture are showing different bugs/features than binaries on
another arch -- they were all supposedly compiled from the same source. 
"tla update" in the root directory doesn't just sort all this out the
way it does in CVS and svn.

My biggest problem with configs, though, is that there's no way to get a
feel for the organization of your project.  It takes a lot of reading
and diffing config files to see how the different components
interrelate.  This is the sort of information that is immediately
apparent when the source code is arranged in a hierarchy.

Configs are fine for flat projects with a small number of relatively
static source imports.  But they don't scale up and they don't fix the
fundamental problem of exactly how to organize your source.

