[Gnu-arch-users] Re: give us a hand with arch

From: Andrea Arcangeli
Subject: [Gnu-arch-users] Re: give us a hand with arch
Date: Fri, 26 Sep 2003 02:25:02 +0200
User-agent: Mutt/1.4.1i

On Fri, Sep 26, 2003 at 12:41:01AM +0200, Pau Aliagas wrote:
> On Thu, 25 Sep 2003, Andrea Arcangeli wrote:
> > On Thu, Sep 25, 2003 at 06:29:13PM +0200, Pau Aliagas wrote:
> Hi Andrea, welcome to the list.

thanks ;).

> I'll post my answer here to have more eyeballing.
> > actually I learned to use arch just last night. Not absolutely everything,
> > but in enough detail to understand the design and how it works
> > internally.
> > 
> > the major problem at the moment seems speed
> You should use a fairly recent version to have the inode signature 
> optimization. The src.rpm I sent you already had it, try building it and 
> the speed improvement will be 10:1

ok, however I haven't received any src.rpm yet as far as I can see (maybe
it's an email still in flight because it's bigger).

> Well, we have "express commits": tla commit -L "your explanations".
> Combining both like CVS is just unimportant because you could have an 
> alias in your shell to do it. Once you get used to it, it's even better to have it 
> this way because you can fill in the log as you do the changes. See, it 
> makes you work better.

the alias would work. I just thought a real command would be worthwhile.
I'm not sure if I'll ever write the comment while writing the patch.
That's not the way I work normally.
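
The alias mentioned above could look roughly like the sketch below. The function name tci and the TLA override variable are made up for illustration; the only part taken from this thread is "tla commit -L".

```shell
# Sketch of a one-step "express commit" wrapper.
# Assumptions: the name tci and the TLA variable are hypothetical;
# only "tla commit -L <text>" comes from the thread.
TLA="${TLA:-tla}"

tci() {
    # Refuse to commit with an empty log message.
    if [ "$#" -eq 0 ]; then
        echo "usage: tci <log message>" >&2
        return 2
    fi
    # Join all arguments into a single log message.
    "$TLA" commit -L "$*"
}
```

With this in a shell startup file, "tci fix oom deadlock" would run the commit with that text as the log, which is close to the single CVS-style command asked for.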

> > I feel like the whole database in the archive would better be rewritten
> > to boost performance and to compact it. But the design looks promising,
> > much better than cvs, so as a starting point looks definitely
> > interesting as you suggested.
> Well, it's simply a filesystem properly organized and that's part of its 
> beauty :) Maybe in a reiserfs would be faster than on ext2/ext3 or, as you 

yes, on reiserfs it should be faster.

> Reading your initial impediments, I'm almost sure that the difficulties 
> you are finding are more due to being unfamiliar with arch than to real 
> problems.

I hope so ;)

> > However I've a number of problems for using it even in 2.4-aa, that is
> > simply it's fundamental to me to maintain the changesets intact. If I
> > want to make a change to a changeset, I've to change it, not to make the
> > modification at the end like in cvs. I mean, I've to change the
> > patch-2001. And ideally I would like to name each patchset with a more
> > meaningful name (I know there's the summary so that may not be too bad).
> That's damn easy with arch! I'll forward you privately an email posted by
> Tom called "prism-merging technique" (you can look for it in the archives 
> too). To make it short, you can create several branches to work on several 
> features, merge them selectively to the trunk (this would be your linux-aa) 
> and from here export your consolidated changeset, made of patches from all 
> your private-feature branches. That way you develop randomly on your 
> private features but export "complete changesets". Read the complete email 
> I'll send.

So basically I will have 1 branch for each commit, that will certainly
work but I thought it was simpler to be able to have the patchset reject
into its tree, than to handle the reject across trees.

Especially I still need the "virtual tag" (again s/hook/tag/), into an
unpacked tree. Otherwise when I checkout I need to unpack 200M. That's
way overkill, and caching wouldn't be useful either. 

I mean, I must be able to do a:

        tla get linux--aa--2.4

and to get out the tree in seconds, I don't want to wait for 200M to be
written on disk before I get it. That would slow me down too much. As
said I'm currently down to a dozen of seconds by using /dev/shm with my
current scripts to fixup a reject (that in this case means merging from
a branch with a reject in between).

> Naming patches is not possible, but you have summary + keywords + log 
> associated to each one. Much more expressive than a simple name.

Yes. Though if the patches were named, that would be a natural
API to implement a command that extracts all the patchsets into a
directory, plus a .ordering file that lists the order in which they should
be applied. That would be basically the export from arch to 'patch'.

Then you can do:

        for i in `cat .ordering`; do patch -p1 < $i; done

and reach the tree too.
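
The loop above keeps going after a reject. A slightly more defensive variant is sketched below: it stops at the first patch that does not apply. The function name apply_series and the PATCH override variable are hypothetical; the .ordering file with one patch name per line is the layout described in this mail.

```shell
# Apply every patch listed in <series_dir>/.ordering to the current tree,
# stopping at the first one that does not apply cleanly.
# Assumptions: the function name and the PATCH override are hypothetical;
# .ordering holds one patch file name per line, as described in the mail.
PATCH="${PATCH:-patch}"

apply_series() {
    series_dir=$1
    while IFS= read -r name || [ -n "$name" ]; do
        # Skip blank lines in the ordering file.
        [ -n "$name" ] || continue
        if ! "$PATCH" -p1 < "$series_dir/$name"; then
            echo "apply_series: $name failed, stopping" >&2
            return 1
        fi
    done < "$series_dir/.ordering"
}
```

Run from the top of the unpacked tree, for example "apply_series ../patches". Stopping early leaves the tree at the last patch that applied, so the reject can be fixed before continuing.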

For patchsets seen as fundamental "features" and not as the 'checkin of
20030926', it makes sense to me to name the patch too.

> > But exactly because it's purely driven by changesets, it sounds very
> > reasonable to use it even for 2.4-aa (with hundred patches that will
> > never get merged).
> I keep several GPL projects synchronised with my private changes and keep 
> my unmerged patches in my private branch. It's just getting used to how to 
> do it with arch. My technique is as follows:
> -create a project of the original source (in this case linux--linus)
> -create a project of your branch tagged from the original:
>  tla tag linux--linus--2.4 linux--aa--2.4
> -apply your changes to your branch
> -keep linus tree updated merging in changes (from cvs, release patches, 
>  etc...)
> -merge its changes into your tree and solve conflicts
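
The steps quoted above can be sketched as two small shell helpers. This is only an illustration: the function names and the TLA override are hypothetical. "tla tag" and "tla get" appear in this thread, but passing a target directory to "get" and a source version to "star-merge" are assumptions that should be checked against tla's own help output.

```shell
# Sketch of the "branch from upstream, then merge upstream back in" steps.
# Assumptions: function names and the TLA override are hypothetical.
# "tla tag" and "tla get" appear in the thread; passing a target directory
# to "get" and a source version to "star-merge" is assumed, check tla's help.
TLA="${TLA:-tla}"

# Create a private branch tagged from the upstream version and check it out.
start_branch() {
    upstream=$1   # e.g. linux--linus--2.4
    mine=$2       # e.g. linux--aa--2.4
    "$TLA" tag "$upstream" "$mine" &&
    "$TLA" get "$mine" "$mine"
}

# From inside the checked-out private tree, pull in new upstream changes.
sync_from_upstream() {
    upstream=$1
    "$TLA" star-merge "$upstream"
}
```

Conflicts left by the merge still have to be resolved by hand before committing.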

yes, that's fine. And my point is that I would like to tag in a fresh
_unpacked_ tar.gz or it takes ages to untar it multiple times for no
good reason, and it wastes memory, inodecache and pagecache to replicate
the same data.

I will do some experiments with it.

> -In your branch you'll always have the pending patches ready to export to 
>  the current kernel, effortlessly ;)
> > The checkout isn't really different from my script
> > that applies each patch in order. However arch compared to my script
> > sounds a bit overcomplex, just for this purpose.
> Not at all. Read the prism-merging development technique mail and see.
> It's only a matter of disciplined development adapted to the beauty of
> arch power.

I will try ;)

> > Also I can't upload 200m every time, so I need to tell arch to tag
> > against an unpacked source somewhere in the tree and I must be able to
> > change the tag at the top of the tree without affecting the changesets.
> > This isn't possible yet if I understood well.
> I don't quite get it. You should never download or upload 200 Mb but the 
> first time. The steps should be more or less the following:
> -create your own archive (repository)
> -create original linux project (name it linux-kernel--linus--2.4)
> -import source into it (preferably from the CVS tree if you want fine 
>  grained detail, using one of the existing tools, like Miles' arch-tools)
> This original linux tree on arch is CRUCIAL as everybody will keep on
> getting changesets from it. We now have one linux archive, but made from
> tars and release patches, not from the CVS, so it lacks the level of
> detail you may need. This tree should ideally be mirrored in,
> for the availability and bandwidth, and people who'd like to work on linux
> using arch would ideally mirror it locally. Instead of downloading patches 
> they'd mirror again.
> -create your branch tagged from a revision of linux--linus tree.
> -patch the tree carefully, feature by feature, so that you can export each 
>  one separately
> -as time goes by, you'll synchronise with the original CVS (the bk->cvs) 
>  using Miles' arch tools, so that everything is automated.
> -you'll replay those new patches in the original tree in your linux--aa, 
>  probably with little intervention. Today Tom has posted a mail explaining 
>  how to avoid conflicts when merging. There are several techniques and 
>  each one is useful depending on the level of interference among patches.

so basically you're saying that I won't need to regenerate the tree some
hundred times like I'm doing right now ;), and so in turn I won't care
anymore about the speed of the checkout. maybe I'm biased about the
absolute need of quick clones not involving a 200M unpack, after having
cloned trees thousands and thousands of times ;)

Now a more difficult question: what about when I need to fix a bug in
patch-10? (not in the tree, fixing the tree is easy, just edit and
commit, but the fix would go in say patch-20 separate from patch-10)

> > We'll see, but it certainly sounds just very reasonable, with multiple
> > people working on the project and merging *regularly* like bitkeeper.
> > For the *almost never merge* case it may be more problematic.
> I don't see any problem. If people want patches from your tree, they can 
> replay your changes one by one, synchronise using star-merge... it feels 
> natural.

well this sounds all surprisingly very good ;)

> If people want read-only access, they can mirror to have fast access. This 
> is even less problematic.
> > I also feel it needs some sort of naming enforcement, basically I would
> > enforce the inventory explicit model, by simply preventing arch from looking at
> > any file that isn't explicitly inventoried (showing '?' like CVS). that
> > sounds much safer when there's lots of junk in the tree and there's no
> > risk to forget an explicit rename. The automatic inventory embedded in
> > the files is too deep.
> Miles' arch-tools already do this automagically from CVS ;) I'm sure he
> can give you a hand. We have everything almost ready, we only need someone 
> motivated enough to import the kernel CVS.
> I prefer tagline naming method, but as you say that's too deep by now. 

yes, changing all files is a no-way, it would reject on future changes
too, for example when they bump the copyright.

> Explicit tagging is by far the best choice.


I also would like a way to *enforce* it, I would like the commit
to ignore everything without a tag (maybe it just works that way, so
far I only tried the names method, and the names method commits everything
including garbage). being used to CVS that was quite surprising and I'm
very concerned garbage would end up in the tree then. It doesn't sound
robust at all, to let the checkin grab everything. Of course w/o
explicit tagging there is no way to know what is supposed to go in or
not, so the behaviour I experienced is acceptable with names, but with
explicit tagging turned on, people are supposed to do
renames/additions/removals with a metadata update, and as such we can be
strict. I like being strict. With a tree like the kernel with that many
developers eventually garbage will get in. A few times garbage got into
my patches too (once I applied as usual with -p1 and it had to be -p0,
so I had a linux/char garbage directory ;). A strict checkin would fix
that. I also would like the inventory to show me the non
explicitly-tagged garbage similar to the ? in cvs (but wait, maybe the
lint functionality already does that? it told me nothing but I was in
names mode)
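
To make the "?" idea concrete, here is a rough sketch of a check that could run before a checkin. It rests on an assumption: that with explicit tagging the id of FILE is kept in .arch-ids/FILE.id in the same directory. The function name is hypothetical, and this is not meant as a substitute for tla's own inventory or lint commands.

```shell
# List files that have no explicit id, in the spirit of the "?" lines in CVS.
# Assumption: with explicit tagging, the id of FILE lives in
# .arch-ids/FILE.id next to it. The function name is hypothetical and this
# is not a replacement for tla's own inventory or lint commands.
list_untagged() {
    root=${1:-.}
    # Walk regular files, skipping arch's own control directories.
    find "$root" \( -name '.arch-ids' -o -name '{arch}' \) -prune \
        -o -type f -print | sort |
    while IFS= read -r f; do
        dir=${f%/*}
        base=${f##*/}
        # Report the file only when no matching id file exists.
        [ -e "$dir/.arch-ids/$base.id" ] || printf '? %s\n' "$f"
    done
}
```

A strict checkin wrapper could simply refuse to commit whenever this prints anything.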

> > Oh, another way is to write a LD_PRELOAD library
> > to trap mv/rm/create, but that sounds very dirty. The ideal would be to
> > have streams in the fs.
> If you mean it for importing from the CVS, arch-tools already does its 
> best to detect renaming and mv. No need for anything special.

not sure I understand correctly, the only thing I compiled and installed
so far is tla of the last stable release. I followed the 'files' link in
the homepage. where are the arch-tools? the cvs converter I used was the
cvs2arch night hack from wave++.

> You can export changesets to patches directly doing tla get-patch. That'd 
> do the trick :)

oh that's the fundamental command I couldn't find ;)
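
Combined with the .ordering idea, the export could be scripted along these lines. This is a sketch under stated assumptions: the function name and the TLA override are hypothetical, and the argument form "get-patch REVISION DIR" is assumed rather than verified. Only the command name "tla get-patch" comes from this thread.

```shell
# Sketch: fetch changesets patch-1..patch-N of a version into a directory and
# record their order in .ordering.
# Assumptions: the function name and TLA override are hypothetical, and the
# argument form "get-patch REVISION DIR" is assumed rather than checked;
# "tla get-patch" itself is the command named in the thread.
TLA="${TLA:-tla}"

export_changesets() {
    version=$1   # e.g. linux--aa--2.4
    last=$2      # highest patch level to export
    out=$3       # destination directory
    mkdir -p "$out" || return 1
    # Start with an empty ordering file.
    : > "$out/.ordering"
    n=1
    while [ "$n" -le "$last" ]; do
        "$TLA" get-patch "$version--patch-$n" "$out/patch-$n" || return 1
        echo "patch-$n" >> "$out/.ordering"
        n=$((n + 1))
    done
}
```

What get-patch writes may be a changeset directory rather than a single diff file, so a plain "patch -p1" loop over the result may need adjusting.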

> This signature will grow RSN.

if the arch export works well enough, after we've something like cvsps
for arch, I may prefer to shrink it to one line only too. We'll see. the
2.4 version is the most challenging from my point of view, since my 2.4
tree is quite huge and hard to maintain.

One last issue, is there a way to give a symbolic tag to files? The way
I understood it, there isn't and I've to create a second branch and tag
to the previous branch. Is that correct?

In cvs people tend to use tags for important events like releases,
and losing them during the conversion would be bad. Now the night hack
didn't care about tags at all, I wonder if there's a way to retain the
tags. I guess the simplest way would be to be able to tell the converter
from cvs to arch to merge only from tag1 to tag2, and then you do
the next branch by hand and you merge tag2 to HEAD into it. That should
be good enough for most purposes. And with arch we can create a branch
with a more recent version number instead of the tag. BTW, is there any
difference between creating a branch with a different name, or with a
different version number? The way I understood it arch doesn't care about
the name, it only cares about what's inside the directory, so I could still
tag by name by creating a branch with a different name and same version,
if I want to.

thank you very much!

Andrea - If you prefer relying on open source software, check these links:
