Re: [Gnu-arch-users] archive storage format comments on the size

From: Andrea Arcangeli
Subject: Re: [Gnu-arch-users] archive storage format comments on the size
Date: Tue, 30 Sep 2003 18:46:27 +0200
User-agent: Mutt/1.4.1i

On Mon, Sep 29, 2003 at 06:45:03PM -0700, Tom Lord wrote:
>     > From: Andrea Arcangeli <address@hidden>
>     > On Mon, Sep 29, 2003 at 05:55:56PM -0700, Tom Lord wrote:
>     > >     > From: Andrea Arcangeli <address@hidden>
>     > >     > I forgot to tell one last important property of the [long highly
>     > >     > speculative design idea.]
>     > > That's ok.   Say less about these ideas in the first place, and we can
>     > > ask followup questions about the ones that seem interesting.
>     > Well, I thought it was easier to understand if I outlined the properties
>     > of the superpatch. For instance, even I myself didn't immediately think of
>     > the huge benefit it would generate during network transfer. That's why I
>     > sent a second email.
> I am trying to gently communicate that you don't really seem to have
> enough overall feel for arch and tla to justify going way off the deep
> end in terms of redesigning this or that, and that it's a bit
> premature to write in so much detail about your alternative design
> ideas.
> Please don't get me wrong.  For many of us, a good way to learn a
> system is to ask questions like ``Why is it like FOO?  Why isn't BAR
> instead?''.
> But my sense is that, in many many lines on the list, and generating
> many many lines of replies, you're not stopping at those questions but
> are instead eagerly evaluating ``BAR'' as far as you can.
> A case in point is optimizing for `tla get'.  You asserted that
> optimizing for any other operation in the archive format was
> unimportant, and then generated quite a bit about how to optimize for
> `get'.  We could have provided the most useful replies without those
> N+1 additional pages -- just the general idea of what you were
> thinking.
> The size of list traffic has gone up sharply recently.   The
> signal/noise ratio has fallen sharply.

Well, rating my emails as noise isn't as gentle as you claim, IMHO.

linux-2.5 has 12953 changesets and the working dir alone (not the
archive) will be 242M.

The current linux-2.5 archived in cvs is 413M. You must realize it's not
going to be handy at all to carry on my laptop a linux-2.5 archive many
times bigger than the uncompressed cvs one, with 12953 tar.gz patchsets
and 12953 entries in a directory.

AFAIK somebody created the linux arch archive with all the historical
patches and pre-patches. That's way too easy and it doesn't provide real
value to me: I need the _granular_ changesets, that's the whole point of
the bkcvs export. There are around 13000 to merge and I want all of
them; I can't just discard the old ones when the archive grows too big.
The feature for me is to have all of them at the same time in the same
archive so I can search back 2 years in the past. In fact, I find it
misguided to name archives with a date. The hint it gives, that it's good
practice to change archive once per year, is flawed.

And doing 12953 checkins at this speed with those floods of lstat will
be very slow too (that's not a one-time event, since I need to
regularly synchronize with mainline and there are many changesets per day).
A checkin already takes a significant amount of time with just 2500
changesets and a tiny working dir (~2M of working dir, vs ~250M of
linux-2.5 working dir).
Note that I'm running this on bleeding-edge hardware, so there's not
much more hardware I can throw at this (especially because I want to
run it on the laptop too).

Those things aren't necessarily right; I think they can be fixed.
From my point of view this is brainstorming: talking to people so that
together we can discard bad options, choose better ones, and
improve. I was very glad to find some of the features I need just by
tweaking configure files.

Something like superpatches should help tremendously in reducing the
archive size, network fetch time, and checkout time, and if I understood
correctly this is a novel idea - not noise.  I don't see why you
answered the superpatchset idea with the "don't send noise" email. It
just doesn't make sense.

Which is the biggest project hosted in arch right now, btw?

Andrea