Re: Move to git is imminent - awaiting Stefan's approval

From: Bob Proulx
Subject: Re: Move to git is imminent - awaiting Stefan's approval
Date: Sun, 12 Jan 2014 14:13:28 -0700
Andreas Schwab wrote:
> Bob Proulx <address@hidden> writes:
> > Three hours and 3G of memory for this one repack!  I worry that
> > turning --aggressive on globally will cause the server to fall over.
> IMHO --aggressive is too aggressive, and not useful for on-going
> maintenance.  Increasing pack.window to 50 or so should already help
> enough to improve the packing.

It sounds like I should schedule a full aggressive repack once, "once"
being the operative word.

  git repack -a -d -f --window=250 --depth=250

And then, as you suggest above, ensure that the regular ongoing
maintenance 'git gc' runs with a large enough pack window.

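I see that the git-gc --aggressive default is --window=250
(gc.aggressiveWindow), but a plain 'git gc' uses pack.window, which
defaults to only 10.  So after the one-time repack, routine
maintenance needs pack.window raised in the repository config, to 50
or so as you suggest.  With that set, the normal 'git gc' routine
maintenance should be okay.  We can tweak this further as needed.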
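A minimal sketch of that routine-maintenance setting, assuming a POSIX shell on the server and git 1.8.5 or later (for `git -C`). The helper name `configure_routine_gc` and its default of 50 are illustrative only; 50 is the rough figure from Andreas's reply, not a measured optimum.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): configure_routine_gc REPO [WINDOW]
# Raises pack.window so that a plain 'git gc' considers more delta
# candidates per object. Assumes it is run against the repository
# directory on the server (bare or not).
configure_routine_gc() {
    repo=$1
    window=${2:-50}  # ~50 is the rough figure from the thread, not a benchmark result

    # Plain 'git gc' honours pack.window (git default: 10).
    # Only 'git gc --aggressive' uses gc.aggressiveWindow (git default: 250).
    git -C "$repo" config pack.window "$window" || return 1

    # Routine maintenance: no -f, so existing deltas are reused
    # rather than recomputed.
    git -C "$repo" gc --quiet
}

# Example (path is a placeholder):
#   configure_routine_gc /path/to/repo.git 50
```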

At certain times of day the vcs system is quite heavily loaded.  The vcs
system is a VM on a dom0 that also hosts many other VMs, and at some
times of day the entire dom0 is heavily I/O limited.  This has been an
ongoing problem and topic of discussion.  I will run the repack during
a less busy time on the dom0.
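One way to keep the one-time repack from adding to the dom0 I/O contention is to run it at the lowest CPU priority and, where available, in the Linux idle I/O class. This is a sketch under assumptions: `ionice` comes from util-linux and may be missing or not permitted inside a VM, in which case the helper falls back to `nice` alone; the helper names are hypothetical. The repack options are the ones given earlier in this message.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative).

# run_low_priority CMD...: lowest CPU priority, plus the idle I/O class
# when Linux 'ionice' is available and permitted; otherwise nice alone.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1 && ionice -c3 true 2>/dev/null; then
        ionice -c3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# one_time_aggressive_repack REPO: the one-off full repack.
#   -a -d : pack everything into one new pack, then remove packs and
#           loose objects made redundant by it
#   -f    : --no-reuse-delta, i.e. recompute every delta (slow, memory hungry)
one_time_aggressive_repack() {
    run_low_priority git -C "$1" repack -a -d -f --window=250 --depth=250
}

# Example (path is a placeholder); this can take hours on a large repository:
#   one_time_aggressive_repack /path/to/repo.git
```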

I spent some time researching this problem and found this note, and
other information in the thread, useful and interesting.  It is a
long thread, but there are some good gems in it.


> > If an aggressively repacked repository is again repacked but this time
> > without the --aggressive option, does the size stay around 327M or does
> > it get expanded on the subsequent pass?
> Unless you run repack with -f (i.e. gc --aggressive), existing deltas are
> reused, and only newly added objects are deltified.

Sounds good.  I will note that -f forces a full (--no-reuse-delta)
repack of everything and should be avoided in routine maintenance.
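To check the size question against our own repository rather than take it on faith, a rough sketch that records the `size-pack` figure (in KiB) from `git count-objects -v` before and after a plain `git gc`. The helper names are hypothetical, and the numbers printed will depend entirely on the repository.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative).

# pack_size_kib REPO: total on-disk size of the packs, in KiB, taken from
# the "size-pack" line of 'git count-objects -v'.
pack_size_kib() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# compare_after_routine_gc REPO: run a plain 'git gc' (no -f, so existing
# deltas are reused) and report the pack size before and after.
compare_after_routine_gc() {
    before=$(pack_size_kib "$1") || return 1
    git -C "$1" gc --quiet || return 1
    after=$(pack_size_kib "$1") || return 1
    echo "size-pack before: $before KiB, after: $after KiB"
}

# Example (path is a placeholder), run after the one-time aggressive repack:
#   compare_after_routine_gc /path/to/repo.git
```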

> > (Wondering if we can periodically run 'git gc --aggressive' on the
> > larger git repositories at a niced background task priority but not
> > too often and still achieve a good benefit for the time between
> > aggressive repacks.)
> Another option is to touch a <pack>.keep file for the largest pack so
> that it is never touched again.  New objects will then be added to a
> separate pack even after git gc.  If that large pack is already well
> packed this should save some processing time.

That seems like a useful additional tweak for a large repository such
as this one.  If nothing else, it will help the backups, since that
file won't change on a routine basis and will remain static for the
purposes of backup transfer.  This could be applied to several of the
large repositories.
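A sketch of the .keep tweak, assuming git 2.13 or later (for `rev-parse --absolute-git-dir`) and assuming the pack worth freezing is simply the largest one. The helper name is hypothetical. Git only checks that the .keep file exists, so an empty file is enough.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): keep_largest_pack REPO
# Marks the largest pack with a .keep file so that later 'git gc' and
# 'git repack -a -d' leave it in place and put new objects in other packs.
keep_largest_pack() {
    gitdir=$(cd "$1" && git rev-parse --absolute-git-dir) || return 1
    # ls -S lists the given files largest first.
    largest=$(ls -S "$gitdir"/objects/pack/pack-*.pack 2>/dev/null | head -n 1)
    if [ -z "$largest" ]; then
        echo "no pack files found in $gitdir" >&2
        return 1
    fi
    keep=${largest%.pack}.keep
    touch "$keep"      # existence of the file is what marks the pack
    echo "$keep"
}

# Example (path is a placeholder), after the one-time aggressive repack:
#   keep_largest_pack /path/to/repo.git
```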

On the client side, it seems that this tweak is not propagated by a
new clone.  If there are multiple packs on the server side, they appear
to be combined into a single pack file for the clone, but without being
repacked from scratch.  Therefore downstream clients that want this
arrangement would need to create the .keep file manually after a clone.
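A small sketch for checking this on a fresh clone: list the pack files and flag any that have a .keep file. The helper name is hypothetical. When testing on a single machine, cloning with `--no-local` matters, since a plain local-path clone copies or hard-links the files under objects/ instead of going through the normal transfer.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): list_packs REPO
# Prints one line per pack file, with " keep" appended when a matching
# .keep file exists.
list_packs() {
    gitdir=$(cd "$1" && git rev-parse --absolute-git-dir) || return 1
    for p in "$gitdir"/objects/pack/pack-*.pack; do
        [ -e "$p" ] || continue          # glob matched nothing
        if [ -e "${p%.pack}.keep" ]; then
            echo "$(basename "$p") keep"
        else
            basename "$p"
        fi
    done
}

# Example (paths are placeholders): clone through the normal transfer,
# then inspect the result.
#   git clone --no-local /path/to/repo.git work
#   list_packs work
# A client that wants the same arrangement can then mark its pack by hand:
#   touch work/.git/objects/pack/pack-<hash>.keep
```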


