Re: [Savannah-hackers-public] vcs0 disk filling up, /var/cache/cgit again

From: Bob Proulx
Subject: Re: [Savannah-hackers-public] vcs0 disk filling up, /var/cache/cgit again
Date: Thu, 9 Mar 2017 22:52:46 -0700
User-agent: NeoMutt/20170113 (1.7.2)

Glenn Morris wrote:
> Bob Proulx wrote:
> > /var/cache/cgit directory.
> [...]
> > Some of the files are really quite large.
> >
> >   -rw------- 1 www-data www-data 513M Mar  9 02:40 f0110000
> Any idea what causes these enormous files to be produced?

Links to commit diffs.  And some of those commits can be very large.
The one I looked at was a tremendously large commit diff rendered in
html display format.  All of those files are html being sent out to
the web browsers.

That one was a 512 meg html file for some client web query.  Probably
not a useful thing for a web browser and so I imagine the user didn't
wait for it.  That is a huge display page.  No information on whether
it was a web browser or a robot crawler but hopefully a person because
we have the robots.txt file set to avoid crawling the /cgit/ directory.

I forget now which project that was for.  I think it was for emacs.
I already deleted the file.

> I see cgit has an option:
>   max-blob-size
>     Specifies the maximum size of a blob to display HTML for in
>     KBytes. Default value: "0" (limit disabled).
> Don't know if it's relevant.

That could be useful.  There are various tunings available in the
/etc/cgitrc file.  Assaf has already done some tuning.

  # 2017-feb-20,agn:
  # Reduce cache size to 5000,
  # see

  # 2017-feb-20,agn:
  # Set TTL even for static pages (those with fixed SHA1 in the URL),
  # Some of savannah's pages generate cache files of ~500MB.

Adding max-blob-size probably makes sense too, because I just don't
think it makes sense to view a 513M web page.
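
For reference, a minimal sketch of what such a setting could look like
in /etc/cgitrc.  The value 1024 is purely illustrative and is not a
setting that has been applied on vcs0.  Note that per the description
quoted above this limits blob display; whether it would also bound the
very large commit diff pages is not established here.

```
# /etc/cgitrc fragment (illustrative value, an assumption)
# Do not render HTML for blobs larger than 1024 KBytes.
# The default of 0 leaves the limit disabled.
max-blob-size=1024
```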

However I decided that instead of looking at cgitrc options it was
just a lot simpler and more direct to use find and the hourly cron
to prune that directory.
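
A rough sketch of that kind of find-based pruner follows.  This is not
the actual script installed on vcs0; the function name and the age and
size thresholds are hypothetical and chosen only for illustration.  It
assumes a find that supports -mmin, -size with an M suffix, and -delete,
as GNU find does.

```shell
#!/bin/sh
# Hypothetical cache pruner.  The name and default thresholds are
# assumptions for illustration, not the values used on vcs0.
prune_cgit_cache() {
    cache_dir=${1:-/var/cache/cgit}   # directory to prune
    max_age_min=${2:-60}              # delete files older than this many minutes
    max_size_mb=${3:-100}             # delete files larger than this many MiB

    # Remove cache files that have not been modified within the age limit.
    find "$cache_dir" -type f -mmin +"$max_age_min" -delete

    # Remove oversized cache files regardless of their age.
    find "$cache_dir" -type f -size +"${max_size_mb}M" -delete
}
```

Something like "prune_cgit_cache /var/cache/cgit 60 100" could then be
called from a script under /etc/cron.hourly.  Deleting cache files is
safe in the sense that cgit regenerates a page on the next request.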

Even with that aggressive pruner I put in there are still a lot of
files in that directory.  They are now mostly small files however.

  address@hidden:~# ls /var/cache/cgit | wc -l

  address@hidden:~# du -sh /var/cache/cgit
  314M    /var/cache/cgit


