Re: [Pan-users] Pruning
From: Duncan
Date: Mon, 22 Apr 2013 06:34:02 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT f3d4165 /usr/src/portage/src/egit-src/pan2)
Beartooth posted on Sun, 21 Apr 2013 13:57:10 +0000 as excerpted:
> My .pan2 is running close to 400 MB, and I'm sure most of it is an aged
> accretion of cruft; I'd like to edit it down to somewhere between a
> tenth and a quarter of that. Is there an easy way, that will do no harm?
Well, "easy" is relative... and "do no harm" is relative as well, but
yes, there's a way.
FWIW, my pan text-instance directory (.pan2, except that here it's pointed
elsewhere) is a gigabyte, but that's because I deliberately set no
expiration on my various text groups and a multi-gig cache size, so
nothing expires in those groups. I have messages in some groups going
back years, some of them on servers or in groups that no longer publicly
exist.
What I'd recommend doing first is opening your ~/.pan2 dir in a graphical
disk-usage tool such as filelight or fsview. Both are kde tools, but gnome
and other desktops probably have similar ones, and pysize looks like one
such more universal tool. These tools show a graphical representation of
files and (nested) directories by size, so it's dead easy to see which
specific files are taking up the most room, and just how much room they
are taking up as a percentage of the whole.
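If you'd rather not install a graphical tool, a few lines of Python can give a rough text-mode version of the same breakdown. This is a minimal sketch, not anything pan ships, and the function names are made up: it sums the apparent size (st_size) of every file under each top-level entry of the directory you give it (~/.pan2 by default) and prints each entry's share of the total, largest first. Apparent size ignores filesystem block rounding, so the numbers may differ somewhat from what a graphical tool reports.

```python
import os
import sys
from pathlib import Path


def tree_size(path: Path) -> int:
    """Apparent size in bytes of a file, or of all regular files below a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = Path(root) / name
            if fp.is_file() and not fp.is_symlink():  # skip symlinks and special files
                total += fp.stat().st_size
    return total


def breakdown(top: Path) -> list:
    """(name, bytes, percent-of-total) for each top-level entry, largest first."""
    sizes = [(entry.name, tree_size(entry)) for entry in top.iterdir()]
    grand_total = sum(size for _, size in sizes) or 1  # avoid dividing by zero
    sizes.sort(key=lambda item: item[1], reverse=True)
    return [(name, size, 100.0 * size / grand_total) for name, size in sizes]


if __name__ == "__main__":
    # Default to ~/.pan2; pass another path if your pan dir lives elsewhere.
    pan_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.home() / ".pan2"
    if pan_dir.is_dir():
        for name, size, pct in breakdown(pan_dir):
            print(f"{pct:5.1f}%  {size / 2**20:10.1f} MiB  {name}")
    else:
        print(f"no such directory: {pan_dir}")
```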
For instance, here, filelight tells and shows me that the article-cache
subdir is taking up 92% of all the space used by my pan text instance
data dir, 975 MB out of that gig I mentioned. The groups subdir is
taking up another 6% (67 MB), leaving 2% for the small stuff, but it's
the groups subdir that has the largest files, with the largest single
file being groups/gmane.linux.gentoo.devel , which is taking up 27+ MB on
its own, about 2% of that 1-gig total and nearly half of the groups
subdir, all by itself! The four biggest files following that are 5-7 MB
each, before they get too small for filelight to show them unless I dive
into the groups subdir itself, making it the working dir on which
percentages are based, etc.
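The "dive into the groups subdir" step can also be done as a plain list of the biggest individual files anywhere below a directory. Again a minimal sketch with a made-up name (largest_files is not a pan feature): it walks the tree and keeps the n largest regular files, defaulting to ~/.pan2/groups.

```python
import heapq
import os
import sys
from pathlib import Path


def largest_files(top: Path, n: int = 5) -> list:
    """Return up to n (size_in_bytes, path) pairs for the biggest files under top."""
    candidates = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            fp = Path(root) / name
            if fp.is_file() and not fp.is_symlink():
                candidates.append((fp.stat().st_size, fp))
    # key= compares sizes only, so ties never fall back to comparing paths
    return heapq.nlargest(n, candidates, key=lambda item: item[0])


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.home() / ".pan2" / "groups"
    if target.is_dir():
        for size, path in largest_files(target, 10):
            print(f"{size / 2**20:8.1f} MiB  {path.relative_to(target)}")
    else:
        print(f"no such directory: {target}")
```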
It's thus immediately obvious that with the article-cache being 92% of
the total, if I wanted to reduce the total substantially, I'd *HAVE* to
shrink my article cache.
But of course as I said I'm not doing that here, as I'm effectively
archiving those articles in pan.
But the story for most people should be quite different. Pan's default
cache size is 10 MB, so if your .pan2 dir is 400-ish MB as you say, then
unless you've set pan's cache well over the default, or there's a bug and
pan isn't deleting files when the cache gets too big, deleting the entire
10 MB cache would recover only about 2.5% of that, which won't do you
much good.
Which is where the graphical filesize/directorysize tools help out, as it
becomes immediately obvious what's taking up the space, and you can then
either ask about that or simply do a backup, then delete the working copy
and see if its loss fits your idea of "do no harm", or not. (If you find
it harmful, you can simply restore from that backup you made before the
delete, thus my specific mention of the backup.)
Alternatively, here's a functional description of the various files and
subdirs and what they do, so you can figure out for yourself whether
losing that will be a big deal or not:
Subdirs:
article-cache: This is where pan stores the whole articles it has
downloaded. By default, this cache is limited to 10 MB, so articles are
stored here only temporarily. If you read primarily text groups, 10 MB
might hold anywhere from a few days to a couple of months' worth of
articles. If you do primarily huge binaries, ISO images and the like,
obviously 10 MB won't hold much at all, just the parts pan is
downloading and assembling to decode and save right then. (The control
for cache size is in pan prefs, near the bottom of the behavior tab, in
the article-cache section. However, if your pan is old enough, you won't
have it there, and will have to edit it directly in preferences.xml using
a text editor.)
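To check the current limit without starting pan, something like this could read it out of preferences.xml. It's a sketch resting on an assumption I haven't checked against pan's source: that the setting is stored as an element carrying name and value attributes, with the name "cache-size-megs". If your file uses a different name, change CACHE_KEY; the lookup itself doesn't care what the element tags are.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# ASSUMPTION: the cache limit appears in preferences.xml as something like
# <int name="cache-size-megs" value="10"/>. Adjust CACHE_KEY if yours differs.
CACHE_KEY = "cache-size-megs"


def find_pref(root: ET.Element, key: str) -> Optional[str]:
    """Return the value attribute of the first element whose name attribute is key."""
    for elem in root.iter():
        if elem.get("name") == key:
            return elem.get("value")
    return None


if __name__ == "__main__":
    prefs = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.home() / ".pan2" / "preferences.xml"
    if prefs.is_file():
        value = find_pref(ET.parse(prefs).getroot(), CACHE_KEY)
        print(f"{CACHE_KEY} = {value}" if value is not None else f"{CACHE_KEY} not found")
    else:
        print(f"no such file: {prefs}")
```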
article-drafts: This holds draft articles you saved before sending (and
with new enough pan, an autosave as well, but it gets reused with every
article you compose, so...). It could be quite big if you saved a bunch
of them and haven't cleaned it up recently, and thus might be a candidate
for cleaning. If there are lots of files in here, try ordering them by
date or size and deleting either the oldest or largest.
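Ordering the drafts that way from a script is a short job. A minimal sketch (list_drafts is a made-up name): it returns each file's path, size and modification time, oldest first by default or largest first with by="size", and only lists, leaving the deleting to you.

```python
import time
from pathlib import Path


def list_drafts(drafts_dir: Path, by: str = "date") -> list:
    """(path, size, mtime) for each file; oldest first, or largest first if by="size"."""
    entries = []
    for p in drafts_dir.iterdir():
        if p.is_file():
            st = p.stat()
            entries.append((p, st.st_size, st.st_mtime))
    if by == "size":
        entries.sort(key=lambda e: e[1], reverse=True)
    else:
        entries.sort(key=lambda e: e[2])
    return entries


if __name__ == "__main__":
    drafts = Path.home() / ".pan2" / "article-drafts"
    if drafts.is_dir():
        for path, size, mtime in list_drafts(drafts):
            stamp = time.strftime("%Y-%m-%d", time.localtime(mtime))
            print(f"{stamp}  {size:>9} bytes  {path.name}")
```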
downloaded-attachments: AFAIK, this dir (if you have it at all) is an old
one that should be safe to delete as pan no longer uses it by default.
But to be sure, back it up before deleting, just in case.
encode-cache: This one's used by pan as temporary workspace for the
(relatively) new binary-upload feature. If your pan is too old to have
that, you shouldn't have this dir, either. But it should be empty or
nearly empty unless pan crashed in the middle of an encode step, as pan
should clean it out when it's done. If it's not empty (and you're not in
the middle of a binary upload session), you should be able to delete the
files here without damage.
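Whether encode-cache has anything left in it is easy to check before deciding to clear it. A minimal sketch with a made-up function name; it only reports and never deletes, since an upload still in progress would need those files.

```python
from pathlib import Path


def leftover_files(cache_dir: Path) -> list:
    """Files still present in encode-cache; empty if the dir is absent or clean."""
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    cache = Path.home() / ".pan2" / "encode-cache"
    leftovers = leftover_files(cache)
    total = sum(p.stat().st_size for p in leftovers)
    print(f"{len(leftovers)} leftover file(s), {total} bytes in {cache}")
```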
groups: This subdir IS IMPORTANT, as files within it contain pan's header
cache, one file per group. These files MAY get somewhat large -- as I
mentioned, that's where my largest individual files are located in my pan
text instance data dir, but as long as you don't do as I do and disable
expiry, they shouldn't grow without limit (unless you have filesystem
corruption or something). **HOWEVER**, they **MAY** be QUITE large for
the most active binary groups, particularly on servers with decent binary
retention (into the months or years). I'd not be surprised to see the
groups subdir files for active binary groups exceeding 100 MB in size,
unless you have expiry set short enough to counteract that.
But of course you CAN delete the groups subdir files for groups you no
longer visit and are no longer subscribed to, without issue, since
they're just wasting space...
Also of special mention is the relatively new Sent file, corresponding to
the "pseudogroup" within current pan. If you send a lot of messages,
this file could get pretty big over time.
It's worth noting that you can open these files in a text editor and look
around if you're curious. They are well commented at the top with an
explanation of what is there and its format. Just don't save any changes
unless you know what you are doing... or are prepared to lose the header
data for that group (maybe with a backup, just in case) if you screw up
the edit.
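To find candidates among those group files, one approach is to collect the groups marked subscribed in your newsrc files (a colon after the name, as described further down under newsrc*) and list every file in groups/ that doesn't match. This is a sketch with a few assumptions: the group files are named exactly after their groups (as groups/gmane.linux.gentoo.devel is here), the Sent pseudogroup file should always be kept, and a group subscribed on any server counts as subscribed. It only lists candidates; move them out with a backup rather than deleting outright.

```python
import re
from pathlib import Path

# Group name followed by ':' (subscribed) or '!' (unsubscribed), per newsrc convention.
NEWSRC_LINE = re.compile(r"^([^:!\s]+)([:!])")
ALWAYS_KEEP = {"Sent"}  # ASSUMPTION: pan's Sent pseudogroup file lives in groups/


def subscribed_groups(pan_dir: Path) -> set:
    """Groups marked subscribed in any newsrc* file under pan_dir."""
    subscribed = set()
    for newsrc in pan_dir.glob("newsrc*"):
        if not newsrc.is_file():
            continue
        for line in newsrc.read_text(errors="replace").splitlines():
            m = NEWSRC_LINE.match(line)
            if m and m.group(2) == ":":
                subscribed.add(m.group(1))
    return subscribed


def stale_group_files(pan_dir: Path) -> list:
    """Header-cache files in groups/ for groups not subscribed on any server."""
    groups_dir = pan_dir / "groups"
    if not groups_dir.is_dir():
        return []
    keep = subscribed_groups(pan_dir) | ALWAYS_KEEP
    return sorted(p for p in groups_dir.iterdir() if p.is_file() and p.name not in keep)
```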
ssl_certs: This subdir will likely contain very small hash-data files
for each of your servers that you have configured to use SSL. These
should be small indeed, only a few bytes each.
(Of course it's worth noting that on many filesystems, a file takes space
in "blocksize" chunks, with "blocksize" often being either 1024 or 4096
bytes (1 or 4 KB). So these very small files, six bytes each here, will
normally still take 1024 or 4096 bytes of space on most filesystems
including ext*. Still, it'd take either a big bug or a *LOT* of
configured servers in order to make this dir big enough to get out of
the noise at all.)
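The rounding works out like this. A small sketch with made-up function names: allocated_bytes just does the round-up-to-whole-blocks arithmetic, while on_disk_bytes asks the filesystem what it actually allocated via st_blocks, which Python documents as a count of 512-byte blocks (it isn't available on Windows). Filesystems that store tiny files inline, and zero-length files, can report less than the simple arithmetic suggests.

```python
import os


def allocated_bytes(size: int, block: int = 4096) -> int:
    """Bytes a file of `size` bytes occupies when rounded up to whole blocks."""
    return (size + block - 1) // block * block


def on_disk_bytes(path: str) -> int:
    """What the filesystem actually allocated (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512


# A six-byte ssl_certs file:
print(allocated_bytes(6))         # 4096 with 4 KB blocks
print(allocated_bytes(6, 1024))   # 1024 with 1 KB blocks
```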
That's the subdirs, here's the files appearing in .pan2 itself:
Score: scorefile. If you use scores you don't want to delete this. If
you only assign scores using pan's GUI and you do it a lot, this file
could be pretty big, as pan's GUI isn't very efficient at storing the
scores it creates. It's possible to manually edit the file to make it
far more efficient, without losing any scores, but that's beyond the
scope of this message, and in any event, given past history, I'd suggest
you leave it alone unless it's getting to be a REAL problem, because I
know it's more complex than you're normally prepared to deal with.
accels.txt: The "old-style" keyboard-accels file. It's possible but
difficult to hand-edit: while it's a text file, it's a machine-ordered menu dump
that has little/no human logic to it. AFAIK it's still honored if pan
finds it, but I believe pan prefers the pan.hotkeys file (new-style),
now, so it can probably be deleted without issue, if you have the new
file. (But as usual, if you've customized your hotkeys, make a backup
first before trying the delete, just in case.)
downloads.stats: This should be a small file consisting of a comment line
and a number, that number being the bytes downloaded since the last stats
reset. The file will only exist with newer pan, since the feature that
uses it is still relatively new.
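Reading that number back out is straightforward. This sketch assumes, without having verified it, that the comment line starts with '#'; so it skips blank lines and anything beginning with '#' and takes the first line that parses as an integer. The function name is made up.

```python
from pathlib import Path
from typing import Optional


def downloaded_bytes(text: str) -> Optional[int]:
    """First integer line in downloads.stats text, skipping blanks and '#' comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            return int(line)
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    stats = Path.home() / ".pan2" / "downloads.stats"
    if stats.is_file():
        count = downloaded_bytes(stats.read_text())
        if count is not None:
            print(f"{count} bytes ({count / 2**30:.2f} GiB) since last reset")
```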
group-preferences.xml: This file contains a record of most or all groups
you've visited, since doing so sets some group prefs for that group.
While you probably don't want to delete the file itself, as doing so
would delete all your group prefs, hand editing should be possible as
long as you're careful, and may be desirable, since you can remove
entries for groups you no longer visit and don't care to retain the
preferences for.
newsgroups.dsc: This file contains the newsgroup descriptions as
downloaded from your servers whenever you refresh the group list.
However, most groups don't have a good description anyway, so the
descriptions list is of limited value, and once you have your set of
subscribed groups and don't change them much or visit unsubscribed groups
much any more, this is a good deletion candidate. However, as mentioned,
it'll probably reappear the next time you update your group list. But
of course if you seldom do that, since you already have your list of
subscribed groups and aren't generally interested in new ones anyway, the
file might stay gone for quite some time.
newsgroups.xov: IMPORTANT! This file contains a record of the groups
you've visited and a per-server listing of the highest article number pan
knows about for each group. Thus, you don't want to disturb the entries
for groups you actively visit. However, the format is simple enough, one
group per line, that you can delete whole lines for groups that you're no
longer interested in, if you want.
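A line filter for that could look like the sketch below. The assumption is that each line begins with the group name followed by whitespace; I haven't pinned down the layout of the rest of the line, so the code doesn't touch it. It also leaves alone any line starting with '#', in case the file carries a comment header. It writes the result to a separate .new file; swap it in only after backing up the original. The function name and the example group are hypothetical.

```python
from pathlib import Path


def drop_groups(xov_text: str, unwanted: set) -> str:
    """Return xov_text without the lines whose first field is an unwanted group."""
    kept = []
    for line in xov_text.splitlines(keepends=True):
        fields = line.split()
        if fields and not line.startswith("#") and fields[0] in unwanted:
            continue  # ASSUMPTION: field 0 is the group name
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    xov = Path.home() / ".pan2" / "newsgroups.xov"
    if xov.is_file():
        trimmed = drop_groups(xov.read_text(), {"alt.example.dead"})  # hypothetical group
        xov.with_name(xov.name + ".new").write_text(trimmed)
```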
newsgroups.ynm: Semi-important: This file tracks per-group posting
permissions: posting allowed (default/y), not allowed/read-only (n), or a
moderated group (m). I believe pan rebuilds this file when you update
the group list, so it's not irreplaceable, but you don't want to go
randomly deleting it either, as pan could then get quite mixed up if you
try to post to a moderated or read-only group, until you do update the
group list again.
newsrc*: IMPORTANT! There should be one of these files per server.
They track read messages. If a newsrc file for a server goes
missing, pan will lose this information and will show all messages on
that server as unread once again (tho it's actually a bit more complex
than that, since the read status from multiple servers carrying the same
groups interact).
It's possible to manually edit the newsrc files without /too/ much
trouble if you're careful. Unsubscribed groups will have an exclamation
point (!) appended, while subscribed groups will have a colon (:)
appended. If you've visited the group, there will be a space, and the
article numbers for that group and server that you have marked as read.
Some people may be interested in removing the tracking for groups they no
longer visit, by removing the space and number sequence.
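Removing that tracking for a given set of groups amounts to cutting each matching line back to just the name and its ':' or '!'. A minimal sketch following the format described above (strip_read_ranges is a made-up name); which groups to pass in is up to you, and as with the other files, back the newsrc up first. For example, `strip_read_ranges(Path("newsrc-1").read_text(), {"alt.old.group"})` with a hypothetical group name.

```python
import re

# Group name, then ':' (subscribed) or '!' (unsubscribed), then the rest of the line.
NEWSRC_LINE = re.compile(r"^([^:!\s]+)([:!])(.*)$")


def strip_read_ranges(newsrc_text: str, groups: set) -> str:
    """Drop the space and read-article numbers from lines for the given groups."""
    out = []
    for line in newsrc_text.splitlines():
        m = NEWSRC_LINE.match(line)
        if m and m.group(1) in groups:
            out.append(m.group(1) + m.group(2))  # keep the subscription marker only
        else:
            out.append(line)
    trailing = "\n" if newsrc_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```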
pan.hotkeys: This is the new-style keyboard-accels file. It's easier to
hand-edit if desired as there's comments and it's actually logically
ordered, but changing the assignment in pan prefs is preferred. If you
have custom keyboard-accels configured you'll want to keep this file, but
you might consider removing accels.txt, above, if you have both. This
new-style version is relatively recent, however, so older pan
installations may not have this file, only the older one.
posting.xml: IMPORTANT! This file contains your posting profiles.
Obviously you don't want to remove it unless you don't care about them,
but it's reasonably easy to hand-edit, if you're careful not to break the
xml. However, it should remain reasonably sized unless you go hog wild
with hundreds/thousands of profiles.
preferences.xml: IMPORTANT! This file contains pan's general preferences
including the cache size preference mentioned above. It's reasonably
easy to edit as long as you're careful not to break the XML. This file
should remain pretty close to the same size (near 9 KB) always, tho
individual changes will alter it by a few bytes.
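Since posting.xml, preferences.xml and servers.xml all come with the "don't break the XML" caveat, a quick well-formedness check before restarting pan can catch a bad hand edit. A minimal sketch (xml_problem is a made-up name): it only confirms the file parses as XML, not that pan will accept the values in it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def xml_problem(path: Path) -> Optional[str]:
    """None if the file parses as XML, otherwise the parser's error message."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    pan_dir = Path.home() / ".pan2"
    for name in ("posting.xml", "preferences.xml", "servers.xml"):
        path = pan_dir / name
        if path.is_file():
            problem = xml_problem(path)
            print(f"{name}: {'OK' if problem is None else problem}")
```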
servers.xml: IMPORTANT! This file contains your server configuration.
Again, it's reasonably easy to edit as long as you don't break the XML,
and indeed, hand-editing this file is the only way to get some settings.
(It's possible to set an arbitrary server rank here, for instance, while
pan's GUI is limited to primary and backup. Similarly, per-server expiry
can be set to an arbitrary number of days, instead of the far more
limited options the GUI gives you. Finally, you can set here an
arbitrary number of connections for pan to try, if the server allows
it, while due to GNKSA, the GUI limits the maximum number of
connections to 4. That can be useful for paid accounts that allow 20,
30, 50... connections, altho once you get into the double digits,
unless you're lucky enough to have a gigabit link to the internet, it
becomes increasingly likely that more connections simply increase
overhead and thus slow you down, instead of increasing download speed.)
Again, this file should remain reasonably small, unless you go hog wild
configuring hundreds/thousands of servers...
tasks.nzb: This is a standard *.nzb file, containing pan's list of
uncompleted downloads. (It only stores Message-IDs, not group refresh
task data, which I believe is lost when pan exits.) The file will thus
be larger when you have a long list of downloads queued up, but should
shrink to pretty small (just the standard nzb xml schema info, basically,
194 bytes, here) when there aren't any articles queued for download.
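Since it's a standard NZB file, you can see how much is queued without opening pan. In the NZB format each queued post is a file element and each of its articles is a segment element whose text is the Message-ID. This sketch (function names made up) just counts both, and ignores the namespace so it works whether or not the file declares the usual NZB one.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def queued_counts(root: ET.Element) -> tuple:
    """(files, segments) queued in an NZB document."""
    files = sum(1 for e in root.iter() if local_name(e.tag) == "file")
    segments = sum(1 for e in root.iter() if local_name(e.tag) == "segment")
    return files, segments


if __name__ == "__main__":
    tasks = Path.home() / ".pan2" / "tasks.nzb"
    if tasks.is_file():
        files, segments = queued_counts(ET.parse(tasks).getroot())
        print(f"{files} queued file(s), {segments} article segment(s)")
```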
I'd certainly investigate any files or subdirs other than those listed
above, since they're likely to be from something other than pan...
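Spotting those strays can be automated by comparing the top level of .pan2 against the names covered in this message. A minimal sketch: the KNOWN list is only what's described above, so a match here just means "worth a look", not "junk", and newer pan versions may well add files of their own.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Names covered above; "newsrc*" is a wildcard for the per-server files.
KNOWN = [
    "article-cache", "article-drafts", "downloaded-attachments", "encode-cache",
    "groups", "ssl_certs", "Score", "accels.txt", "downloads.stats",
    "group-preferences.xml", "newsgroups.dsc", "newsgroups.xov", "newsgroups.ynm",
    "newsrc*", "pan.hotkeys", "posting.xml", "preferences.xml", "servers.xml",
    "tasks.nzb",
]


def unexpected_entries(pan_dir: Path) -> list:
    """Top-level names in pan_dir that match none of the KNOWN patterns."""
    return sorted(
        p.name for p in pan_dir.iterdir()
        if not any(fnmatchcase(p.name, pattern) for pattern in KNOWN)
    )


if __name__ == "__main__":
    pan_dir = Path.home() / ".pan2"
    if pan_dir.is_dir():
        for name in unexpected_entries(pan_dir):
            print(name)
```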
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman