Re: [Pan-users] article cache size
From: Duncan
Subject: Re: [Pan-users] article cache size
Date: Sat, 9 Mar 2024 01:49:34 -0000 (UTC)
User-agent: Pan/0.155 (Kherson; 020f52b16)
David Chmelik posted on Fri, 8 Mar 2024 05:19:17 -0000 (UTC) as excerpted:
> On Tue, 30 Sep 2014 21:10:56 +0000 (UTC), Duncan wrote:
>> As to your question, years ago I was the person who asked to bump the
>> max cache size from 1 GiB -- I needed 4 GiB at the time and it was
>> bumped to 20, which was great.
>
> What size do you recommend if I currently use 1,500+ newsgroups, and
> some are binary but dead, so let's say all plain-text, but some are
> high-traffic like the Linux kernel listserv on gmane? I rarely read
> that; it's more out of curiosity. There's maybe under 40 I'd read daily
> if they have traffic, but many/most don't except occasionally/rarely,
> though usually there's something daily. Most are miscellaneous
> subjects, like computer science/engineering & software I just
> occasionally have questions on, like here, but other times don't keep up
> on, and just select and mark read.
Interesting/good question.
The discussion below gets a bit technical and arguably goes on a couple
tangents. Jump to the 4th paragraph from the end if you're just
interested in some recommendations. Read through if you like technical
detail and find tangents interesting! =:^)
Practically speaking, news-cache size depends primarily on how you use
pan and how long you intend to retain messages.
Pan's cache-size default (way too small for my usage, text or binary; I
have separate instances for each) appears to be designed primarily for
either text-only use with some short-term caching (a few sessions), or
process-as-you-go use (not even a single full session) with anything
above trivial numbers of binaries.
My usage, instead, is archiving for text. For binaries it's multi-
session: in the first session I sample and download anything interesting
to cache; then, with everything cached so access is instant, I go through
again to sort what I downloaded, either deleting it directly if I decide
I don't want to save it permanently after all, or saving it off to
permanent storage and then deleting it from pan (which I believe deletes
it from the cache too).
While the default size would (for my usage) keep text around a few
sessions so I could refer back to messages if I wanted to see a full
message when it was context-quoted in the reply, it certainly wasn't
suitable for long-term "archiving" storage of any sort. For binaries it
was HORRIBLE, as I'd hit the cache-size limit and start deleting older
messages in the first session, before I did anything but read the
overview! I wasn't even reading downloaded messages before they were
deleted due to cache limits!
So you said basically text, what I'd call *MANY* groups (1500+), with
some high traffic and perhaps a few trivial binaries. Great. But how much do
you download to keep around even if you don't read it, and how long do you
actually want to KEEP it around?
Here, for text and trivial binaries (say, trimmed HTML messages in some
text groups that allow them (the kernel group, while high traffic, does
NOT, AFAIK), the occasional screenshot, etc.), only a relatively few groups
but with near all traffic to them archived (unexpiring-cached) in some
cases since 2002...
Here's what compsize (transparent compression report for btrfs) says for
my text instance dedicated partition, basically the .pan directory but
mostly cache:
$ sudo compsize /nt/
Processed 278330 files, 180543 regular extents (180543 refs), 99005 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       47%         999M          2.0G         2.0G
none       100%          14M           14M          14M
zstd        47%         985M          2.0G         2.0G
(Compsize says the article cache itself is 970M used, 1.9G uncompressed,
so it is indeed most of the above. And the 14 MiB of uncompressible data
is in the cache, so I'll presume it's pre-compressed binaries sent
yEnc-encoded, since MIME/UUE encoding is inefficient and thus compressible.)
So roughly 2 GiB uncompressed, compressed down to about half size, ~1 GiB,
using zstd (level 3, the btrfs default when zstd compression is chosen).
Only a trivial 14 MiB is uncompressible.
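The ratio works out as follows; a quick arithmetic sketch using the numbers from the compsize report above:

```python
# Numbers taken from the compsize report above.
disk_mib = 999            # compressed on-disk size (TOTAL, Disk Usage)
uncompressed_gib = 2.0    # uncompressed size (TOTAL, Uncompressed)

ratio = (disk_mib / 1024) / uncompressed_gib
print(f"stored at {ratio:.0%} of uncompressed size")  # prints: stored at 49% of uncompressed size
```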
Here's the btrfs filesystem usage report for that partition, which is
btrfs raid1, so I can use half that 10 GiB total space:
Overall:
    Device size:                  10.00GiB
    Device allocated:              3.06GiB
    Device unallocated:            6.94GiB
    Device missing:                   0.00B
    Device slack:                     0.00B
    Used:                          2.37GiB
    Free (estimated):              3.63GiB  (min: 3.63GiB)
    Free (statfs, df):             3.63GiB
    Data ratio:                        2.00
    Metadata ratio:                    2.00
    Global reserve:               16.69MiB  (used: 0.00B)
    Multiple profiles:                   no

Data,RAID1: Size:1.00GiB, Used:853.71MiB (83.37%)
   /dev/sdd8    1.00GiB
   /dev/sdc8    1.00GiB

Metadata,RAID1: Size:512.00MiB, Used:357.91MiB (69.90%)
   /dev/sdd8  512.00MiB
   /dev/sdc8  512.00MiB

System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/sdd8   32.00MiB
   /dev/sdc8   32.00MiB

Unallocated:
   /dev/sdd8    3.47GiB
   /dev/sdc8    3.47GiB
Now btrfs stores small files (2048 bytes and under by default, which is
what I use here) in-line in the metadata, and some of those text-message
cache files will certainly qualify, thus explaining the difference
between the ~854 MiB data usage reported here and the 999 MiB compsize
reported: some of that 999 is stored in the metadata, not the data.
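The inline-in-metadata share is simply the gap between the two reports:

```python
# Numbers from the compsize and btrfs reports above.
compsize_disk_mib = 999.0      # what compsize counts, inline extents included
data_chunk_used_mib = 853.71   # what btrfs counts inside the Data,RAID1 chunks

inlined_mib = compsize_disk_mib - data_chunk_used_mib
print(f"~{inlined_mib:.0f} MiB of small files held inline in metadata")  # ~145 MiB
```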
Total used including metadata is 2.37 gig, but that's across both
physical devices, so divide by two for raid1: ~1.2 gig of data+metadata.
The 3.63 GiB reported Free pre-accounts for the raid1, including the 3.47
GiB not allocated (per device) plus the still-unused space within the
data chunks.
So of the 5 GiB effective space (5 gig per device but raid1 across two
devices), ~1.2 gig is used, ~3.6 gig is free, and the other ~0.2 gig is in
the unused metadata, system chunk, etc.
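The raid1 accounting above can be sanity-checked with a little arithmetic (all figures from the usage report):

```python
# All figures from the btrfs filesystem usage report above.
device_size_gib = 10.00
raid1_copies = 2

effective_gib = device_size_gib / raid1_copies   # 5 GiB usable
used_gib = 2.37 / raid1_copies                   # ~1.2 GiB of data+metadata
# Free estimate: per-device unallocated, plus unused space in the 1 GiB data chunk
free_gib = 3.47 + (1.00 - 853.71 / 1024)
print(f"effective {effective_gib} GiB, used ~{used_gib:.1f} GiB, free ~{free_gib:.2f} GiB")
```

The computed free figure lands within a rounding error of the 3.63 GiB btrfs reports.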
But if I wasn't using btrfs compression it'd be roughly half full. All in
all, pretty reasonable usage for a dedicated-usage partition where you
want some room to grow.
Finally, the pan cache settings for that: again, articles are set
unexpiring (in server settings) so the cache effectively keeps them
"forever", and in preferences the article-cache size is set to
5120 MiB = 5 GiB.
Which pan couldn't actually hit if I weren't using btrfs compression
because the filesystem itself is exactly 5 GiB, and there's metadata
overhead plus the non-article-cache files in the pan dir. But with
compression it should actually be able to hit that 5 GiB, and could
probably hit ~9 GiB or so, assuming the same near 2:1 compression ratio
continues. So I have room to set that higher as my archive continues to
grow...
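That ~9 GiB ceiling is roughly (filesystem size minus some allowance for metadata and non-cache files) times the observed ~2:1 compression. A sketch; the overhead allowance below is my own hypothetical figure, not a measured one:

```python
fs_gib = 5.0           # effective filesystem size (raid1 already accounted for)
compression = 2.0      # observed near-2:1 zstd compression ratio
overhead_gib = 0.5     # hypothetical allowance: metadata plus non-cache pan files

ceiling_gib = (fs_gib - overhead_gib) * compression
print(f"cache could reach ~{ceiling_gib:.0f} GiB uncompressed")  # ~9 GiB
```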
Now a guess at translating that for you... Many more groups (say 100
times as many...), still mostly text, but presumably you aren't archiving
"forever", and if I've interpreted your description correctly, you
probably don't download as much of the groups as I do. However, at least
one of those groups is LKML (the kernel list), far higher traffic (if
enforced text-only) than anything I subscribe to and archive.
At a guess, I'd say start with a gig. That should reasonably safely
accommodate even your 100X the number of groups, text-mostly, for a
"reasonable" period of a month or so, which I'll say is about the max time
discussion threads are likely to be active so you can refer back to
previous articles without re-downloading, again assuming you're not
downloading everything in the group.
If you want to be extra safe or see messages you know you downloaded
disappearing (and your filesystems aren't going haywire due to crashing
and filesystem immaturity... btrfs is generally past that now but was
still a bit iffy when I started with it), double that to 2 GiB
(uncompressed), which again is roughly what I'm seeing with some groups
near-archived for 20+ years now, but at ~1% of the groups.
Even with ~1500 groups, text-mostly, downloading-to-cache near all
messages, I'd be quite surprised to see usage over 2 GiB with an effective
lifetime of under a month (even two), because that's simply *HARD* to do
with text-mostly groups ... *UNLESS* you're grabbing some prolifically AI-
spammed groups or something (the *HARD* to do assumes *humans* actually
writing all those messages -- two GiB of data is simply a LOT of text for
even a few hundred /humans/ to write over a couple months, but automate it
with AI and that assumption's out the window!)
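To see why 2 GiB of human-written text is hard to reach, a back-of-envelope sketch (every input figure below is hypothetical, not measured):

```python
msgs_per_day = 500    # hypothetical: total posts/day across all active subscribed groups
avg_msg_kib = 4       # hypothetical: typical plain-text post, headers included
days = 60             # roughly two months of retention

total_gib = msgs_per_day * avg_msg_kib * days / (1024 * 1024)
print(f"~{total_gib:.2f} GiB over {days} days")  # ~0.11 GiB
```

Even these generous human-traffic numbers land an order of magnitude under 2 GiB; AI-generated volume is what breaks the assumption.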
If you're considering a dedicated partition, 5 gig for it should be good,
as it is for me.
If you're actually archiving those 1500 groups... I'd say start with 10
GiB, but until you have, say, a year of history to make a reasonable
projection into the future, watch the usage and be prepared to adjust it
up or down; a dedicated partition, if used, should similarly be larger,
maybe 20 or 25 gig. With a year of history you should be able to project
usage /reasonably/ comfortably out to storage-replacement-cycle lengths:
double the year's activity for a reasonable margin and multiply to cover
the time until your expected upgrade, then increase by 50% or double
again for the dedicated partition size if used (unless of course activity
is multiplying, as it well could be on groups with uncontrolled AI spam).
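That projection rule, sketched with hypothetical inputs:

```python
year_gib = 3.0           # hypothetical: cache growth observed in the first year
margin = 2.0             # double the year's activity for a safety margin
years_to_upgrade = 4     # hypothetical: expected storage replacement cycle

cache_gib = year_gib * margin * years_to_upgrade   # suggested cache-size limit
partition_gib = cache_gib * 1.5                    # +50% for a dedicated partition
print(f"cache limit {cache_gib:.0f} GiB, partition {partition_gib:.0f} GiB")
```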
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman