Re: [Pan-devel] newsrc with DB

From: K. Haley
Subject: Re: [Pan-devel] newsrc with DB
Date: Thu, 10 Jun 2004 01:08:37 -0600
User-agent: Mozilla Thunderbird 0.6 (Windows/20040502)

Jeffrey Stedfast wrote:

I'm not much of a database guy...but...

1. is a database really necessary?

"as far as I've seen, once those database worms eat into your brain,
every thumb looks like a nail" -- jwz

seriously tho. one should definitely read jwz's document on summary

Evolution uses jwz's approach to summary files and it is EXTREMELY
scalable. I have multi-gigabyte mbox files in Evolution right now and
you wouldn't know it based on load time. Heck, based on load time you
might expect my folders are 5 or 6 messages tops :-)

How many messages are in one of those multi-gig mbox files? We're looking at handling >1 million articles per group. I just read the article. It sounds like Pan's current implementation, with summary files for each group on each server. If I understand the code correctly, Pan also loads the summary file into memory, as jwz suggested. There are two problems with this.

1. With such a large article count the summary will be >100MB. In groups with long subject lines, like binary groups, expect it to be >200MB. Several users have seen memory usage well above that. The only effective solution here is to load the data only when it is needed.

2. What we really want is to combine the group lists for several servers. Say you have two news servers and you open a group that is on both of them. Pan should then show the combined article list from both servers.
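To make the two points above concrete, here is a rough sketch assuming a small SQL database such as SQLite. The schema, the table and column names, and the `add_header` and `combined_page` helpers are all made up for illustration and are not Pan's actual design. Each article is stored once, keyed by message-id, and a second table records which server carries it, so a group can be read back as one merged list, a page at a time, instead of loading the whole summary into memory.

```python
import sqlite3

# Hypothetical schema: one row per article, plus one row per (server, group)
# location where that article is available.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (
    message_id TEXT PRIMARY KEY,
    subject    TEXT NOT NULL
);
CREATE TABLE availability (
    message_id TEXT    NOT NULL REFERENCES article(message_id),
    server     TEXT    NOT NULL,
    grp        TEXT    NOT NULL,
    number     INTEGER NOT NULL,
    PRIMARY KEY (server, grp, number)
);
""")

def add_header(server, grp, number, message_id, subject):
    """Record one header; the article row is shared between servers."""
    conn.execute("INSERT OR IGNORE INTO article VALUES (?, ?)",
                 (message_id, subject))
    conn.execute("INSERT OR IGNORE INTO availability VALUES (?, ?, ?, ?)",
                 (message_id, server, grp, number))

def combined_page(grp, limit, offset=0):
    """Return one page of the merged list as (message_id, subject, servers)."""
    rows = conn.execute(
        """
        SELECT a.message_id, a.subject, group_concat(v.server)
        FROM article a
        JOIN availability v ON v.message_id = a.message_id
        WHERE v.grp = ?
        GROUP BY a.message_id
        ORDER BY a.message_id
        LIMIT ? OFFSET ?
        """,
        (grp, limit, offset))
    return rows.fetchall()
```

With this layout an article that exists on both servers shows up once in the list, and only the rows for the requested page are ever pulled into memory.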

Solving both problems would basically mean writing a mini database engine, so we might as well use a small, fast DB like SQLite. His suggestion about doing lazy updates was a good one, though. Mark an article read, queue it, and start an update thread that waits for 30 seconds before doing anything; meanwhile, mark more articles read and add them to the queue. When the thread runs, it updates all the articles in the queue.
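The lazy-update idea could look something like the sketch below. The class name `LazyReadMarker` and the `flush` callback are hypothetical; in practice the callback would do a single batched database update. This version uses a timer thread rather than a sleeping worker, which has the same effect: the first article marked read starts a 30 second countdown, later ones just join the queue, and everything is written in one go when the countdown ends.

```python
import threading

class LazyReadMarker:
    """Queue 'mark read' requests and write them out in one delayed batch."""

    def __init__(self, flush, delay=30.0):
        self._flush = flush          # callable taking a list of message-ids
        self._delay = delay          # seconds to wait before writing
        self._lock = threading.Lock()
        self._pending = []
        self._timer = None

    def mark_read(self, message_id):
        with self._lock:
            self._pending.append(message_id)
            # Only the first request in a batch starts the countdown.
            if self._timer is None:
                self._timer = threading.Timer(self._delay, self._run)
                self._timer.daemon = True
                self._timer.start()

    def _run(self):
        # Take the whole queue under the lock, then write outside it.
        with self._lock:
            batch, self._pending = self._pending, []
            self._timer = None
        if batch:
            self._flush(batch)

    def flush_now(self):
        """Write anything still queued, e.g. when the program exits."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
        self._run()
```

The `flush_now` method is there because a delayed write needs some way to be forced at shutdown, otherwise the last few articles marked read would be lost.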

2. if you use a database with a table that contains the message-id's of
the articles, PLEASE don't store the message-id as <address@hidden>, store it
as address@hidden - I say this for several reasons:

That's good to know.

ok, exception to #2 is that using the first 8 bytes of the md5sum'd
(canonical) message-id might be just as good. plus it saves a ton more
memory :-)

Good idea, although we might need to use the full 16 bytes just to be safe. Since the article table will hold all the article summaries for all groups, there will be more than a few users with several MILLION entries in it.
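For reference, a sketch of what such a key could look like, together with a back-of-the-envelope collision estimate. The function names are made up for illustration, and the estimate assumes MD5 output behaves like uniformly random bits, which is only an approximation.

```python
import hashlib

def canonical_message_id(raw):
    """Strip surrounding whitespace and one pair of angle brackets."""
    mid = raw.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    return mid

def article_key(raw, nbytes=16):
    """First nbytes of the MD5 digest of the canonical message-id."""
    digest = hashlib.md5(canonical_message_id(raw).encode("utf-8")).digest()
    return digest[:nbytes]

def expected_colliding_pairs(n, nbytes):
    """Birthday-style estimate: n*(n-1)/2 pairs, each of which collides
    with probability 2**-(8*nbytes) under the uniformity assumption."""
    return n * (n - 1) / 2 / 2 ** (8 * nbytes)
```

Under that assumption, `expected_colliding_pairs(10_000_000, 8)` comes out on the order of 10^-6, so 8 bytes is already unlikely to collide at that size, and 16 bytes makes the estimate smaller by a further factor of 2^64. The cost of playing it safe is 8 extra bytes per article.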

