Re: [Pan-devel] newsrc with DB

From: Jeffrey Stedfast
Subject: Re: [Pan-devel] newsrc with DB
Date: Thu, 10 Jun 2004 09:16:15 -0400

On Thu, 2004-06-10 at 03:08, K. Haley wrote:
> Jeffrey Stedfast wrote:
> >I'm not much of a database guy...but...
> >
> >1. is a database really necessary?
> >
> >"as far as I've seen, once those database worms eat into your brain,
> >every thumb looks like a nail" -- jwz
> >
> >seriously tho. one should definitely read jwz's document on summary
> >files.
> >
> >Evolution uses jwz's approach to summary files and it is EXTREMELY
> >scalable. I have multi-gigabyte mbox files in Evolution right now and
> >you wouldn't know it based on load time. Heck, based on load time you
> >might expect my folders are 5 or 6 messages tops :-)
> >  
> >
> How many messages in one of those multi-gig mbox files?

~135,000 messages

> We're looking
> at handling >1 million articles per group.


> I just read the article.  It
> sounds like Pan's current implementation with summary files for each group
> on each server.  If I understand the code correctly, Pan also loads the
> summary file into memory as jwz suggested.  There are two problems with
> this.
> 1. With such a large article count the summary will be >100MB.  In
> groups with long subject lines, like binary groups, expect it to be over
> 200MB.  Several users have seen memory usage well above that.  The
> only effective solution here is to load the data only when it is needed.

fair enough.
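Loading "only when it is needed" can be sketched with SQLite, the option raised later in this thread. The snippet below is a minimal illustration under stated assumptions: the `articles` table, its columns, and the `fetch_page` helper are hypothetical, not Pan's actual schema or code. It reads one page of headers at a time, so memory use tracks the page size rather than the number of articles in the group.

```python
import sqlite3

# Hypothetical schema (not Pan's): one row per article header. The composite
# primary key gives an index on (group_name, number) for paged reads.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    group_name TEXT NOT NULL,
    number     INTEGER NOT NULL,
    subject    TEXT NOT NULL,
    PRIMARY KEY (group_name, number)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the hypothetical article database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def fetch_page(conn, group_name, after_number=0, page_size=500):
    """Return up to page_size (number, subject) rows with number > after_number.

    Keyset pagination: only the rows for the visible page are read from disk,
    so the whole group is never held in memory at once.
    """
    cur = conn.execute(
        "SELECT number, subject FROM articles "
        "WHERE group_name = ? AND number > ? "
        "ORDER BY number LIMIT ?",
        (group_name, after_number, page_size),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_db()
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [("alt.test", n, f"subject {n}") for n in range(1, 10_001)],
    )
    first = fetch_page(conn, "alt.test", page_size=3)
    print(first)  # [(1, 'subject 1'), (2, 'subject 2'), (3, 'subject 3')]
    # Continue from the last article number seen instead of using OFFSET.
    nxt = fetch_page(conn, "alt.test", after_number=first[-1][0], page_size=3)
    print(nxt)    # [(4, 'subject 4'), (5, 'subject 5'), (6, 'subject 6')]
```

Paging by "last article number seen" rather than `OFFSET` keeps each page query an index range scan, which matters once a group holds a million or more rows.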

> 2. What we really want is to combine the group lists for several 
> servers.  Say you have two news servers.  You open a group that is on 
> both servers.  Pan shows the combined article list from both servers.
> Solving both problems would basically mean writing a mini DB engine, so we
> might as well use a small, fast DB like SQLite.  His suggestion about doing
> lazy updates was a good one though.  Mark an article read, queue it, 
> start an update thread that waits for 30 sec before doing anything,  
> mark more articles read and add them to the queue.  When the thread runs 
> it updates all the articles in the queue.
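The lazy-update scheme described above (queue "mark read" operations, wait, then write the whole batch) can be sketched as follows. This is a hypothetical helper, not Pan code: the class name, the `flush_fn` callback, and the `flush_now` method are assumptions for illustration. The 30-second default comes from the message; the first mark starts the timer, and later marks join the same batch until it fires.

```python
import threading


class DeferredReadMarker:
    """Batch "mark as read" updates and hand them off after a delay.

    Hypothetical sketch: flush_fn receives a set of article keys and is
    expected to write them in one go (e.g. a single DB transaction).
    """

    def __init__(self, flush_fn, delay=30.0):
        self._flush_fn = flush_fn
        self._delay = delay
        self._lock = threading.Lock()
        self._pending = set()
        self._timer = None

    def mark_read(self, article_key):
        """Queue one article; start the delay timer if none is running."""
        with self._lock:
            self._pending.add(article_key)
            if self._timer is None:
                self._timer = threading.Timer(self._delay, self._flush)
                self._timer.daemon = True
                self._timer.start()

    def _flush(self):
        # Swap out the pending set under the lock, then call flush_fn
        # outside it so new marks are not blocked while writing.
        with self._lock:
            batch, self._pending = self._pending, set()
            self._timer = None
        if batch:
            self._flush_fn(batch)

    def flush_now(self):
        """Write any queued marks immediately, e.g. on group close or exit."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
        self._flush()
```

With SQLite, `flush_fn` could run one `executemany` UPDATE inside a single transaction, so thirty seconds of clicking costs one disk commit instead of one per article. Calling `flush_now` on shutdown avoids losing marks that are still queued.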


> >2. if you use a database with a table that contains the message-ids of
> >the articles, PLEASE don't store the message-id as <address@hidden>, store it
> >as address@hidden - I say this for several reasons:
> >  
> >
> That's good to know.
> >ok, exception to #2 is that using the first 8 bytes of the md5sum'd
> >(canonical) message-id might be just as good. plus it saves a ton more
> >memory :-)
> >  
> >
> Good idea, although we might need to use the full 16 bytes just to be 
> safe.  Since the article table will hold all the article summaries for 
> all groups there will be more than a few users with several MILLION 
> entries in it.

yes, with millions of messages - might need the entire md5sum.
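The message-id suggestions above (store the id without angle brackets, or key rows by the MD5 of the canonical id, truncated to 8 bytes or kept at the full 16) can be sketched like this. The function names are hypothetical, and "canonical" is assumed here to mean only trimming whitespace and the enclosing `<` `>`; a real canonicalization might do more. The collision helper uses the standard birthday-bound approximation to show why key size matters at millions of rows.

```python
import hashlib
import math


def canonical_message_id(raw):
    """Trim whitespace and the enclosing angle brackets (assumed canonical form).

    '<1234@example.com>' -> '1234@example.com'
    """
    mid = raw.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    return mid


def message_key(raw, nbytes=16):
    """Fixed-size binary key: the first nbytes of MD5(canonical id).

    nbytes=8 saves memory; nbytes=16 keeps the whole digest.
    """
    if not 1 <= nbytes <= 16:
        raise ValueError("nbytes must be between 1 and 16")
    digest = hashlib.md5(canonical_message_id(raw).encode("utf-8")).digest()
    return digest[:nbytes]


def collision_probability(n, bits):
    """Birthday-bound estimate that any two of n random bits-bit keys collide."""
    # 1 - exp(-n(n-1) / 2^(bits+1)), computed with expm1 for small values.
    return -math.expm1(-n * (n - 1) / (2.0 * 2 ** bits))
```

For ten million articles, an 8-byte (64-bit) key gives a collision probability on the order of a few in a million, while the full 128-bit digest makes it negligible; either way the key is a fixed 8 or 16 bytes instead of a variable-length string.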


Jeffrey Stedfast <address@hidden>
