Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of v

From: Ron Johnson
Subject: Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date: Sat, 04 Jul 2009 12:17:45 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv: Gecko/20090103 Thunderbird/ Mnenhy/

On 2009-07-04 10:37, Duncan wrote:
Ron Johnson <address@hidden> posted
address@hidden, excerpted below, on  Fri, 03 Jul 2009 21:56:36

Also (and maybe because I'm a DBA), this problem just *screams* for
SQLite and a database in the "First Normal Form".
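
To make that concrete, here is a rough sketch of what a 1NF header store might look like, using Python's standard sqlite3 module. This is not anything pan does or ever did; every table, column, and function name below is made up for illustration. Each column holds one atomic value, and crossposts get one row per group in a separate table instead of a packed Xref string.

```python
import sqlite3


def create_header_store(path=":memory:"):
    """Open (or create) a hypothetical header database; names are illustrative."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS newsgroup (
            group_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL UNIQUE
        );
        -- One row per article, one atomic value per column.
        CREATE TABLE IF NOT EXISTS header (
            message_id TEXT PRIMARY KEY,
            subject    TEXT NOT NULL,
            author     TEXT NOT NULL,
            posted_at  INTEGER NOT NULL,   -- seconds since the epoch
            byte_count INTEGER NOT NULL
        );
        -- One row per (group, article number): crossposts repeat here,
        -- not inside a multi-valued column.
        CREATE TABLE IF NOT EXISTS xref (
            group_id       INTEGER NOT NULL REFERENCES newsgroup(group_id),
            article_number INTEGER NOT NULL,
            message_id     TEXT NOT NULL REFERENCES header(message_id),
            PRIMARY KEY (group_id, article_number)
        );
        CREATE INDEX IF NOT EXISTS xref_by_message ON xref(message_id);
    """)
    return conn


def add_header(conn, group, article_number, message_id,
               subject, author, posted_at, byte_count):
    """Store one overview line; safe to call again for a crosspost."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO newsgroup(name) VALUES (?)", (group,))
        (group_id,) = conn.execute(
            "SELECT group_id FROM newsgroup WHERE name = ?", (group,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO header VALUES (?, ?, ?, ?, ?)",
            (message_id, subject, author, posted_at, byte_count),
        )
        conn.execute(
            "INSERT OR REPLACE INTO xref VALUES (?, ?, ?)",
            (group_id, article_number, message_id),
        )


def count_headers(conn, group):
    """Number of articles known for one group."""
    row = conn.execute(
        "SELECT COUNT(*) FROM xref JOIN newsgroup USING (group_id) WHERE name = ?",
        (group,),
    ).fetchone()
    return row[0]
```

The point of the split is that a crossposted article is stored once in header and referenced from each group, so counting or sorting one group never has to parse anything out of a text field.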

[ OK, this is a very long post, I know (tho I haven't counted the lines, 200? 250? More? I'll let pan show me that when I post and download it). But reading it and following even a few of the included tips should vastly improve your pan experience. =:^) Following all of them... well, that's up to you, but it works well for me! ]

Actually, before the C++ rewrite (the original was C coded) and the changes that allowed pan to scale to millions of headers/overviews per group from 100k, Charles' plan was, for quite some time, to eventually switch to just that, an sqlite backend.

I don't know why he didn't. During the 3-ish years when pan seemed to be abandoned (we later learned he used at least part of that time to do the rewrite), several others (K. Haley I believe being one of them) began to experiment with pan, and some of those folks were database folks (I'm not sure if K. Haley is one of /them/). By the time Charles announced the C++ rewrite (aka new-pan, what we use now), some preliminary numbers had actually been posted to the pan-devel list.

I think the data management techniques that Charles /did/ use in new-pan got it to scale "reasonably". Now it /does/ work when you throw even several million headers at it, with memory use scaling accordingly. Before, 100k headers was bad, and above 200k pan would literally sit there for days, not really increasing memory usage too badly, but just not getting anywhere; it simply didn't scale at all above 200k headers or so, memory or no memory. His numbers probably looked reasonably close to the preliminary database numbers as well, at least close enough that he judged a database not worth the trouble, given the clear benefit of plain text files.

But, meanwhile, for those dealing with those huge groups, there are some usage patterns that work rather better than others, and thus some that users should avoid in the large groups if they want a reasonably working pan.

# 1 most important, particularly since pan is a GNOME family app and as many Ubuntu users can attest, PAN AND THE GNOME ASSISTIVE TECHNOLOGIES APPLET DO NOT GET ALONG WELL AT ALL!!! When that applet is running, it apparently polls /something/ often enough to keep pan from making efficient progress at header sorting, in particular. What might otherwise take 30 seconds or maybe two minutes (still long enough), ends up taking half an hour... two hours... more... So if you're running that, do yourself a favor and at LEAST shut it off when running pan. Either that, or switch to something other than pan, as the two simply don't get along. For more details, see the list archives.

How do I tell if the GAT applet is running? (Using Debian, I don't *think* it is because I don't see it in the Tray, but want to be sure.)


Still, while that's the way that works best for me, it's obviously not everyone's style, or pan would default to downloading to cache, instead of the download and save default it currently has. But that's why I listed these three tips separately and marked them as distinctly optional. It does work well, but it's not for everybody. Meanwhile, if people just use tips 1-9, or even just 1 and 3 mainly, it'll likely improve their experience dramatically, even if they don't choose to do the whole separate pan instances, huge cache, download-to-cache, then go thru and save, thing.

Let's say I increase the cache, and then download to cache. How then do I "save *from* cache", converting from "yenc" to binary?

Also, would increasing the cache (and then politely restarting pan) profit me any if I already have a large number of articles in the "save queue"?

Scooty Puff, Sr
The Doom-Bringer
