Re: [Pan-users] 64 bits fails as solution to large binary groups


From: Duncan
Subject: Re: [Pan-users] 64 bits fails as solution to large binary groups
Date: Tue, 11 Oct 2011 05:33:58 +0000 (UTC)
User-agent: Pan/0.135 (Tomorrow I'll Wake Up and Scald Myself with Tea; GIT 8e43cc5 branch-master)

Ron Johnson posted on Mon, 10 Oct 2011 08:19:51 -0500 as excerpted:

> On 10/10/2011 04:55 AM, Duncan wrote:
>> Ron Johnson posted on Sun, 09 Oct 2011 22:15:24 -0500 as excerpted:
>>
> [snip]
>>
>> The biggest problem is pan's assumption that it has all the information
>> necessary to maintain its threading structure in memory at all times. 
>> In order to really allow pan to become a disk-based client, to store
>> most of that info on disk and only read in a rather smaller limited
>> working set at once, pan really needs some sort of header indexing or
>> at least hashing system devised, such that it can figure out what info
>> it needs to read in from disk, from a vastly larger on-disk store, in
>> order to work with and properly thread, at a minimum, the currently
>> displayed article headers, likely plus some pages from the article list
>> before and after the currently displayed set.
>>
>>
> Two years ago, I suggested using SQLite.  "No!!!  It sucks over NFS!!"
> and "Use 64 bits" were the responses.

Umm.. could you point out the threads?  (Just name them by date, subject 
and preferably original thread-start poster.  I have the list archive 
going back to 2002 for both lists, from gmane, with the user list, I 
believe, all cached locally as well.)

AFAIK, for it to have had an effect, it would have had to have been at 
least three years ago, likely four or five, by now, because the last 
major rewrite was for 0.90, announced... yes, over five years ago now, 
April 2, 2006.  Since there hasn't been a major rework of the code since 
then, any suggestion to use SQLite would have necessarily been 
theoretical.

But meanwhile, back before the pan C++ rewrite introduced with 0.90 over 
five years ago... yes, a backend rewrite was commonly accepted as 
necessary, and sqlite was generally assumed to be what it would use, even 
by Charles in at least one of his few posts, IIRC, from whence it became 
my assumption as well.

I honestly don't know what happened with that, as Charles basically 
disappeared for a couple years (more?).  Then, just as on-list regulars 
including me were beginning to accept the then-apparent reality of pan as 
a dead-app-shipping (with it appearing to be only a matter of time until 
pan's code was stale enough that it was no longer worth the support 
burden of continuing to patch it to compile against current libraries 
with current gcc; pan has been there twice since I've been around, and 
come back as if from the dead both times!), he appeared with the 
announcement of the pretty well pre-coded rewrite, code-dropped out of 
nowhere.

So AFAIK, nobody but Charles knows for sure why he didn't choose the 
sqlite route or something similar, or for that matter, the "why" behind a 
number of other choices.  But there are a number of hints based on 
various remarks and behavior in general, thus allowing some reasonable 
guesses.

1) Charles was always quite reluctant to add additional external 
dependencies, and bundling has its own well known problems as well.  It's 
worth noting in this regard that pre-0.11, pan was gnome-1, NOT gtk-1, 
dependent.  It was only with the port to gtk2 that pan lost its gnome 
dependency and only depends on gtk2 and some other misc libs, some 
optionally (as for spell-check).  That's certainly a good thing here, as 
being a kde guy, I'd have for sure found another solution than a 
gnome-dependent pan once I switched to gentoo and was building everything 
from sources via ebuild scripts.  As it was, pan had me keeping gtk2 for 
some time.  (I now have a number of other gtk2 apps I depend on as well, 
including claws-mail, since the akonadification of kmail, and firefox, 
since it's now clear that few even kde devs use konqueror/rekonq by 
default, or long-running bugs would be found intolerable and fixed far 
sooner, so gtk2 isn't departing my system any time soon.)

But Charles put a lot of work (certainly more than I would have, but it 
was his work, not mine, and I realize that we have a number of pan-on-MS 
regulars here now as a result) into ensuring that pan would run on MS 
platforms as well, and extra native-*ix dependencies there are PAINFUL 
INDEED.  So once the port to MS was working, he had even MORE incentive 
not to add additional dependencies, and sqlite would obviously have been 
an additional dependency.

(Is sqlite even available on MS?  If not, that could answer the question 
right there, as if Charles regarded GNKSA compliance as the third rail, 
untouchable, MS compatibility certainly came to be a fourth, or so it 
seemed to me.)

2)  Purely based on my own observations, it always seemed to me that 
Charles wasn't particularly comfortable with database style programming.  
That may be why old-pan basically didn't scale at all in the first place, 
while the 0.90+ pan rewrite does appear to at least scale linearly now; 
it simply requires too much memory to do so.

Back well before the rewrite, there were certainly discussions, thus my 
general assumption that pan would end up with an sqlite backend at some 
point.  At that time, it was assumed that the backend rewrite (further 
assumed to incorporate sqlite) would occur at the same time as the 
rewrite for multi-server transparency.  AFAIK, at one point there was in 
fact an experimental version using some sort of sqlite backend, 
importantly, written by someone ELSE, not Charles.  That's why when 0.90+ 
dropped with multi-server transparency fully incorporated but WITHOUT the 
sqlite dependency, I was as surprised as anyone, and I expect a number of 
others with rather more database expertise (not to mention actual coding 
ability) than I have, were rather surprised as well.

Based on the fact that Charles DID reject the sqlite assumption AND the 
experimental if quite preliminary code (tho at that point it was C, not 
C++, and Charles was likely doing the rewrite but not ready to let anyone 
know about it yet, but he still could have used the general idea if he 
had been so inclined), and on various remarks and the way he made them 
over the years, I really /did/ get the feeling that there was some 
hesitance to dive into database theory and code as much as he thought 
he'd have to dive in to do it "right".  That would explain why it just 
never seemed to happen, both before the rewrite, and with it.  
(Afterward, the timing wasn't right for it the first year after a 
rewrite, and after that, he pretty much just lost interest, until his 
eventual request for someone else to take over.)  But it's still simply 
personal supposition.  It could well have been other entirely unrelated 
reasons, that I've no idea of.  But that's just it, this makes sense, I'm 
aware of no opposing evidence, and of nothing else that explains it, so 
despite the lack of hard evidence in support...

But, in the main, people were just happy to have a real live and under 
development pan again, with full multi-server support, and in a bit 
over a year of active development with nearly weekly version bumps, the 
rewrite fixed quite a lot of bugs both old and new, and reincorporated 
all but one of the major old features.  (That one was the old action 
rules, because Charles considered them too complicated, GUI-wise, as 
implemented in the pre-rewrite pan, and while a much simpler 
implementation was discussed, Charles never implemented it.  It's worth 
noting, however, that HMueller finally implemented it in his repo, see 
the actions tab of preferences if you're running it, but that feature 
isn't in mainline, yet.)

I suspect if you look at the threads where you made the suggestion, if 
indeed it was only a couple years ago (or even up to a bit past five, 
given that as the date the rewrite was introduced), and read them in 
light of the above history, you'll come away with a different view.  
ESPECIALLY if it was real close to two years ago, since at that point, 
pan was just barely coming back alive from another period as a dead-app-
shipping.

I suspect that what you were really being told was that if you wanted it, 
you'd better be prepared to implement it yourself and submit the results 
as a patch or series of patches, since as a practical matter, at that 
point, that was the only way it was ever likely to happen, period.

But, things have changed dramatically over the last couple years.  
There are more people working on pan than I recall ever being the case 
before (tho it might have happened pre-2003 or so).  Long-awaited major 
features such as bin-posting that have been on the list for at least a 
decade, and that actions tab to replace rules that has been on the list 
for half a decade, are now implemented at least in hmueller's 
experimental repo.  And with Charles, tho we thank him for the solid base 
that is pan as we have it today, now out of the picture, long-needed 
basic discussions such as this one (and the GNKSA one of a couple months 
ago) are being had.

It's still going to be quite a huge job to rewrite pan to an sqlite (or 
whatever, sqlite has just always seemed the simplest assumption, and I've 
*BAD* experiences with for instance MySQL dependencies in recent kde, so 
I *SERIOUSLY* hope pan stays *WELL* away from THAT, but it's an 
option...) backend, but sqlite too has matured, with firefox use, and 
once again, the data handling backend appears to be the biggest blocker 
of further pan long-term advancement, so perhaps now is indeed the time.
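
To make the "index on disk, read in only a working set" idea from the 
quoted paragraph at the top a bit more concrete, here is a rough sketch.  
It is NOT pan code and not a proposed schema, just an illustration, and 
it's in Python (using its bundled sqlite3 module) rather than pan's C++ 
purely for brevity.  The table layout, the column names, and the helper 
names open_store, add_header, visible_roots and load_thread are all made 
up for the example.  It also assumes every reply's parent header is 
present in the store, which real newsgroups certainly don't guarantee.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical header store; pass a filename for on-disk use."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        " message_id TEXT PRIMARY KEY,"
        " parent_id  TEXT,"     # assumed: last References: entry, NULL for a thread root
        " subject    TEXT,"
        " posted     INTEGER)"  # assumed: Date: header as a Unix timestamp
    )
    # The indexes are what let a lookup touch a few pages instead of the whole group.
    db.execute("CREATE INDEX IF NOT EXISTS by_parent ON headers(parent_id)")
    db.execute("CREATE INDEX IF NOT EXISTS by_posted ON headers(posted)")
    return db


def add_header(db, message_id, parent_id, subject, posted):
    """Store one article header (overwrites an existing row with the same id)."""
    db.execute(
        "INSERT OR REPLACE INTO headers VALUES (?, ?, ?, ?)",
        (message_id, parent_id, subject, posted),
    )


def visible_roots(db, offset, count):
    """Fetch only the thread roots needed for one screenful of the article list."""
    rows = db.execute(
        "SELECT message_id FROM headers WHERE parent_id IS NULL"
        " ORDER BY posted, message_id LIMIT ? OFFSET ?",
        (count, offset),
    )
    return [row[0] for row in rows]


def load_thread(db, root_id):
    """Walk one thread depth-first, returning (depth, message_id, subject) tuples."""
    out = []
    stack = [(0, root_id)]
    while stack:
        depth, mid = stack.pop()
        row = db.execute(
            "SELECT subject FROM headers WHERE message_id = ?", (mid,)
        ).fetchone()
        if row is None:
            continue  # header not in the store; skipped in this sketch
        out.append((depth, mid, row[0]))
        # Push newest first so the oldest reply is popped (and listed) first.
        kids = db.execute(
            "SELECT message_id FROM headers WHERE parent_id = ?"
            " ORDER BY posted DESC, message_id DESC",
            (mid,),
        ).fetchall()
        stack.extend((depth + 1, kid[0]) for kid in kids)
    return out
```

The point of the sketch is only that displaying a page means one 
visible_roots call plus one load_thread call per displayed root, so the 
memory needed follows what's on screen rather than the size of the 
group.  A real backend would also have to deal with orphaned replies, 
multi-server article numbers, scoring and so on.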

I'd suggest that it's going to take someone quite familiar with databases 
(and with sqlite if that is indeed chosen) to spearhead this, but again, 
with pan in git now, and several active pan developers having github pan 
repos, that too is far simpler now than it has ever been before!

The real question tho, is if anyone with the necessary skills is going to 
take the opportunity and run with it.  Are you such a person with the 
requisite skills?  Do you have and are you willing to commit the time and 
energy it'll take?  I honestly don't know.  That's why I'm asking.

It's possible HMueller has the skills too, but he has already 
demonstrated the motivation and commitment, and I don't see developments 
in this particular area, so I'm guessing he's not particularly skilled in 
the database area either, or knowing him, there'd very likely already be 
evidence of developments in that area. =;^0

Which means it's either you, or find someone, or it very likely simply 
won't happen.

Or maybe a collaboration of you and hmueller or some such.  If you've the 
skills and are interested, even if you don't have the time to do it all 
yourself, do at least talk it over with hmueller and khaley.  Perhaps 
with their help it can get done anyway.

(There's also the jlynch pan repos up on github.  But since he's not 
active in this group/list and it's only that and a few git commit entries 
with his name on them that I have to go on, that's more an intriguing 
lack of info than anything concrete.  Maybe he'd be of help, maybe not.  
At this point I'm mostly just curious what his github pan repo is all 
about, and would love to know more, if anyone has further info on him or 
it.  Maybe one of these days I'll have to do more research on the 
subject, since my dropping broad hints of interest here doesn't seem to 
have provoked responses from anyone with anything more to add about him, 
or that repo.  Then I can either drop further mentions or be more 
informed with them, as appropriate.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



