From: Ron Johnson
Subject: Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date: Sat, 04 Jul 2009 21:43:18 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.19) Gecko/20090103 Thunderbird/2.0.0.19 Mnenhy/0.7.6.666

On 2009-07-04 21:23, Steven D'Aprano wrote:
On Sun, 5 Jul 2009 05:23:20 am Ron Johnson wrote:
On 2009-07-04 13:57, Matej Cepl wrote:
Ron Johnson, Fri, 03 Jul 2009 21:56:36 -0500:
Also (and maybe because I'm a DBA), this problem just *screams*
for SQLite and a database in the "First Normal Form".
After reading http://www.jwz.org/doc/mailsum.html and having still
alive experience with Evolution,
Corrupt that mbox file and *poof*, there goes years of email.  I
stopped using it years ago as anything but a bzipped archive format.

Yes and no ... you've still got the emails, in text format, so I suppose

Not if the write fails in mid-stream. Remember the recent kerfuffle regarding KDE assuming that the way ext3 works is how every file system works, and thus losing config files on ext4 partitions?
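For anyone who wants the mid-stream write failure spelled out, here is a minimal sketch of the usual defence: write a temporary file, fsync it, then rename it over the original. As far as I understand the ext4 reports, the missing fsync before the rename was the step that ext3 happened to forgive. The function name `atomic_write` and the `.tmp-` prefix are made up for illustration, this is not Pan's or KDE's actual code, and it assumes a POSIX system where a directory can be opened and fsynced.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes); a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # on one file system and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Persist the rename itself by syncing the containing directory (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_write("tasks.nzb", payload)` twice in a row should leave exactly one file holding the second payload, with no stray temp file next to it.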

you could write a recovery utility, if one doesn't already exist.
But yes, I agree, maildir is better than mbox because you're likely to lose no more than one message in the event of corruption. But keep in mind that when Netscape 2 came out, mbox really was the standard -- these days I'd say only old dinosaurs use mbox.

90+% of the people using Tbird still use mbox...
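To make the mbox versus maildir difference concrete, here is a small sketch using Python's standard `mailbox` module. It stores the same messages both ways: the mbox ends up as a single file, while the maildir gets one file per message, which is why corruption there tends to cost one message rather than the whole box. The helper name `store_messages` and the file names `inbox.mbox` and `inbox.maildir` are hypothetical.

```python
import mailbox
import os
import tempfile


def store_messages(root, messages):
    """Store each message text in both an mbox file and a maildir under `root`."""
    mbox_path = os.path.join(root, "inbox.mbox")
    maildir_path = os.path.join(root, "inbox.maildir")
    box = mailbox.mbox(mbox_path)        # one file holding every message
    md = mailbox.Maildir(maildir_path)   # a directory with cur/, new/ and tmp/
    try:
        for text in messages:
            box.add(text)
            md.add(text)                 # each message lands in its own file
        box.flush()
        md.flush()
    finally:
        box.close()
        md.close()
    return mbox_path, maildir_path
```

After storing three messages, `inbox.mbox` is one regular file and `inbox.maildir/new` contains three separate files.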

And at least mbox is a text format, and you have one file per mail box, and not one giant undocumented binary file for all mail boxes like Exchange uses.

*shudders*

It's the undocumented part that disturbs me.

I'm also sure that "they" really screwed the pooch when designing the PST file format. The Outlook XP format, though, does seem to perform better than the older version.

SQLite is "just" the obvious choice.  What happened to c-trieve, or
any of the other b+tree libraries?

I think the point is that *any* database is (1) overkill for the requirements and (2) likely to lead to performance and corruption problems.

Maybe I'm just an edge case...  See attached file.
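For the sake of argument, here is roughly what I mean by SQLite with a table in First Normal Form: one row per article, one atomic value per column. This is only a sketch under my own assumptions. The table name `article`, its columns, and the helper functions are invented for illustration and have nothing to do with how Pan actually stores headers.

```python
import sqlite3

# Hypothetical schema: one row per article header, no repeating groups.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    group_name TEXT    NOT NULL,
    article_no INTEGER NOT NULL,
    message_id TEXT    NOT NULL,
    subject    TEXT    NOT NULL,
    author     TEXT    NOT NULL,
    bytes      INTEGER NOT NULL,
    PRIMARY KEY (group_name, article_no)
);
CREATE INDEX IF NOT EXISTS article_by_message_id ON article (message_id);
"""


def open_header_db(path=":memory:"):
    """Open (or create) the header database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_headers(conn, rows):
    """Insert header tuples in a single transaction; a failure rolls all of them back."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO article VALUES (?, ?, ?, ?, ?, ?)", rows
        )


def group_size(conn, group_name):
    """Return (article count, total bytes) for one group."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(bytes), 0) FROM article WHERE group_name = ?",
        (group_name,),
    ).fetchone()
    return row[0], row[1]
```

The point of the transaction in `add_headers` is that a failed batch leaves the existing rows alone, instead of leaving a half-written multi-gigabyte group file behind.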

--
Scooty Puff, Sr
The Doom-Bringer
$ pydir --width=15 .pan2 > pan2.dir.txt

--min-size =  0
DIR                 0             .pan2
file             239 07-02 20:35 .pan2/Score
file           6,797 07-02 12:56 .pan2/accels.txt
file           2,181 07-04 13:07 .pan2/group-preferences.xml
file       4,992,242 07-01 18:02 .pan2/newsgroups.dsc
file           1,449 07-04 13:07 .pan2/newsgroups.xov
file         188,829 07-01 18:02 .pan2/newsgroups.ynm
file      17,468,229 07-04 13:07 .pan2/newsrc-1
file              89 07-02 12:56 .pan2/posting.xml
file           4,913 07-04 13:07 .pan2/preferences.xml
file             406 07-02 12:57 .pan2/servers.xml
file     368,903,934 07-04 21:38 .pan2/tasks.nzb
file     319,289,177 07-04 21:39 .pan2/tasks.nzb.tmp
BYTES    710,858,485
FILES             12

DIR                0 07-04 21:38 .pan2/article-cache
file         257,118 07-04 15:42 .pan2/article-cache/address@hidden
file         257,156 07-04 15:42 .pan2/article-cache/address@hidden
file         257,025 07-04 15:42 .pan2/article-cache/address@hidden
file         257,265 07-04 15:42 .pan2/article-cache/address@hidden
file         257,065 07-04 15:42 .pan2/article-cache/address@hidden
file         257,130 07-04 15:42 .pan2/article-cache/address@hidden
file         257,210 07-04 15:42 .pan2/article-cache/address@hidden
file         257,241 07-04 15:47 .pan2/article-cache/address@hidden
file         256,906 07-04 15:42 .pan2/article-cache/address@hidden
file         256,975 07-04 15:42 .pan2/article-cache/address@hidden
file         257,167 07-04 15:42 .pan2/article-cache/address@hidden
file         257,177 07-04 15:42 .pan2/article-cache/address@hidden
file         257,290 07-04 15:42 .pan2/article-cache/address@hidden
file         256,757 07-04 15:42 .pan2/article-cache/address@hidden
file         257,270 07-04 15:42 .pan2/article-cache/address@hidden
file         257,164 07-04 15:42 .pan2/article-cache/address@hidden
file         257,084 07-04 15:42 .pan2/article-cache/address@hidden
file         256,967 07-04 15:42 .pan2/article-cache/address@hidden
file         257,119 07-04 15:42 .pan2/article-cache/address@hidden
file         257,227 07-04 15:42 .pan2/article-cache/address@hidden
file         257,361 07-04 15:42 .pan2/article-cache/address@hidden
file         257,006 07-04 15:42 .pan2/article-cache/address@hidden
file         256,642 07-04 15:47 .pan2/article-cache/address@hidden
file         257,143 07-04 15:42 .pan2/article-cache/address@hidden
file         257,078 07-04 15:42 .pan2/article-cache/address@hidden
file         257,151 07-04 15:42 .pan2/article-cache/address@hidden
file         257,149 07-04 15:42 .pan2/article-cache/address@hidden
file         257,045 07-04 15:42 .pan2/article-cache/address@hidden
file         257,071 07-04 15:42 .pan2/article-cache/address@hidden
file         257,198 07-04 15:42 .pan2/article-cache/address@hidden
file         257,181 07-04 15:42 .pan2/article-cache/address@hidden
file         257,159 07-04 15:42 .pan2/article-cache/address@hidden
file         257,074 07-04 15:42 .pan2/article-cache/address@hidden
file         257,105 07-04 15:42 .pan2/article-cache/address@hidden
file         256,717 07-04 15:47 .pan2/article-cache/address@hidden
file         257,185 07-04 15:42 .pan2/article-cache/address@hidden
file         256,806 07-04 15:42 .pan2/article-cache/address@hidden
file         257,228 07-04 15:42 .pan2/article-cache/address@hidden
file         257,280 07-04 15:42 .pan2/article-cache/address@hidden
file         257,018 07-04 15:42 .pan2/article-cache/address@hidden
file         257,030 07-04 15:42 .pan2/article-cache/address@hidden
file         257,127 07-04 15:42 .pan2/article-cache/address@hidden
file         257,053 07-04 15:42 .pan2/article-cache/address@hidden
file         257,187 07-04 15:42 .pan2/article-cache/address@hidden
file         256,697 07-04 15:47 .pan2/article-cache/address@hidden
file         256,883 07-04 15:42 .pan2/article-cache/address@hidden
file         257,010 07-04 15:42 .pan2/article-cache/address@hidden
file         257,181 07-04 15:42 .pan2/article-cache/address@hidden
file         257,135 07-04 15:42 .pan2/article-cache/address@hidden
file         257,219 07-04 15:42 .pan2/article-cache/address@hidden
file         257,183 07-04 15:42 .pan2/article-cache/address@hidden
file         257,152 07-04 15:42 .pan2/article-cache/address@hidden
file         257,305 07-04 15:42 .pan2/article-cache/address@hidden
file         256,802 07-04 15:47 .pan2/article-cache/address@hidden
file         256,988 07-04 15:42 .pan2/article-cache/address@hidden
file         256,935 07-04 15:42 .pan2/article-cache/address@hidden
file         257,014 07-04 15:42 .pan2/article-cache/address@hidden
file         257,290 07-04 15:42 .pan2/article-cache/address@hidden
file         256,876 07-04 15:42 .pan2/article-cache/address@hidden
file         257,223 07-04 15:42 .pan2/article-cache/address@hidden
file         257,235 07-04 15:42 .pan2/article-cache/address@hidden
file         256,976 07-04 21:38 .pan2/article-cache/address@hidden
file         256,891 07-04 21:38 .pan2/article-cache/address@hidden
file         256,990 07-04 21:38 .pan2/article-cache/address@hidden
file         256,968 07-04 21:38 .pan2/article-cache/address@hidden
file         256,977 07-04 21:38 .pan2/article-cache/address@hidden
file         256,844 07-04 21:38 .pan2/article-cache/address@hidden
file         257,041 07-04 21:38 .pan2/article-cache/address@hidden
file         256,947 07-04 21:38 .pan2/article-cache/address@hidden
file         257,051 07-04 21:38 .pan2/article-cache/address@hidden
file         257,036 07-04 21:38 .pan2/article-cache/address@hidden
file         256,973 07-04 21:38 .pan2/article-cache/address@hidden
file         256,940 07-04 21:38 .pan2/article-cache/address@hidden
file         257,065 07-04 21:38 .pan2/article-cache/address@hidden
file         257,052 07-04 21:38 .pan2/article-cache/address@hidden
file         256,962 07-04 21:38 .pan2/article-cache/address@hidden
file         256,981 07-04 21:38 .pan2/article-cache/address@hidden
file         256,994 07-04 21:38 .pan2/article-cache/address@hidden
file         257,207 07-04 15:42 .pan2/article-cache/address@hidden
file         257,278 07-04 15:42 .pan2/article-cache/address@hidden
file         257,122 07-04 15:42 .pan2/article-cache/address@hidden
file         257,278 07-04 15:42 .pan2/article-cache/address@hidden
file         256,743 07-04 15:47 .pan2/article-cache/address@hidden
file         256,859 07-04 15:42 .pan2/article-cache/address@hidden
file         257,411 07-04 15:42 .pan2/article-cache/address@hidden
file         256,878 07-04 15:42 .pan2/article-cache/address@hidden
file         256,730 07-04 15:42 .pan2/article-cache/address@hidden
file         256,830 07-04 15:42 .pan2/article-cache/address@hidden
file         257,123 07-04 15:42 .pan2/article-cache/address@hidden
file         257,229 07-04 15:42 .pan2/article-cache/address@hidden
file         257,173 07-04 15:42 .pan2/article-cache/address@hidden
file         257,247 07-04 15:42 .pan2/article-cache/address@hidden
file         257,100 07-04 15:42 .pan2/article-cache/address@hidden
file         257,188 07-04 15:42 .pan2/article-cache/address@hidden
file         257,172 07-04 15:42 .pan2/article-cache/address@hidden
file         257,194 07-04 15:42 .pan2/article-cache/address@hidden
file         256,604 07-04 15:42 .pan2/article-cache/address@hidden
file         257,150 07-04 15:42 .pan2/article-cache/address@hidden
file         257,221 07-04 15:42 .pan2/article-cache/address@hidden
file         257,026 07-04 15:42 .pan2/article-cache/address@hidden
file         257,153 07-04 15:42 .pan2/article-cache/address@hidden
file         257,345 07-04 15:42 .pan2/article-cache/address@hidden
file         257,037 07-04 15:42 .pan2/article-cache/address@hidden
file         256,988 07-04 15:42 .pan2/article-cache/address@hidden
file         257,189 07-04 15:42 .pan2/article-cache/address@hidden
file         256,613 07-04 15:42 .pan2/article-cache/address@hidden
file         256,868 07-04 15:42 .pan2/article-cache/address@hidden
file         257,097 07-04 15:42 .pan2/article-cache/address@hidden
file         257,256 07-04 15:42 .pan2/article-cache/address@hidden
file         257,184 07-04 15:42 .pan2/article-cache/address@hidden
file         256,722 07-04 15:42 .pan2/article-cache/address@hidden
file         257,019 07-04 15:42 .pan2/article-cache/address@hidden
file         257,406 07-04 15:42 .pan2/article-cache/address@hidden
file         256,951 07-04 15:42 .pan2/article-cache/address@hidden
file         257,025 07-04 15:42 .pan2/article-cache/address@hidden
file         257,099 07-04 15:42 .pan2/article-cache/address@hidden
file         257,329 07-04 15:42 .pan2/article-cache/address@hidden
file         257,364 07-04 15:42 .pan2/article-cache/address@hidden
file         256,853 07-04 15:47 .pan2/article-cache/address@hidden
file         257,118 07-04 15:42 .pan2/article-cache/address@hidden
file         256,805 07-04 15:42 .pan2/article-cache/address@hidden
file         257,045 07-04 15:42 .pan2/article-cache/address@hidden
file         257,177 07-04 15:42 .pan2/article-cache/address@hidden
file         257,243 07-04 15:42 .pan2/article-cache/address@hidden
file         257,336 07-04 15:42 .pan2/article-cache/address@hidden
file         257,009 07-04 15:42 .pan2/article-cache/address@hidden
file         256,879 07-04 15:47 .pan2/article-cache/address@hidden
file         256,808 07-04 15:42 .pan2/article-cache/address@hidden
file         257,024 07-04 15:42 .pan2/article-cache/address@hidden
file         257,372 07-04 15:42 .pan2/article-cache/address@hidden
file         257,164 07-04 15:42 .pan2/article-cache/address@hidden
file         257,034 07-04 15:42 .pan2/article-cache/address@hidden
file         257,143 07-04 15:42 .pan2/article-cache/address@hidden
file         257,339 07-04 15:42 .pan2/article-cache/address@hidden
BYTES     34,447,880
FILES            134

DIR                0 07-04 13:07 .pan2/groups
file     169,283,585 07-03 20:44 .pan2/groups/alt.binaries.classic.tv.shows
file     398,215,761 07-01 20:22 .pan2/groups/alt.binaries.dvd.classic.movies
file     123,196,908 07-01 23:40 .pan2/groups/alt.binaries.dvd.classics
file       1,977,973 07-04 13:07 .pan2/groups/alt.binaries.dvd.documentaries
file       9,510,860 07-01 23:26 .pan2/groups/alt.binaries.dvd.english
file   3,658,599,777 07-02 03:37 .pan2/groups/alt.binaries.dvdr
file     242,645,704 07-01 22:18 .pan2/groups/alt.binaries.multimedia.cartoons
file       4,556,937 07-02 03:47 .pan2/groups/alt.binaries.multimedia.cartoons.vintage
file      44,425,084 07-01 21:58 .pan2/groups/alt.binaries.multimedia.classic-films
file           1,250 07-01 23:10 .pan2/groups/alt.binaries.multimedia.firefly
file           1,250 07-01 23:09 .pan2/groups/alt.binaries.multimedia.futurama
file     109,996,100 07-01 21:54 .pan2/groups/alt.binaries.multimedia.sitcoms
file      80,158,349 07-02 04:43 .pan2/groups/alt.binaries.multimedia.startrek
file           1,250 07-01 23:07 .pan2/groups/alt.binaries.multimedia.the.tick
file       1,107,100 07-02 03:57 .pan2/groups/alt.binaries.multimedia.tv
file         580,397 07-01 23:25 .pan2/groups/alt.binaries.multimedia.vintage-animation
file     444,843,090 07-01 21:21 .pan2/groups/alt.binaries.multimedia.vintage-film
file     287,828,179 07-01 20:47 .pan2/groups/alt.binaries.multimedia.vintage-tv
file           1,514 07-01 20:21 .pan2/groups/alt.binaries.multimedia.vintage.animation
file       7,699,291 07-01 20:13 .pan2/groups/alt.binaries.multimedia.vintage.tv
file           1,250 07-02 04:45 .pan2/groups/alt.binaries.tv.futurama
BYTES  5,584,631,609
FILES             21


Total bytes  6,329,937,974
Total files            170
