Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of v

From:	Ron Johnson
Subject:	Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date:	Sat, 04 Jul 2009 22:50:38 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.19) Gecko/20090103 Thunderbird/2.0.0.19 Mnenhy/0.7.6.666

On 2009-07-04 21:56, Steven D'Aprano wrote:

On Sun, 5 Jul 2009 11:39:48 am Ron Johnson wrote:

On 2009-07-04 17:21, CSV4ME2 wrote:

On Saturday 04 July 2009, Ron Johnson wrote:

On 2009-07-04 13:57, Matej Cepl wrote:

[snip]

I don't trust any email client which saves anything into SQLite ;-)

SQLite is "just" the obvious choice.  What happened to c-trieve,
or any of the other b+tree libraries?

No it isn't:
- nothing beats processing dedicated in-core data structures wrt to
speed

Your CompSci professor wants you back, to fail you in Data Structures
class...

Ha! You fail! *wink*

CSV4ME2 didn't say anything about making "a linear search thru [sic] a large in-memory array". Read what he said more carefully:

"nothing beats processing dedicated in-core data structures wrt to speed".


I know...

No mention of linear searching. Hash tables get O(1) searches, binary trees get O(log N), as do binary searches through an array. And if they're in memory, you don't have to wait for disk IO which is two orders of magnitude slower than memory IO.
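To make the three lookup costs above concrete, here is a minimal Python sketch. The message-ID strings and function names are made up for illustration; nothing here is taken from Pan's code. It only shows that the same in-memory data can be searched in O(N), O(log N) or O(1) average time depending on the structure chosen.

```python
import bisect

def linear_search(items, target):
    """O(N): compare against each element in turn."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log N): halve the search range each step; input must be sorted."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

# Hypothetical data: zero-padded IDs, so the list is already sorted.
ids = [f"msg-{n:07d}" for n in range(100_000)]

# Hash table mapping ID -> position: O(1) average lookup.
index = {msg_id: pos for pos, msg_id in enumerate(ids)}

target = "msg-0099999"
# All three agree on the answer; only the amount of work differs.
assert linear_search(ids, target) == binary_search(ids, target) == index[target]
```

The linear scan touches every element in the worst case, while the other two depend on the list being sorted or on the extra memory for the dict.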

Maybe it's the pedant in me, but he made no mention of the type of algorithm, so to make a point, I chose an example demonstrating that *in and of itself*, putting the data structures in memory does not *guarantee* good performance.


"All else being equal", though, yes it does.

A linear search thru a large in-memory array is *much* slower than
an indexed search of an ODS (on-disk structure, like a b-tree or an
inverted list).  Especially if the OS has buffered that ODS into
core.

If the entire ODS can fit in memory, and you don't need persistence, then why bother writing it to disk?

Because you are *never* guaranteed that your data structures will fit in RAM, and that the user will have lots of RAM and a multi-core CPU.

Using a well-indexed structure means that the app doesn't have to continually copy/rename/delete tasks.nzb. Performance will be maintained because the OS will buffer most of the ODS, so you'll only be writing back dirty pages instead of serializing the whole tasks.nzb.

THIS FACT IS THE genesis of this whole long thread: currently on my system, pan is copy/rename/deleting a 370MB tasks.nzb every 90 seconds.
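As a rough illustration of the indexed-structure idea, here is a sketch using Python's sqlite3 module as a stand-in for any b-tree library. This is not how Pan stores its queue; the `tasks` table, its columns and the function names are all hypothetical. The point is only that changing one task's state rewrites the affected pages, not the whole file.

```python
import sqlite3

def open_store(path):
    """Open (or create) the task store; the primary key gives a b-tree index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " message_id TEXT PRIMARY KEY,"
        " state TEXT NOT NULL)"
    )
    return conn

def add_task(conn, message_id):
    """Queue one task; an already-present ID is left untouched."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tasks (message_id, state) VALUES (?, 'queued')",
            (message_id,),
        )

def mark_done(conn, message_id):
    """Update a single row in place; returns the number of rows changed."""
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET state = 'done' WHERE message_id = ?",
            (message_id,),
        )
    return cur.rowcount

def pending(conn):
    """List the IDs of tasks still queued."""
    rows = conn.execute(
        "SELECT message_id FROM tasks WHERE state = 'queued' ORDER BY message_id"
    )
    return [row[0] for row in rows]
```

Usage would be along the lines of `conn = open_store("tasks.db")`, then `add_task(conn, ...)` as articles are queued and `mark_done(conn, ...)` as each one finishes.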

Of course, if you do need persistence, that's a good reason. But if you don't need ACID compliance, why pay the overhead of ACID compliance? Just serialise the data structure to disk as needed, keeping the old one behind as backup.
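The "serialise and keep the old one as backup" approach might look something like the following Python sketch. JSON, the `.bak` suffix and the function names are assumptions picked for the example, not anything Pan actually does.

```python
import json
import os
import tempfile

def save_with_backup(data, path):
    """Write data to a temp file, keep the previous file as path + '.bak',
    then move the new file into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        if os.path.exists(path):
            os.replace(path, path + ".bak")  # old copy becomes the backup
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load(path):
    """Read the serialised structure back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Note that every save still writes the complete structure, so the cost grows with the total size of the data rather than with the size of the change.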

As you can see from this attachment, Pan is using 2.5GB RAM, 32% of core. If I were using a more typical 4GB, 2GB or even 1GB, Pan would be thrashing my system to death.


(Are you people reading this through Pan seeing my attachments?)

--
Scooty Puff, Sr
The Doom-Bringer

top - 22:32:45 up 16 days, 10:23,  1 user,  load average: 1.82, 1.90, 1.93
Tasks: 189 total,   3 running, 186 sleeping,   0 stopped,   0 zombie
Cpu(s): 84.2%us, 14.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.5%hi,  0.5%si,  0.0%st
Mem:   8177796k total,  8125156k used,    52640k free,   278360k buffers
Swap:        0k total,        0k used,        0k free,  3974552k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
15760 me        20   0 2610m 2.5g 5864 R   89 31.9   2447:41 pan                
28872 me        20   0  687m 321m  14m S   12  4.0   1086:44 firefox-bin        
27533 root      20   0  423m 269m 7432 S    0  3.4 111:00.45 Xorg               
28060 me        20   0  397m 217m  18m S    1  2.7  18:03.99 icedove-bin        
10929 root      20   0 47648  31m 1260 S    0  0.4   2:55.08 console-kit-dae    
29007 me        20   0 70532  31m 1308 S    0  0.4  51:34.65 hellanzb           
 2475 root      20   0 32200  27m 2376 S    0  0.3   0:04.28 spamd              
 6210 root      20   0 32080  27m 2236 S    0  0.3   0:06.58 spamd              
 5556 me        20   0 55380  26m 4284 S    0  0.3   4:26.18 gqview             
 4506 root      20   0 29568  23m 1108 S    0  0.3   0:41.83 spamd              
30608 me        20   0 39864  22m 1716 S    0  0.3   0:13.15 urxvt              
27649 me        20   0 83516  12m 5976 S    0  0.2  14:45.63 gnome-panel        
27650 me        20   0 82056 9828 4940 S    0  0.1   0:24.91 nautilus           
27696 root      20   0 12504 9140  716 S    0  0.1   0:00.23 SystemToolsBack    
27656 me        20   0 28172 8216 2492 S    0  0.1   0:00.30 system-config-p    
13242 me        20   0 36984 7524  988 S    0  0.1   0:02.01 urxvt              
27689 me        20   0 56048 7500 4460 S    0  0.1   0:11.77 gweather-applet
