Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of v


From: Steven D'Aprano
Subject: Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date: Sun, 5 Jul 2009 12:56:27 +1000
User-agent: KMail/1.9.9

On Sun, 5 Jul 2009 11:39:48 am Ron Johnson wrote:
> On 2009-07-04 17:21, CSV4ME2 wrote:
> > On Saturday 04 July 2009, Ron Johnson wrote:
> >> On 2009-07-04 13:57, Matej Cepl wrote:
>
> [snip]
>
> >>>                             I don't trust any email client which
> >>> saves anything into SQLite ;-)
> >>
> >> SQLite is "just" the obvious choice.  What happened to c-trieve,
> >> or any of the other b+tree libraries?
> >
> > No it isn't:
> > - nothing beats processing dedicated in-core data structures wrt to
> > speed
>
> Your CompSci professor wants you back, to fail you in Data Structures
> class...

Ha! You fail! *wink* 

CSV4ME2 didn't say anything about making "a linear search thru [sic] a 
large in-memory array". Read what he said more carefully:

"nothing beats processing dedicated in-core data structures wrt to 
speed".

No mention of linear searching. Hash tables get O(1) searches on average, 
balanced binary trees get O(log N), as do binary searches through a sorted 
array. And if they're in memory, you don't have to wait for disk IO, which 
is orders of magnitude slower than memory IO.
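
To make the comparison concrete, here is a rough Python sketch of the 
three kinds of in-memory lookup. The data and every name in it 
(article_ids, index_by_id and so on) are made up for illustration; none 
of it comes from Pan.

```python
import bisect

def linear_search(items, target):
    """O(N): look at each element in turn until one matches."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log N): halve the search range each step (list must be sorted)."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

# Hypothetical data: even numbers standing in for article IDs in a group.
article_ids = list(range(0, 200_000, 2))        # 100,000 sorted entries

# Hash table (a dict) mapping ID -> position: O(1) on average per lookup.
index_by_id = {aid: pos for pos, aid in enumerate(article_ids)}

target = 199_998                                # worst case for the linear scan
print(linear_search(article_ids, target))       # visits every element
print(binary_search(article_ids, target))       # about 17 comparisons
print(index_by_id[target])                      # one hash lookup
```

All three return the same position. Only the linear scan has to touch 
every element to get there, and that is the one nobody was proposing.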


> A linear search thru a large in-memory array is *much* slower than
> an indexed search of an ODS (on-disk structure, like a b-tree or an
> inverted list).  Especially if the OS has buffered that ODS into
> core.

If the entire ODS can fit in memory, and you don't need persistence, 
then why bother writing it to disk?

Of course, if you do need persistence, that's a good reason. But if you 
don't need ACID compliance, why pay the overhead of ACID compliance? 
Just serialise the data structure to disk as needed, keeping the old 
one behind as backup.
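
Something along these lines would do, as a minimal sketch. It assumes 
the data structure can be pickled, and the function names and the ".bak" 
suffix are my own invention, not anything Pan actually does.

```python
import os
import pickle
import tempfile

def save_with_backup(obj, path):
    """Write obj to path, keeping the previous file as path + '.bak'."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, so a crash mid-write never
    # damages the existing copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())
        if os.path.exists(path):
            os.replace(path, path + ".bak")   # old version becomes the backup
        os.replace(tmp_path, path)            # new version takes its place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_with_fallback(path):
    """Load from path, falling back to the backup if that fails."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, "rb") as f:
                return pickle.load(f)
        except (OSError, pickle.UnpicklingError, EOFError):
            continue
    raise FileNotFoundError(path)
```

This is nowhere near ACID: there is a short window between the two 
renames where only the backup exists, which is why the loader tries the 
backup as well. For a cache of headers that can be downloaded again, 
that trade-off may be perfectly acceptable.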



-- 
Steven D'Aprano



