pspp-dev
Re: very long string support


From: John Darrington
Subject: Re: very long string support
Date: Wed, 3 May 2006 11:50:42 +0800
User-agent: Mutt/1.5.9i

On Tue, May 02, 2006 at 07:51:02PM -0700, Ben Pfaff wrote:
     
     Currently PSPP *always* makes a copy of the data source, whether
     in memory or on disk.[*]  Thus, your 1e8 case .sav file could be a
     problem anyway.

Are you saying that in order to read a .sav file of 1e8 cases by 10
variables, the machine needs to have a spare 1e9 x 8 byte chunk of
memory?  That's not how I thought it worked.
     

     > Notice that even if we can mangle/demangle early, we'll have to
     > change the mangling so that the spaces which are removed are
     > placed at the end of the string, because we cannot change the
     > size of cases.
     
     This doesn't make sense to me.  We can change the format of data
     on input or output as much as we want.


What if the modified case isn't a multiple of 8 bytes long?  Would
that be a problem?  And even if it is n x 8 bytes, wouldn't there be
a problem when an expression like case_data (c, v->fv) is
encountered?

E.g., if our dictionary contains:

VAR   Length  Case data offset  FV
===   ======  ================  ==
a     8       0                 0
b     2550    8                 1
c     8       2568              321


Then after we've dropped all the spaces it'll look like:

VAR   Length  Case data offset  FV
===   ======  ================  ==
a     8       0                 0
b     2550    8                 1
c     8       2558              321

which means that case_data (c, v->fv) will index into the wrong place in
the case.  Of course, we could make the demangling process update the
fv values, but then it needs to be aware of the dictionary.  And the
dictionary would have two states, one for unmangled cases and one for
mangled, which seems error-prone.
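
To make the mismatch concrete, here is a rough sketch of the arithmetic
in the two tables above.  It is not PSPP code: struct var_sketch,
offset_from_fv() and packed_offset() are names invented for this
illustration, and it assumes that fv counts 8-byte values and that the
demangled case is packed back to back with no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dictionary variable; NOT PSPP's own
   struct variable.  Field values are taken from the tables above. */
struct var_sketch {
    const char *name;
    size_t width;   /* "Length" column, in bytes. */
    size_t fv;      /* "FV" column: index of the first 8-byte value. */
};

enum { VALUE_SIZE = 8 };    /* Assumed size of one case value. */

/* Byte offset that a lookup based on v->fv would use. */
static size_t offset_from_fv(const struct var_sketch *v)
{
    return v->fv * VALUE_SIZE;
}

/* Byte offset where variable IDX really starts if the variables before
   it are stored back to back (the assumed "spaces dropped" layout).
   Passing the variable count as IDX yields the total packed case size. */
static size_t packed_offset(const struct var_sketch *vars, size_t idx)
{
    assert(vars != NULL);
    size_t ofs = 0;
    for (size_t i = 0; i < idx; i++)
        ofs += vars[i].width;
    return ofs;
}

/* The dictionary from the example, with fv values left unchanged. */
static const struct var_sketch example_vars[] = {
    { "a", 8, 0 },
    { "b", 2550, 1 },
    { "c", 8, 321 },
};
```

With these values, offset_from_fv (&example_vars[2]) is 2568 while
packed_offset (example_vars, 2) is 2558, so the lookup for c lands 10
bytes past where the data really is.  The packed case as a whole is
2566 bytes, which is not a multiple of 8 either.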


     There shouldn't be any need to modify case_data_all() or anything
     outside the system file reader/writer.

It certainly would be desirable to constrain the mangling nonsense to
within the system file specific code if at all possible.  If you think
it's possible, then I'll give it another go.  And I can run the
problems by you as I encounter them.  In the meantime I've discovered
a small buffer read problem when processing the 7(14) record, so I'll
correct that first.

J'
     
     
-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

