pspp-users

Re: ZSAV format support [ZCOMPRESSED subcommand]


From: Ben Pfaff
Subject: Re: ZSAV format support [ZCOMPRESSED subcommand]
Date: Wed, 2 Oct 2013 21:40:03 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Oct 02, 2013 at 01:03:27PM -0400, Hugo Alejandro wrote:
> A few days ago I was recruited to work on the analysis of large surveys.
> What caught my attention is the use of the *.zsav format in preference to
> *.sav.
> 
> Apparently this file format supports a higher compression ratio and is more
> efficient with large databases: it reduces their size on disk and is faster
> to compress and decompress than creating a ZIP file (or other archive
> format) from a *.sav file.
> 
> This file type is very recent, included in SPSS version 21 and improved in the
> current version 22.

This is very interesting.  Thank you for bringing this to our
attention.

The .zsav file format appears to be the same as .sav format up to the
data portion of the file, except that the "magic" at the beginning of
the file is $FL3 instead of $FL2.
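
As a minimal sketch of the observation above, the two formats can be
told apart by their first four bytes.  The function name and the
"unknown" fallback are illustrative assumptions, not part of any
existing tool:

```python
def sav_variant(first_bytes: bytes) -> str:
    """Classify a system file by the 4-byte magic at the start of the file."""
    magic = first_bytes[:4]
    if magic == b"$FL2":
        return "sav"    # ordinary system file
    if magic == b"$FL3":
        return "zsav"   # magic observed in the .zsav sample
    return "unknown"


def sav_variant_of_file(path: str) -> str:
    """Convenience wrapper: read only the magic from a file on disk."""
    with open(path, "rb") as f:
        return sav_variant(f.read(4))
```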

The data portion of the file starts at offset 837 (0x345).  Its
contents, with my speculation about their meaning, are:

00000345  45 03 00 00 00 00 00 00 - Byte offset of this block, 0x345.
0000034d  14 07 00 00 00 00 00 00 - Byte offset of the next block, 0x714.
00000355  30 00 00 00 00 00 00 00 - Length of next block's header, 0x30 bytes.

It is followed by 951 (0x3b7) bytes of data compressed with the
"deflate" algorithm.  When inflated, these expand to 1120 (0x460) bytes
that exactly match the data portion of the original physiology.sav,
which starts at offset 729 (0x2d9) in the original file.
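
The layout described so far can be sketched in Python as follows.
This assumes the three header fields are 8-byte little-endian integers
(as the dump suggests) and that the compressed bytes run from the end
of the 24-byte header up to the offset named in the second field.
Whether the payload is zlib-wrapped or raw deflate is not established
here, so the sketch tries both.  The function name is hypothetical.

```python
import struct
import zlib

HEADER_SIZE = 24  # three 8-byte little-endian integers, per the dump above


def read_compressed_block(blob: bytes, header_offset: int) -> bytes:
    """Inflate the single compressed block that follows the 24-byte header."""
    this_ofs, next_ofs, next_len = struct.unpack_from("<qqq", blob, header_offset)
    if this_ofs != header_offset:
        raise ValueError("first field does not match the header's own offset")
    # Assumption: the payload ends where the next block begins.
    payload = blob[header_offset + HEADER_SIZE:next_ofs]
    try:
        return zlib.decompress(payload)             # zlib-wrapped deflate
    except zlib.error:
        return zlib.decompress(payload, wbits=-15)  # raw deflate, no wrapper
```

For the sample file this would be called with header_offset=0x345, so
the payload spans 0x35d up to 0x714, which is the 951 bytes noted above.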

The file ends with an additional 48 (0x30) bytes starting at offset 1812
(0x714).  Their contents, with my speculation about their meaning, are:

00000714  9c ff ff ff ff ff ff ff - Value -100, dunno why (compression bias?)
0000071c  00 00 00 00 00 00 00 00 - ?
00000724  00 f0 3f 00 01 00 00 00 - ?
0000072c  45 03 00 00 00 00 00 00 - Starting offset of previous block, 0x345.
00000734  5d 03 00 00 00 00 00 00 - Starting offset of data block, 0x35d.
0000073c  60 04 00 00             - Inflated data size, 0x460 bytes.
00000740  b7 03 00 00             - Compressed data size, 0x3b7 bytes.
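
A short sketch that unpacks these 48 bytes according to the guesses
above and checks that the numbers are mutually consistent.  The field
names are my own labels for the speculated meanings, and the two
unexplained 8-byte fields are kept as raw bytes rather than
interpreted:

```python
import struct

# The 48 trailing bytes exactly as dumped above.
TRAILER = bytes.fromhex(
    "9cffffffffffffff"   # -100
    "0000000000000000"   # unknown
    "00f03f0001000000"   # unknown
    "4503000000000000"   # 0x345
    "5d03000000000000"   # 0x35d
    "60040000"           # 0x460
    "b7030000"           # 0x3b7
)


def parse_trailer(raw: bytes) -> dict:
    """Unpack the trailer, little-endian, using the speculative layout."""
    names = ("first", "unknown1", "unknown2", "prev_block_offset",
             "data_offset", "inflated_size", "compressed_size")
    return dict(zip(names, struct.unpack("<q8s8sqqII", raw[:48])))


trailer = parse_trailer(TRAILER)
# The compressed data should end exactly where the trailer begins (0x714).
trailer_offset = trailer["data_offset"] + trailer["compressed_size"]
```

Here trailer_offset works out to 0x35d + 0x3b7 = 0x714, which agrees
with the offset stored in the 24-byte header at 0x345.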

From here, I think that the next step would have to be to look at both
the .sav and .zsav versions of files.  I would be most interested in
larger files (say, 1 MB in size), because I think that it is likely that
some of the mysteries above would be cleared up if there were more
compressed blocks in the file (or perhaps we would find out that there
is only ever a single compressed block).

Thanks,

Ben.
