bug-gzip

bug#29089: Truncated size of big file


From: Mark Adler
Subject: bug#29089: Truncated size of big file
Date: Tue, 31 Oct 2017 11:20:29 -0700

Alex,

This is inherent in the gzip format, and is not really a bug in gzip. (Though 
gzip could notice the problem and not display a large negative compression 
ratio.)

The gzip format stores the uncompressed length at the end using four bytes, 
which can only represent up to 2^32-1. So what you are seeing is the low 32 
bits of 18962535424, which is in fact 1782666240. When gzip uses that truncated 
value to compute a compression ratio, it gets a nonsensical result.
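
To make the arithmetic concrete, here is a small Python sketch using the numbers from this report. The helper names are made up for illustration, and the ratio formula, (uncompressed - compressed) / uncompressed, is an assumption about what "gzip -l" prints; gzip's actual calculation may also account for header bytes, which would not change the result at this scale.

```python
# The 4-byte length field at the end of a gzip member holds the
# uncompressed size modulo 2^32.
ISIZE_MODULUS = 2 ** 32


def stored_isize(actual_size):
    """Value that ends up in the 4-byte length field (hypothetical helper)."""
    return actual_size % ISIZE_MODULUS


def listed_ratio(compressed, uncompressed):
    """Assumed ratio formula: percentage of space saved."""
    return 100.0 * (uncompressed - compressed) / uncompressed


actual = 18962535424      # size reported by ls -l after decompression
compressed = 3645968323   # compressed size reported by gunzip -l

isize = stored_isize(actual)
print(isize)                                        # 1782666240
print(f"{listed_ratio(compressed, isize):.1f}%")    # -104.5%
print(f"{listed_ratio(compressed, actual):.1f}%")   # 80.8%
```

With the real length, the same formula gives a plausible ratio of about 80.8% instead of -104.5%.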

Unfortunately the only way to get the real uncompressed length and compute a 
real ratio is to decompress the entire file. (In fact, pigz will do this with 
"pigz -lt", which tests the entire file without storing the result, and reports 
the correct uncompressed size and compression ratio. "pigz -l" will do the same 
bad thing that "gzip -l" does on > 4 GB uncompressed sizes, though it will 
report “unk” for questionable ratios, i.e. expansions of the data beyond what 
would be expected for incompressible data.)
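
For anyone who wants to check this without pigz, the following is a rough Python sketch of the two approaches: reading the 4-byte length from the end of the file, versus decompressing the whole stream and counting bytes without storing them. The function names are hypothetical, and the trailer read assumes a single-member gzip file with nothing appended after it.

```python
import gzip
import struct


def trailer_isize(path):
    """Read the 4-byte little-endian length field at the end of the file.

    This is the uncompressed size modulo 2^32, so it is wrong for
    inputs of 4 GB or more.
    """
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 = seek relative to end of file
        return struct.unpack("<I", f.read(4))[0]


def true_uncompressed_size(path, chunk_size=1 << 20):
    """Decompress the whole file in chunks and count the bytes produced."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

The second function has to read the entire file, which is slow for an archive of this size, but it needs no extra disk space because the decompressed data is discarded as it is counted.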

Mark


> On Oct 31, 2017, at 10:59 AM, Alex Peshkoff <address@hidden> wrote:
> 
> Before decompressing a copy of a database, I decided to take a look at its 
> size:
> 
> localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
>          compressed        uncompressed  ratio uncompressed_name
>          3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK
> 
> The uncompressed size is reported as 1.7 GB, which is definitely unreal, as 
> is the -104.5% compression ratio.
> 
> Actual size after unzip is:
> 
> localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
> localhost stg # ls -l SWHTOROLT_20171019.GBK
> -rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK
> 
> Luckily I had enough disk space. I won't attach the problematic archive to 
> this email; I suppose it's easier to reproduce this locally ;)
> 
> Alex.