Re: Automake 1.11.2 released

From: Antonio Diaz Diaz
Subject: Re: Automake 1.11.2 released
Date: Mon, 26 Dec 2011 18:02:41 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905

Hello Miles,

Miles Bader wrote:
> What's the difference between xz and lzip anyway...?
>
> I've never even heard of lzip, but the debian package description makes
> it sound very similar to xz...

The main difference between xz and lzip is that xz includes some binary filters for executable code of some processors, but I do not see this as an advantage at all because:

1) The binary filters of xz are useless for compressing source tarballs.

2) New filters have to be added to the format as new processors appear in the market.

3) Those so-called BCJ filters have known problems and it is planned to replace them in a future version of xz. The xz format is not the most stable alternative for long term archiving. See this quote from the xz man page:
       These BCJ filters have known problems related to the compression
       ratio:

       ·  Some types of files containing executable code  (e.g.  object
          files,  static  libraries, and Linux kernel modules) have the
          addresses in the  instructions  filled  with  filler  values.
          These BCJ filters will still do the address conversion, which
          will make the compression worse with these files.

       ·  Applying a BCJ filter on an archive containing multiple simi-
          lar executables can make the compression ratio worse than not
          using a BCJ filter.  This is because the BCJ  filter  doesn't
          detect  the  boundaries  of the executable files, and doesn't
          reset the address conversion counter for each executable.

       Both of the above problems will be fixed in the future in a  new
       filter.   The  old  BCJ filters will still be useful in embedded
       systems, because the decoder of the new filter  will  be  bigger
       and use more memory.

4) The xz format is supposed to be extensible, but it will not be extended with new compression algorithms, just as gzip wasn't. It makes no sense to combine two or more different compression algorithms, probably with different command line options, in a big executable.

5) The xz format is fragmented. See this quote also from the xz man page:
  Embedded .xz decompressor implementations like XZ Embedded don't
  necessarily support files created with integrity check types other
  than none and crc32. Since the default is --check=crc64, you must use
  --check=none or --check=crc32 when creating files for embedded
  systems.

  Outside embedded systems, all .xz format decompressors support all the
  check types, or at least are able to decompress the file without
  verifying the integrity check if the particular check is not
  supported.

  XZ Embedded supports BCJ filters, but only with the default start
  offset.

OTOH, the lzip family of programs has some genuine advantages over xz:

1) Lzip is copylefted. This should be important for us in GNU.

2) Lunzip, a decompressor for lzip files, is much smaller than xzminidec (the xz-embedded "small" xz decompressor) and can decompress any lzip file, while xzminidec can only decompress specially crafted xz files.
   lunzip     (31kB)
   lzip       (89kB)
   xzminidec (171kB)
BTW, some programs of the lzip family, like lunzip, are written in C for better portability to embedded and mobile systems.

3) The dictionary size encoded by lzip is more fine-grained than that of xz, saving memory when decompressing.

4) Lziprecover can recover corrupt lzip files with an efficacy never seen before in a gzip-like compressed format. And it can recover files produced by any of the compressors in the lzip family, as all of them are compatible. No such tool exists for xz, and given the complexity and extensibility of the xz format, I think an effective recovery tool for xz can't be written.

5) The lzip family includes plzip, a massively parallel (multi-threaded) compressor.

6) There exist three related but independent compressor implementations producing files in lzip format (lzip, clzip and minilzip/lzlib) which are verified to produce bit-identical output, much like 3-way redundancy in mission-critical software. AFAIK, xz implementations are not tested to this level.

7) Using xz for software distribution may not be much of a problem, since the format of compressed tarballs can be changed overnight. But for long-term archiving, the simpler the format, the more likely it is that the data can be recovered decades later.
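
Point 3 above (dictionary size granularity) can be illustrated with a minimal sketch. It is not taken from either program's source. It assumes the one-byte dictionary size encodings as described in the lzip and xz format documentation: lzip stores a base-2 logarithm plus a count of 1/16 "wedges" to subtract, while the LZMA2 filter in xz allows only sizes of the form 2^n and 2^n + 2^(n-1). All function and variable names are made up for illustration.

```python
MIN_LZIP_DICT = 1 << 12  # 4 KiB, smallest lzip dictionary (assumed from the format description)


def lzip_dict_size(coded: int) -> int:
    """Decode the one-byte coded dictionary size of an lzip header."""
    log2 = coded & 0x1F               # bits 4-0: log2 of the base size
    if not 12 <= log2 <= 29:
        raise ValueError("base size out of range")
    size = 1 << log2
    if size > MIN_LZIP_DICT:
        wedges = (coded >> 5) & 0x07  # bits 7-5: sixteenths of the base to subtract
        size -= (size // 16) * wedges
    return size


def xz_lzma2_dict_size(coded: int) -> int:
    """Decode the one-byte LZMA2 dictionary size property used in .xz files."""
    if not 0 <= coded <= 40:
        raise ValueError("invalid LZMA2 dictionary size property")
    if coded == 40:
        return 0xFFFFFFFF             # special case: 4 GiB - 1
    return (2 | (coded & 1)) << (coded // 2 + 11)


def smallest_at_least(sizes, needed: int) -> int:
    """Return the smallest representable dictionary size >= needed."""
    return min(s for s in sizes if s >= needed)


# Every dictionary size each format can express under the assumptions above.
LZIP_SIZES = sorted({lzip_dict_size((w << 5) | b)
                     for b in range(12, 30) for w in range(8)})
XZ_SIZES = sorted({xz_lzma2_dict_size(c) for c in range(41)})

if __name__ == "__main__":
    needed = 9 << 20  # a hypothetical 9 MiB input
    print("lzip:", smallest_at_least(LZIP_SIZES, needed) >> 20, "MiB")
    print("xz:  ", smallest_at_least(XZ_SIZES, needed) >> 20, "MiB")
```

Under these assumptions, a 9 MiB input fits a 9 MiB dictionary in lzip, while the nearest size expressible in xz is 12 MiB, so the decompressor would allocate correspondingly more memory.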

Best regards,
