bug-wget

From: Jens Schleusener
Subject: Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)
Date: Wed, 1 Nov 2017 20:52:02 +0100 (CET)
User-agent: Alpine 2.20 (LSU 67 2015-01-07)

Hi Tim,

> On Wednesday, 1 November 2017 17:27:58 CET Jens Schleusener wrote:
>> Hi,
>>
>> the new "wget" release 1.19.2 has got a new feature:
>>
>>   "gzip Content-Encoding decompression"
>>
>> But that feature, at least in my self-compiled binary, leads to a
>> problem if one downloads gzip-compressed tarballs from sites that send,
>> for example, an HTTP response header containing lines like
>>
>>   Content-Type: application/x-tar
>>   Content-Encoding: gzip

> You are clearly describing broken server behavior.


>> In such cases wget now saves the downloaded gzip-compressed tarball
>> decompressed (!), which probably breaks a lot of scripts.

> Not sure why anyone relies on broken behavior. What if the broken server
> configuration gets fixed? Then your script breaks as well.

By the way, the server where I detected the problem has already fixed it
(no idea how many servers with such behaviour exist).

>> Additionally, the tarball is nevertheless saved under a filename with the
>> "tar.gz" extension and not with the "tar" extension.

> At least on *nix, the file extension says nothing about the content. That is
> why we have the MIME type stated in Content-Type. 'x-tar' clearly denotes an
> uncompressed tar file. Content-Encoding: gzip means that the data has been
> compressed for transport purposes only.
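To make that distinction concrete, here is a minimal Python sketch (not wget's code; `decode_body` and `make_tar_gz` are hypothetical helpers) of a client that treats Content-Encoding as a transport layer and strips it before saving. Fed the header pair from the report, it turns a .tar.gz into a plain tar stream, which is exactly the behaviour Jens observed.

```python
import gzip
import io
import tarfile


def decode_body(body: bytes, headers: dict) -> bytes:
    """Undo a transport-level Content-Encoding (hypothetical helper).

    Only gzip (and its legacy alias x-gzip) is handled in this sketch;
    any other value is passed through unchanged.
    """
    encoding = headers.get("Content-Encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


def make_tar_gz() -> bytes:
    """Build a tiny .tar.gz in memory to stand in for the server's tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    tarball = make_tar_gz()  # the bytes stored on the server as foo.tar.gz
    headers = {"Content-Type": "application/x-tar", "Content-Encoding": "gzip"}
    saved = decode_body(tarball, headers)

    # The saved bytes no longer start with the gzip magic number ...
    print(saved[:2] == b"\x1f\x8b")  # prints False
    # ... and open as an uncompressed tar archive.
    with tarfile.open(fileobj=io.BytesIO(saved), mode="r:") as tar:
        print(tar.getnames())  # prints ['hello.txt']
```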

Hmm, so in this case the original file should, for convenience, have the extension .tar (although the extension says nothing about the content; it is just for stupid users like me who normally deduce the content from the file extension)?

> Anyway, whatever we do, it will be broken on some servers and not on others.

Ok, I see the conflict.

>> Possible solutions/workarounds would be for affected servers to deliver an
>> alternative HTTP header like
>>
>>   Content-Type: application/x-gzip
>>   (or Content-Type: application/octet-stream)
>>
>> or, on the client side, to use the new "wget" option
>>
>>   --compression=none
>>
>> But maybe it would be better if, in such cases, wget reverted to its old
>> default behaviour. Or is the described behaviour the expected one?

> Correct server behavior here would be:
>   Content-Type: application/gzip
> together with Content-Encoding: identity, which may also be omitted since it
> is the default.
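The header pair from the report is easy to produce by accident with a suffix-based lookup. As an illustration only (the thread does not say how the affected server was configured), Python's standard `mimetypes` module describes `foo.tar.gz` as type `application/x-tar` with encoding `gzip`; copying both values into response headers yields the broken combination, whereas the recommendation above describes the gzip layer in Content-Type. `naive_headers` and `download_headers` are hypothetical helpers, not any server's real API.

```python
import mimetypes

# A MimeTypes instance built from Python's built-in tables only, so the
# result does not depend on system files such as /etc/mime.types.
_MIME = mimetypes.MimeTypes()


def naive_headers(filename: str) -> dict:
    """Copy a suffix-based guess straight into headers (hypothetical).

    For "foo.tar.gz" this yields Content-Type: application/x-tar plus
    Content-Encoding: gzip, i.e. the combination from the bug report.
    """
    ctype, encoding = _MIME.guess_type(filename)
    headers = {"Content-Type": ctype or "application/octet-stream"}
    if encoding:
        headers["Content-Encoding"] = encoding
    return headers


def download_headers(filename: str) -> dict:
    """Headers for serving a compressed file as-is (hypothetical).

    The compression is part of the file itself, so it is described by the
    Content-Type; Content-Encoding is omitted because identity is the default.
    """
    ctype, encoding = _MIME.guess_type(filename)
    if encoding == "gzip":
        return {"Content-Type": "application/gzip"}
    if encoding:
        # Other compressors: do not claim the inner type of the archive.
        return {"Content-Type": "application/octet-stream"}
    return {"Content-Type": ctype or "application/octet-stream"}


if __name__ == "__main__":
    print(naive_headers("foo-1.0.tar.gz"))
    print(download_headers("foo-1.0.tar.gz"))
```

The first call reproduces the x-tar/gzip pair; the second returns only `Content-Type: application/gzip`.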

> A good explanation is here:
> https://superuser.com/questions/901962/what-is-the-correct-mime-type-for-a-tar-gz-file


> We can discuss a proposal for a workaround that handles both cases, like:
> if Content-Encoding == gzip and the filename ends with .gz, then don't uncompress.
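The proposed rule is small enough to sketch directly. This minimal Python version is not wget's implementation; `should_decompress` and `bytes_to_save` are hypothetical names, and only gzip/x-gzip are considered.

```python
import gzip


def should_decompress(content_encoding, filename):
    """Decide whether to undo Content-Encoding before saving (hypothetical).

    Rule from the proposal: if the server says Content-Encoding: gzip but
    the local filename already ends in .gz, assume the gzip layer belongs
    to the file itself and keep it.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        return False  # nothing this sketch knows how to undo
    return not filename.endswith(".gz")


def bytes_to_save(body, content_encoding, filename):
    """Return the bytes that would be written to `filename` (hypothetical)."""
    if should_decompress(content_encoding, filename):
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    payload = gzip.compress(b"data")
    # A tarball keeps its compression ...
    print(bytes_to_save(payload, "gzip", "foo-1.0.tar.gz") == payload)
    # ... while an HTML page is still decoded as before.
    print(bytes_to_save(payload, "gzip", "index.html"))
```

The check is deliberately case-sensitive and ignores suffixes such as `.tgz`; whether those should also count is left open here.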

> Caveat: this may break our --xattr feature, which saves the MIME type with the
> file. We would then have to adjust the MIME type as well, and that could be
> really tedious.
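The caveat can be sketched the same way: whenever the gzip layer is kept, the MIME type recorded for the file has to change too, because the bytes on disk are gzip data rather than the tar the server named. The helper below is hypothetical and only computes the type; how wget's --xattr code actually stores it is not shown.

```python
def saved_mime_type(content_type, content_encoding, decompressed):
    """MIME type to record for the saved file (hypothetical helper).

    If the body was decoded, the server's Content-Type describes the saved
    bytes. If a gzip Content-Encoding was deliberately kept, the file on
    disk is gzip data, so application/gzip is recorded instead.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if not decompressed and encoding in ("gzip", "x-gzip"):
        return "application/gzip"
    return content_type


if __name__ == "__main__":
    # Tarball kept compressed under its .tar.gz name:
    print(saved_mime_type("application/x-tar", "gzip", decompressed=False))
    # Same response, but decoded before saving:
    print(saved_mime_type("application/x-tar", "gzip", decompressed=True))
```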

So if I only chanced upon that one (now fixed) broken server and nobody else reports problems with the new behaviour, it may be OK.

Regards

Jens


