[Bug-wget] [bug #46611] log errors with --trust-server-names
From: Tim Ruehsen
Subject: [Bug-wget] [bug #46611] log errors with --trust-server-names
Date: Wed, 16 Mar 2016 11:20:13 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Follow-up Comment #6, bug #46611 (project wget):
The reason why the wrong '.html' extension is sometimes mentioned and
sometimes not is that we get redirected to different servers.
One of the servers answers with
HTTP/1.1 304 Not Modified
Date: Wed, 16 Mar 2016 10:55:31 GMT
Accept-Ranges: bytes
ETag: "1444344092"
Content-Type: application/octet-stream
X-HW: 1458125731.dop012.fr7.t,1458125731.cds062.fr7.c
Content-Disposition: attachment; filename="mbam-setup-2.2.0.1024.exe"
The second server answers (here we see the .html later in the logs):
HTTP/1.1 304 Not Modified
Accept-Ranges: bytes
Cache-Control: max-age=86400, public
Date: Wed, 16 Mar 2016 10:37:45 GMT
Etag: "482eb1-15d8fd8-5219f871809fa"
Expires: Thu, 17 Mar 2016 10:37:45 GMT
Last-Modified: Thu, 08 Oct 2015 22:38:53 GMT
Server: ECAcc (fcn/9FA9)
X-Cache: HIT
The Content-Type header field is missing, so Wget falls back to its own
default. I guess that Wget's default Content-Type is text/html, which would
explain the '.html'.
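To make the difference between the two responses concrete, here is a minimal Python sketch (not Wget's actual code). It parses the two header blocks quoted above and returns the declared Content-Type, or a fallback when the header is absent. The function name effective_content_type and the "text/html" default are assumptions for illustration; the default reflects my guess above, not something verified in the Wget source.

```python
from email.parser import Parser

# Header blocks copied from the two responses above (status lines removed).
FIRST_SERVER = """\
Date: Wed, 16 Mar 2016 10:55:31 GMT
Accept-Ranges: bytes
ETag: "1444344092"
Content-Type: application/octet-stream
X-HW: 1458125731.dop012.fr7.t,1458125731.cds062.fr7.c
Content-Disposition: attachment; filename="mbam-setup-2.2.0.1024.exe"
"""

SECOND_SERVER = """\
Accept-Ranges: bytes
Cache-Control: max-age=86400, public
Date: Wed, 16 Mar 2016 10:37:45 GMT
Etag: "482eb1-15d8fd8-5219f871809fa"
Expires: Thu, 17 Mar 2016 10:37:45 GMT
Last-Modified: Thu, 08 Oct 2015 22:38:53 GMT
Server: ECAcc (fcn/9FA9)
X-Cache: HIT
"""


def effective_content_type(raw_headers, default):
    """Return the declared media type, or `default` if Content-Type is absent.

    Hypothetical helper for illustration only; not taken from Wget.
    """
    headers = Parser().parsestr(raw_headers, headersonly=True)
    value = headers.get("Content-Type")
    if value is None:
        return default
    # Drop parameters such as "; charset=utf-8" and normalize case.
    return value.split(";", 1)[0].strip().lower()


# Assumed current default (a guess, see above).
print(effective_content_type(FIRST_SERVER, "text/html"))
print(effective_content_type(SECOND_SERVER, "text/html"))
```

With the assumed default, the first response keeps its declared application/octet-stream, while the second one is treated as text/html only because the header is missing.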
RFC 2616 says:
Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type
header field defining the media type of that body. If and only if the media
type is not given by a Content-Type field, the recipient MAY attempt to guess
the media type via inspection of its content and/or the name extension(s) of
the URI used to identify the resource. If the media type remains unknown, the
recipient SHOULD treat it as type "application/octet-stream".
The updated RFC 7231, Section 3.1.1.5, says:
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.
In practice, resource owners do not always properly configure their
origin server to provide the correct Content-Type for a given
representation, with the result that some clients will examine a
payload's content and override the specified type. Clients that do
so risk drawing incorrect conclusions, which might expose additional
security risks (e.g., "privilege escalation"). Furthermore, it is
impossible to determine the sender's intent by examining the data
format: many data formats match multiple media types that differ only
in processing semantics. Implementers are encouraged to provide a
means of disabling such "content sniffing" when it is used.
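The recipient-side rule from RFC 7231 can be sketched in a few lines of Python. This is only an illustration of the policy, not a proposal for Wget's implementation: the function name media_type, the sniff switch, and the very crude HTML check are all hypothetical. Sniffing is off by default, in line with the RFC's advice to provide a way to disable it.

```python
def media_type(content_type, body=b"", sniff=False):
    """Pick a media type following the RFC 7231, Section 3.1.1.5 options.

    content_type: value of the Content-Type header, or None if absent.
    body:         payload bytes, only examined when sniff is True.
    sniff:        opt-in content sniffing (hypothetical, deliberately crude).
    """
    if content_type:
        # A declared type always wins; strip parameters and normalize case.
        return content_type.split(";", 1)[0].strip().lower()

    if sniff:
        # Illustrative heuristic only: look for an HTML prologue.
        head = body.lstrip()[:15].lower()
        if head.startswith((b"<!doctype html", b"<html")):
            return "text/html"

    # No header and no sniffing result: assume opaque binary data.
    return "application/octet-stream"


print(media_type(None))
print(media_type("Text/HTML; charset=utf-8"))
print(media_type(None, b"<html><body>hi</body></html>", sniff=True))
```

Under this policy a response without Content-Type, like the one from the second server, would be handled as application/octet-stream unless sniffing is explicitly enabled.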
Before we change Wget's default Content-Type to application/octet-stream,
I would like to hear some opinions. There might be a good reason for Wget's
current behavior.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?46611>