From: Tim Rühsen
Subject: Re: Please use gzip/gunzip when fetching webpages
Date: Fri, 3 Feb 2023 14:40:57 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0
> More often than not, I try recursively downloading a webpage using wget, only to have it download a single `index.html.gz` and then stop. Obviously wget can't read gzipped files, so it fails to find any links for recursive downloading... I ended up using a wget fork[1] that was last updated 10 years ago, and it works fine; however, I find it odd that such a basic feature never made it into mainline wget. Please add a feature for automatically detecting and uncompressing gzipped webpages before crawling them.
Sorry about your experience. This feature was added years ago:

  --compression=TYPE   choose compression, one of auto, gzip and none. (default: none)
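For example, to decompress gzipped pages during a recursive crawl (the target URL here is just a placeholder):

  wget -r --compression=auto https://example.com/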
This feature is off by default, but you can add it to your ~/.wgetrc file to permanently enable it (see `man wget`).
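A minimal sketch of that wgetrc entry, assuming the wgetrc command mirrors the long option name as most wget options do (check `man wget` for the exact spelling on your version):

  # ~/.wgetrc
  # auto-detect and decompress gzip-encoded responses
  compression = auto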
Nonetheless, no server should serve gzip-compressed pages when not explicitly asked for via `Accept-Encoding: gzip`.
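You can check a server's behavior by comparing its response headers with and without that request header (curl shown here against a placeholder URL; note that some servers omit Content-Encoding on HEAD requests):

  # without Accept-Encoding: the response should not be compressed
  curl -sI https://example.com/ | grep -i '^content-encoding'
  # explicitly ask for gzip: the server may now compress the body
  curl -sI -H 'Accept-Encoding: gzip' https://example.com/ | grep -i '^content-encoding'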
Regards, Tim