[Bug-wget] [bug #53818] Proposal: Check HTML suffix (for TEXTHTML flag) also on unchanged files
Thu, 3 May 2018 06:00:54 -0400 (EDT)
User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Summary: Proposal: Check HTML suffix (for TEXTHTML flag) also on unchanged files
Project: GNU Wget
Submitted by: a4lg
Submitted on: Thu 03 May 2018 07:00:52 PM JST
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Assigned to: None
Discussion Lock: Any
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Work Required: None
Patch Included: Yes
If both the `-r' (recursive) and `-N' (check timestamp) options are given and
the server returns 304 (Not Modified), the already-downloaded HTML file is not
treated as an HTML file, so the links in it are not followed.
If we want to back up a website periodically (all pages are linked from
index.html directly or indirectly) to track changes while avoiding
unnecessary downloads, we naturally use the `-N' option. However, if some
"leaf" pages have changed but index.html has not, the server answers 304 for
index.html, its links are not followed, and the changed pages are never
fetched, so we could miss important changes.
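To make the consequence concrete, here is a toy model (not Wget code) of a recursive `-N' run. The page names, the `struct page' table and the functions `find_page', `treated_as_html' and `crawl' are all hypothetical; the only rule taken from the report is that a page answered with 304 is not treated as HTML, so its links are never followed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical site. "status" is what the server would answer to the
   conditional request made under -N: 200 means the page changed and was
   re-downloaded, 304 means Not Modified. */
struct page {
  const char *name;
  int status;
  const char *links[3]; /* NULL-terminated list of outgoing links */
};

static const struct page site[] = {
  {"index.html", 304, {"a.html", "b.html", NULL}},
  {"a.html", 200, {NULL}}, /* a leaf page that has changed */
  {"b.html", 304, {NULL}},
};

#define NPAGES (sizeof site / sizeof site[0])

static int checked[NPAGES]; /* set to 1 once a page has been requested */

static int find_page(const char *name) {
  for (size_t i = 0; i < NPAGES; i++)
    if (strcmp(site[i].name, name) == 0)
      return (int)i;
  return -1;
}

/* Current behavior as described in the report: only a fresh (200)
   download is treated as HTML; a 304 page is never parsed for links. */
static int treated_as_html(const struct page *p) {
  return p->status == 200;
}

/* Depth-first recursive retrieval starting from NAME. */
static void crawl(const char *name) {
  int i = find_page(name);
  if (i < 0 || checked[i])
    return;
  checked[i] = 1;
  if (!treated_as_html(&site[i]))
    return; /* links inside this page are not followed */
  for (const char *const *l = site[i].links; *l != NULL; l++)
    crawl(*l);
}
```

Calling `crawl("index.html")' marks only index.html as checked: a.html is never requested even though it changed, because the 304 on index.html stops the traversal at the first page.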
I hate this behavior (the `-nc' option mostly works because it guesses
whether a file is HTML from its file name suffix, but `-N' doesn't), so I
decided to propose a small patch.
The attached patch reuses `get_file_flags' (which guesses whether a file is
HTML from its file name suffix *when the -nc (no clobber) option is given*)
when the server returns 304 (Not Modified).
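A minimal sketch of the proposed rule, under stated assumptions: it is neither the attached patch nor Wget's `get_file_flags'. The helper names, the `SKETCH_TEXTHTML' bit and the .html/.htm suffix list are hypothetical stand-ins; `strcasecmp' comes from POSIX <strings.h>, which is available on GNU/Linux.

```c
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical flag bit; Wget's own TEXTHTML value may differ. */
enum { SKETCH_TEXTHTML = 1 << 0 };

/* Hypothetical helper: nonzero if the text after the last '.' in FNAME
   is "html" or "htm", ignoring case. This is a guess from the name only. */
static int sketch_has_html_suffix(const char *fname) {
  const char *dot = strrchr(fname, '.');
  if (dot == NULL)
    return 0;
  return strcasecmp(dot + 1, "html") == 0 || strcasecmp(dot + 1, "htm") == 0;
}

/* Proposed rule: on 304 (Not Modified) the local copy is reused and, as
   with -nc, the HTML flag is added if its name looks like HTML. Any other
   status keeps the flags already derived from the response. */
static int sketch_flags_after_response(int status, int flags,
                                       const char *local_file) {
  if (status == 304 && sketch_has_html_suffix(local_file))
    return flags | SKETCH_TEXTHTML;
  return flags;
}
```

The guess relies entirely on the local file name, which is the same trade-off `-nc' already makes.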
* This patch slightly changes Wget's behavior.
* It introduces a caveat similar to bug #50935. If a solution to bug #50935
  is found, it can (and should) be applied here as well.
* I (as the author) consider this patch too small to be copyrightable.
I tested the patch, but I'm not sure whether it is suitable for merging
upstream. I consider this an _improvement_, but you may consider that I
_broke_ the existing behavior.
Please let me know if you have any feedback about this.
Date: Thu 03 May 2018 07:00:52 PM JST  Name: 0001-Check-HTML-suffix-also-on-unchanged-files.patch  Size: 2KiB  By: a4lg
Message sent via Savannah
- [Bug-wget] [bug #53818] Proposal: Check HTML suffix (for TEXTHTML flag) also on unchanged files, Tsukasa OI