Re: [Bug-wget] Issue with --content-on-error and --convert-links
From: Yousong Zhou
Subject: Re: [Bug-wget] Issue with --content-on-error and --convert-links
Date: Thu, 16 Oct 2014 15:24:48 +0800
On 13 October 2014 10:25, Joe Hoyle <address@hidden> wrote:
> Hi All,
>
>
> I’m having issues using "--convert-links" in conjunction with
> "--content-on-error". Though "--content-on-error" is forcing wget to download
> the pages, the links to that "errored" page are not updated in other pages that
> link to it.
>
>
> This seems to be hinted at in the man page:
>
>
> "Because of this, local browsing works reliably: if a linked file was
> downloaded, the link will refer to its local name; if it was not downloaded,
> the link will refer to its full Internet address rather than presenting a
> broken link. The fact that the former links are converted to relative links
> ensures that you can move the downloaded hierarchy to another directory."
>
>
> However, it would seem that when using --content-on-error it should
> ignore this rule and do all the link substitution anyhow.
>
>
> If anyone knows if this *should* work then I’d be eager to hear it, or any
> other way I can get any 404 pages downloaded and also linked to in the wget
> mirror.
>
Currently, wget does not treat pages with a 404 status code as RETROKF
(retrieval was OK), even though the 404 page itself is downloaded
successfully when the `--content-on-error` option is enabled. That is
presumably why links to such a page are left unconverted. I guess this
behaviour is mostly acceptable, but you can try the attached patch
for the moment. The other option would be to set up your web server
manually so that it serves the 404 page.
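
To make the behaviour described above concrete, here is a toy model in
Python. It is only a sketch and not wget's source: RETROKF is the flag
name mentioned above, but its bit value and the names download_flags,
convert_link and patched are hypothetical. The patched=True branch
merely assumes that the patch lets saved error pages count for link
conversion; the patch itself may work differently.

```python
# Toy model only: this is not wget's code. RETROKF is the flag name from
# the message above; every other name here is hypothetical.
RETROKF = 0x2  # bit value chosen arbitrarily for this sketch


def download_flags(status, content_on_error):
    """Return (saved_to_disk, flags) for one HTTP response."""
    ok = 200 <= status < 300
    saved = ok or content_on_error  # --content-on-error keeps the error body
    flags = RETROKF if ok else 0    # a 404 is still not "retrieval was OK"
    return saved, flags


def convert_link(url, local_name, saved, flags, patched=False):
    """Pick the target that links in other pages would end up with."""
    if (flags & RETROKF) or (patched and saved):
        return local_name  # relative link to the local copy
    return url             # left as the full Internet address


saved, flags = download_flags(404, content_on_error=True)
print(convert_link("http://example.com/missing", "missing.html", saved, flags))
print(convert_link("http://example.com/missing", "missing.html", saved, flags,
                   patched=True))
```

In this model the first call prints the full URL, because the 404 page
was saved but never flagged as OK, and the second prints the local file
name.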
Regards.
yousong
0001-Let-convert-links-work-with-content-on-error.patch
Description: Binary data