Re: [Bug-wget] [PATCH] Patch to change behavior with redirects under --r
From: Tim Rühsen
Subject: Re: [Bug-wget] [PATCH] Patch to change behavior with redirects under --recurse.
Date: Fri, 07 Oct 2016 22:28:31 +0200
User-agent: KMail/5.2.3 (Linux/4.7.0-1-amd64; KDE/5.26.0; x86_64; ; )
On Friday, 7 October 2016 15:40:55 CEST Dale R. Worley wrote:
> Tim Ruehsen <address@hidden> writes:
> > the changes in recur.c are not acceptable. They circumvent too many checks
> > like host-spanning, excludes and even --https-only.
>
> I suppose it depends on what you consider the semantics to be.
> Generally, I look at it if I've specified to download http://x/y/z and
> http://x/y/z redirects to http://a/b/c, if http://x/y/z passes the tests
> I've specified, then the page should be downloaded; the fact that it's
> redirected to http://a/b/c is incidental. Most checks *should* be
> circumvented.
>
> I guess I'd make exceptions for --https-only, which is presumably
> placing a requirement on *how* the pages should be fetched, and probably
> the robots check, as that's a policy statement by the server.
If you are redirected to another host/domain, wget's policy is not to follow the
redirect unless the user explicitly allows it (--span-hosts or --domains).
Your case is a redirection within the same domain, which my patch considers
to be OK (even if the redirect target contains an explicitly unwanted
path/component). Even that might be dangerous as default behavior, which is
why I want to see some more opinions.
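
As a rough illustration of the policy described above (not the actual code in
recur.c), the decision could be sketched like this. The function name and its
parameters are hypothetical; "same domain" is simplified to "same host", and
span_hosts / domains only stand in for the effect of --span-hosts / --domains.

```python
from urllib.parse import urlsplit


def redirect_allowed(original_url, redirect_url, span_hosts=False, domains=()):
    """Hypothetical sketch: may a redirect be followed during recursion?"""
    orig_host = (urlsplit(original_url).hostname or "").lower()
    new_host = (urlsplit(redirect_url).hostname or "").lower()

    # Redirect stays on the same host: accepted, even if the new path
    # would otherwise match an exclude rule (the debatable part).
    if new_host == orig_host:
        return True

    # Redirect leaves the host: only followed if the user opted in.
    if span_hosts:
        return True

    # Assumption: a listed domain matches itself and its subdomains.
    return any(
        new_host == d.lower() or new_host.endswith("." + d.lower())
        for d in domains
    )
```

With this sketch, http://x/y/z redirecting to http://x/b/c is followed, while
http://x/y/z redirecting to http://a/b/c is refused unless the user has opted
in to leaving the original host.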
We could add another CLI option for fine-tuning here.
Tim