


From: Tim Rühsen
Subject: Re: [Bug-wget] [PATCH] Patch to change behavior with redirects under --recurse.
Date: Fri, 07 Oct 2016 22:28:31 +0200
User-agent: KMail/5.2.3 (Linux/4.7.0-1-amd64; KDE/5.26.0; x86_64; ; )

On Friday, 7 October 2016 15:40:55 CEST Dale R. Worley wrote:
> Tim Ruehsen <address@hidden> writes:
> > the changes in recur.c are not acceptable. They circumvent too many checks
> > like host-spanning, excludes and even --https-only.
> 
> I suppose it depends on what you consider the semantics to be.
> Generally, I look at it this way: if I've specified to download
> http://x/y/z and it redirects to http://a/b/c, then as long as
> http://x/y/z passes the tests I've specified, the page should be
> downloaded; the fact that it's redirected to http://a/b/c is
> incidental.  Most checks *should* be circumvented.
> 
> I guess I'd make exceptions for --https-only, which is presumably
> placing a requirement on *how* the pages should be fetched, and probably
> the robots check, as that's a policy statement by the server.

If you are redirected to another host/domain, wget's policy is not to follow 
the redirect unless the user explicitly asks for it (--span-hosts or --domains).

Your case is a redirection within the same domain, which my patch considers 
to be ok (even if the redirect target contains an explicitly unwanted path 
component). Even that might be dangerous as default behavior, which is why I 
want to see some more opinions.
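
To make the policy under discussion concrete, here is a minimal sketch in
Python. It is not the code in recur.c and not the patch itself; the function
names and parameters (redirect_allowed, span_hosts, domains, https_only) are
hypothetical and only mirror the options mentioned in this thread. It also
assumes that "same domain" means "same hostname" and that either option
counts as an explicit opt-in, which is a simplification.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def in_domains(host, domains):
    """True if host equals one of the listed domains or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def redirect_allowed(original_url, redirect_url,
                     span_hosts=False, domains=(), https_only=False):
    """Hypothetical decision function for following a redirect.

    Assumptions (not taken from wget's source):
    - https_only is always enforced, since it constrains *how* pages are fetched
    - a redirect to the same hostname is accepted, whatever its path
    - a redirect to another host needs an explicit opt-in
    """
    if https_only and urlsplit(redirect_url).scheme != "https":
        return False

    src, dst = host_of(original_url), host_of(redirect_url)
    if src == dst:
        # Same host: accepted even if the path would otherwise be excluded.
        return True
    if domains:
        # An explicit domain list decides which other hosts are acceptable.
        return in_domains(dst, domains)
    # No domain list: other hosts are followed only if spanning was requested.
    return span_hosts
```

With this sketch, a redirect from http://x/y/z to http://a/b/c is rejected
by default and accepted once span_hosts=True or domains=("a",) is passed,
while a redirect to another path on host x is always accepted unless
https_only rules it out.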

We could add another CLI option for fine-tuning here.

Tim
