bug-wget

Re: [Bug-wget] [Wget v1.14 Patch] When using -nc argument


From: Ángel González
Subject: Re: [Bug-wget] [Wget v1.14 Patch] When using -nc argument
Date: Mon, 18 Mar 2013 15:47:05 +0100
User-agent: Thunderbird

On 17/03/13 23:50, Juan Miguel Taboada Godoy wrote:
> Hello:
>
> This patch prevents wget from stopping when the -nc argument is in use and
> a file on disk (from a previous download) has a name that doesn't
> end with htm or html.
>
> I detected this bug when downloading a website with this URL:
> http://www.abc.com/dirdir/cgi-bin/Search.php?lng=EN&search=Query_Search_List
Wrong URL?

> This was saved on disk at:
> dirdir/cgi-bin/
> as a file named:
> Search.php?lng=EN&search=Query_Search_List
>
> When I was using -nc argument, wget couldn't detect that this file could
> have links inside.
>
> Because I believe wget should check most files as text/html, just
> in case there are links inside to visit, I fixed this problem
> by not checking for the htm or html suffix in the file name.
>
> Sincerely,
Usually the problem is the opposite, with wget checking too many files.
One problem this could cause: if you have big binary files (e.g. 16GB),
wget dies when trying to parse them.

Perhaps we could add a --expect-html-everywhere option :/
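
A possible middle ground between "only trust the suffix" and "parse
everything" is to sniff a small, bounded prefix of the file before deciding.
The sketch below is only an illustration of that idea, not wget's actual
logic: the function name should_parse_for_links, the 4096-byte limit and the
marker list are all assumptions made up for this example.

```python
# Hypothetical heuristic, not taken from wget's source.
HTML_SUFFIXES = (".htm", ".html")
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body", b"<a ")


def should_parse_for_links(path, sniff_bytes=4096):
    """Guess whether a local file is worth scanning for links."""
    # Cheap path: the suffix already says HTML.
    if path.lower().endswith(HTML_SUFFIXES):
        return True
    # Otherwise read only a bounded prefix, so a huge binary file
    # costs one small read instead of a full parse.
    with open(path, "rb") as f:
        head = f.read(sniff_bytes)
    # A NUL byte in the prefix is a strong hint that the file is binary.
    if b"\x00" in head:
        return False
    # Case-insensitive search for a few common HTML markers.
    lowered = head.lower()
    return any(marker in lowered for marker in HTML_MARKERS)
```

With something like this, a file saved as
Search.php?lng=EN&search=Query_Search_List would still be scanned if it
starts with HTML, while a large binary would be skipped after reading a few
kilobytes. It is a guess, so it can misjudge files in both directions.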

> Spanish law protects the secrecy of communications. This
> message is intended exclusively for its addressee and may contain
> privileged or CONFIDENTIAL information. (...)
You are sending a patch for a GPL program to a publicly archived
mailing list...

Regards



