Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?
From: Darshit Shah
Subject: Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?
Date: Tue, 12 Feb 2019 00:21:58 +0100
User-agent: NeoMutt/20180716
* Tim Rühsen <address@hidden> [190211 13:45]:
> You are right, --if-modified-since changes -N behavior in case a file is
> incomplete. --if-modified-since can't easily be fixed since the 304
> response does not include file size information.
>
> As you suggest, we should disable this option by default or at least
> discuss the options we have.
That's correct. While the lack of a Content-Length header on a 304 response
causes problems, we can't rely on it being present even for normal 200 / 206
responses.
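To make the difference concrete, here is a rough sketch in Python of the two
checks. This is not Wget's actual code: the function names and the use of None
for a missing length are assumptions made purely for illustration. With
HEAD+GET the client can compare sizes when a length is sent; with a 304 there
is nothing to compare against.

```python
from typing import Optional


def needs_download_head_get(local_size: int, local_mtime: float,
                            remote_size: Optional[int],
                            remote_mtime: float) -> bool:
    """-N style check when a HEAD/GET response is available (illustrative)."""
    if remote_mtime > local_mtime:
        return True  # server copy is newer
    # The size check is only possible if the server sent a length.
    if remote_size is not None and remote_size != local_size:
        return True  # sizes differ, local copy is probably incomplete
    return False


def needs_download_conditional(status: int) -> bool:
    """Check after sending If-Modified-Since: a 304 carries no size to compare."""
    return status != 304
```

With the sizes from Lawrence's log (local 8643456, remote 38044195), the first
check asks for a download even though the local file is newer, while the 304
path skips it.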
Let me try to aggregate some of the possible options (I'm not saying any of
these is a particularly good idea):
1. Write the file to a tmpfile and, on successful download, move it to the
real location.
This option has multiple problems. For one, people don't expect Wget to
write to a tmp file. This can be problematic, especially when people try to
play streaming data without a -O. But for the purposes of dealing with -N
and --if-modified-since, this is the best option.
2. Issue a utime() call after every write() in order to set the mtime again to
something older than the one reported by the server.
With this approach, we would issue a utime() after each call to write() to
reset the file's mtime to an earlier time. After the file is fully downloaded,
we set the mtime to the actual one provided by the server. The drawback is
that Wget would issue many more system calls. And with Wget2, it might get
really bad due to downloading ~30+ files in parallel. I'm also unsure how the
kernel handles races between write() and utime() calls. We don't want to set
the mtime of the file and have it overwritten by the previous write() call.
This might be a valid option, especially since it is cross-platform. However,
the performance impact would need to be evaluated.
3. Only enable If-Modified-Since when xattr is available.
The idea here is simple: on systems where xattr is available, store either an
old timestamp or a completion flag in the attributes. Use this metadata to
issue an If-Modified-Since header. If xattr is not available or the
attributes are not found, use the HEAD+GET approach.
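For reference, the three options could be sketched roughly as follows. This
is Python rather than Wget's C, and nothing here is taken from the Wget
sources: every function name, the ".part-" prefix, the marker mtime of 0 and
the attribute name "user.wget.complete" are hypothetical choices for
illustration only.

```python
import os
import tempfile

XATTR_NAME = "user.wget.complete"  # hypothetical attribute name


def save_via_tempfile(chunks, dest_path):
    """Option 1: write to a temp file, rename into place only on success."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".part-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_path, dest_path)  # atomic on POSIX within one filesystem
    except BaseException:
        # An interrupted transfer leaves dest_path untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def save_with_backdated_mtime(chunks, dest_path, server_mtime, marker_mtime=0.0):
    """Option 2: keep the mtime old while writing, set the real one at the end."""
    with open(dest_path, "wb") as out:
        os.utime(dest_path, (marker_mtime, marker_mtime))
        for chunk in chunks:
            out.write(chunk)
            out.flush()
            # One extra system call per write: this is the cost discussed above.
            os.utime(dest_path, (marker_mtime, marker_mtime))
    os.utime(dest_path, (server_mtime, server_mtime))


def mark_complete(path):
    """Option 3: store a completion flag in an extended attribute, if possible."""
    try:
        os.setxattr(path, XATTR_NAME, b"1")
        return True
    except (AttributeError, OSError):
        return False  # no xattr support: the caller falls back to HEAD+GET


def is_marked_complete(path):
    """Return True only if the completion flag is present and readable."""
    try:
        return os.getxattr(path, XATTR_NAME) == b"1"
    except (AttributeError, OSError):
        return False
```

In all three cases an interrupted transfer stays distinguishable from a
finished one: no file at the real location, a file with an old mtime, or a
file without the flag.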
Are there any other options that I've missed?
> On 2/10/19 2:42 PM, Lawrence Wade wrote:
> > Hi Tim,
> >
> > Okay. Using the OpenSUSE-packaged wget (1.19.5) that comes with Leap 15.0:
> >
> > $ wget -r -N 192.168.2.100:8080
> > ...
> > Reusing existing connection to 192.168.2.100:8080.
> > HTTP request sent, awaiting response... 304 Not Modified
> > File ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4’ not modified on server. Omitting download.
> >
> > This file is incomplete in my local copy.
> >
> > Trying again as you suggest,
> >
> > $ wget -r -N --no-if-modified-since 192.168.2.100:8080
> > ...
> > --2019-02-10 08:35:14-- http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
> > Reusing existing connection to 192.168.2.100:8080.
> > HTTP request sent, awaiting response... 200 OK
> > Length: 38044195 (36M) [application/octet-stream]
> > The sizes do not match (local 8643456) -- retrieving.
> > --2019-02-10 08:35:14-- http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
> > Reusing existing connection to 192.168.2.100:8080.
> > HTTP request sent, awaiting response... 200 OK
> > Length: 38044195 (36M) [application/octet-stream]
> > Saving to: ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4
> > ...
> >
> > And it appears to work as expected. Won't this change to the behaviour
> > of -N option subtly break a lot of scripts which rely on wget?
> >
> > Thanks so much, Tim. I do have an answer and a workaround though my
> > concerns remain.
> >
> > Lawrence Wade
> > Ottawa, Canada
> >
> > On Sun, Feb 10, 2019 at 2:11 AM Lawrence Wade <address@hidden> wrote:
> >>
> >> Hi Everyone,
> >>
> >> This might be a corroboration of this
> >> http://lists.gnu.org/archive/html/bug-wget/2018-10/msg00049.html
> >> and this
> >> https://bugs.launchpad.net/ubuntu/+source/wget/+bug/1715481
> >>
> >> I use wget to backup my cellphone running Palapa Web Server, and it
> >> has worked well for me for years. Since upgrading to OpenSUSE Leap 15,
> >> I have been having corrupted files.
> >>
> >> My method is
> >> $ wget -r -N 192.168.2.100:8080
> >> and if the connection is interrupted for any reason, the next time I
> >> call wget it would complete any incomplete files. And since Leap 15, I
> >> have been getting gradually corrupted backups. I was tearing my hair
> >> out looking at wgetrc and other things.
> >>
> >> With one long file that I knew was incomplete, I got a Not Modified -
> >> omitting download, even though I knew the file sizes were different
> >> between the server and wget's copy - though the wget man page
> >> explicitly states that if the file sizes do not match, -N will trigger
> >> a download.
> >>
> >> I tried on OpenSUSE 42.3 (wget 1.14) and the incomplete file triggered
> >> a download, even though wgetrc was identical.
> >>
> >> Again, on Leap 15, I compiled 1.20.1 (latest), 1.17.1, and then
> >> finally with 1.16.3 the behaviour went back to what I expected (and I
> >> got my corrupted phone backups fixed).
> >>
> >> Was a bug possibly introduced in 1.17 with the support for
> >> --if-modified-since?
> >>
> >> Version shipping with OpenSUSE Leap 15:
> >> GNU Wget 1.19.5 built on linux-gnu.
> >> +cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
> >> +ntlm +opie +psl +ssl/openssl
> >>
> >> Last version I tried where "wget -r -N" works as expected:
> >> GNU Wget 1.16.3 built on linux-gnu.
> >> +digest +https +ipv6 -iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
> >>
> >> I'm open to the possibility that there may be something else causing
> >> this bug, I have not found many mentions of it, but then again it is
> >> subtle. You get pretty confident when you just let wget do its thing,
> >> so there may be a lot of incomplete files out there... :)
> >>
> >> Thanks so much for your help. I can provide any other info that would
> >> be helpful.
> >>
> >> Lawrence Wade
> >> Ottawa, Canada
> >
>
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6