bug-wget

Re: Wget 1 is not preserving server-side modification times via FTP


From: Tim Rühsen
Subject: Re: Wget 1 is not preserving server-side modification times via FTP
Date: Sun, 2 Jun 2024 20:05:16 +0200
User-agent: Mozilla Thunderbird

Hey Thomas,

The implementation of MDTM seems to be straightforward. I may find some time over the next few weekends.

I'll discuss the backwards-compatibility issue with Darshit (also a wget maintainer).

Regards, Tim

On 6/2/24 17:40, Thomas Orgis wrote:
On Sun, 2 Jun 2024 13:44:50 +0200, Tim Rühsen <tim.ruehsen@gmx.de> wrote:

> And normally (or often), you don't need the server timestamp for single
> file downloads. And if you really do, there is -N.

Well, what counts as a 'normal' need is obviously something one can
discuss endlessly (see
https://bugzilla.mozilla.org/show_bug.cgi?id=178506#c7). I just now
realized that -N does indeed fetch the timestamp, but with unexpected
(though documented) side effects, like overwriting and then deleting
any file named '.listing'.
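
As an aside, the manual's knob for this is --no-remove-listing, but as
far as I can tell it only keeps the file around instead of deleting it;
it does not stop the initial overwrite. A quick sketch (example host
made up):

        # -N on an FTP URL fetches the directory listing into ./.listing
        # to obtain timestamps, clobbering any existing file of that
        # name, and deletes it again afterwards.
        wget -N ftp://example.com/pub/file/file-5.11.tar.gz

        # With --no-remove-listing the .listing file is kept, which at
        # least makes the side effect visible.
        wget -N --no-remove-listing ftp://example.com/pub/file/file-5.11.tar.gz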

So my issue is that wget refuses to combine -N and -O. My use case is
that I have the name of the file to fetch and a number of URLs that I
could fetch it from. I don't want to guess what filename wget derives
from the URL (there can even be random redirects to some name like
error_document.html …).

I guess I'd need to run it in a temporary directory and glob the
hopefully single downloaded file in there. Can we agree that

        wget -N -O file ftp://example.com/pub/hash/465df6a5db

is a valid/sensible use case where one would want to both control the
output filename and preserve the server timestamp? I don't see
anything breaking if we enabled that. To meet my expectations, though,
the temporary creation and removal of .listing would need to be
eliminated.

So, with existing wget installations, I guess the equivalent to

        curl --fail -L -s --remote-time -o "$1" "$2"

is

        dir=$(mktemp -d wget.XXXX)
        test -n "$dir" &&
        test "$dir" != "$1" && # fun corner cases!
        cd "$dir" &&
        wget -q -N "$2" &&
        cd .. &&
        mv "$dir/*" "$1"
        rm -rf "$dir"
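
A slightly more robust variant of the same hack (just a sketch, not
tested): run the download in a subshell so there is no cd dance, and
clean the scratch directory up via trap even when a step fails:

        dir=$(mktemp -d wget.XXXX) || exit 1
        trap 'rm -rf "$dir"' EXIT
        test "$dir" != "$1" &&                # same corner case as above
        ( cd "$dir" && wget -q -N "$2" ) &&   # subshell, no cd .. needed
        mv "$dir"/* "$1"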

With -N working together with -O as I expected, this would be a bit
simpler … but I must admit that it would still be hacky, as -N implies
more than just preserving the timestamp. My preference would still be
to 'fix' the FTP download to just do an ls on the file (not the whole
directory) to get the timestamp, as curl apparently does.

Thinking about that … what is it doing?

PWD
< 257 "/" is the current directory
* Entry path is '/'
CWD pub
* ftp_perform ends with SECONDARY: 0
< 250 CWD command successful
CWD file
< 250 CWD command successful
MDTM file-5.11.tar.gz
< 213 20120221191957
EPSV
TYPE I
< 200 Type set to I
SIZE file-5.11.tar.gz
< 213 610019
RETR file-5.11.tar.gz
< 150 Opening BINARY mode data connection for file-5.11.tar.gz (610019 bytes)

I now see that wget is close to that …

==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/file ... done.
==> SIZE file-5.11.tar.gz ... 610019
==> EPSV ... done.    ==> RETR file-5.11.tar.gz ... done.

It's just missing MDTM, which returns an easy-to-parse timestamp. It
already does SIZE.
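
To illustrate how easy to parse: the payload of the 213 reply is
YYYYMMDDhhmmss in UTC (RFC 3659), which maps almost directly onto
touch -t. A bash sketch of what curl's --remote-time boils down to once
the reply is in hand:

        # MDTM answered "213 20120221191957"; touch -t expects
        # [[CC]YY]MMDDhhmm[.ss] in local time, hence TZ=UTC.
        mdtm=20120221191957
        TZ=UTC touch -t "${mdtm:0:12}.${mdtm:12:2}" file-5.11.tar.gz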

> Changing this behavior is possibly breaking assumptions made by other
> FTP users and scripts. So I really would keep this behavior as is.

I guess it would then mean adding an option equivalent to curl's
--remote-time, to avoid any surprises. I still want to point out the
inconsistency with HTTP downloads: users of Wget should not have to
care whether the link is http or ftp. But if you consider the bug
long-standing enough to be expected behaviour, then that is your
decision.
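
To spell the inconsistency out (as I understand the documented
behaviour; example host made up): plain HTTP downloads apply the
server's Last-Modified header to the local file by default, and there
even is --no-use-server-timestamps to turn that off, while the FTP
counterpart keeps the local clock unless -N is given:

        # HTTP: mtime is set from the Last-Modified header by default.
        wget -q http://example.com/pub/file/file-5.11.tar.gz

        # FTP: mtime is the time of download, unless -N (with its
        # .listing side effects) is added.
        wget -q ftp://example.com/pub/file/file-5.11.tar.gz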

After all, there is a distinction between curl as a simple single-URL
fetcher (?) and wget as a recursive downloader, which I'm only using as
the former.

> In a few years you can probably say: "Hey ChatGPT, please code me a
> recursive downloader that supports all internet protocols that can be
> used for file downloading. In Rust please." :)

This would probably be a rather small program that uses some standard
library in Rustland that by chance shares this very bug (IMHO) with
wget;-)

> Maybe we can extend the Wget2 plugin system, so that someone is able to
> contribute an FTP plugin.

I'm not claiming that I would volunteer to write such a plugin.

Incidentally, I remember that I _did_ dabble on the protocol level with
a simple HTTP client in the past … in the form of maintaining/extending
mpg123's custom code for HTTP streaming. HTTP/1.0 was OK (or in fact
HTTP/0.9 from Shoutcast servers, of which there apparently are still
instances), but I didn't fancy implementing more modern stuff, most
prominently touching TLS, even just by loading libraries.

So, also for mpg123, I even went so far as to just resort to calling
external downloaders:

        http://scm.orgis.org/mpg123/trunk/src/net123_exec.c

I decided that the protocol details are _your_ problem, then ;-)
Granted, FTP is not so usual for mp3 streaming, but I bet that music
collections used to be available via that protocol at least at certain
gatherings 25 years ago … and now mpg123 can directly play those,
thanks to wget supporting FTP downloads!

(And server timestamps are of no relevance in that application.)

> Alternatively, add -N to the wget command line.

I'll ponder that … but with the extra caution about file naming, having
to avoid -O and rogue writes to .listing files, I presume. So far curl
does the job without the extra hassle.

Thanks for the patience.


Alrighty then,

Thomas

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

