Re: [Bug-wget] recursive no parent slash bug
From: Micah Cowan
Subject: Re: [Bug-wget] recursive no parent slash bug
Date: Tue, 23 Jun 2009 11:43:13 -0700
User-agent: Thunderbird 2.0.0.21 (X11/20090409)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Mike Turchenkov wrote:
> Hello, thanks for great program!
>
> I have noticed a little bug with GNU Wget 1.11.1
>
> Suppose the site has
> http://site.com/content/AAA/aaa.files
> http://site.com/content/BBB/bbb.files
>
> and I want to get only aaa.files in the current dir.
>
> When I run
>
> wget -r --level=5 -S -nH --cut-dirs=2 -N --http-user=***
> --http-passwd=*** -np -w 1 -o logfile -e robots=off
> "http://site.com/content/AAA"
>
> It gets not only aaa.files but bbb.files too.
>
> But with "/" at the end of the address
>
> wget -r --level=5 -S -nH --cut-dirs=2 -N --http-user=***
> --http-passwd=*** -np -w 1 -o logfile -e robots=off
> "http://site.com/content/AAA/"
>
> it works well.
Not a bug. HTTP has no notion of a "directory" as distinct from a "file";
the distinction is a user-level convention that Wget tries to honor. The
only reliable signal Wget has for telling the two apart is the trailing
slash. Without it, there is no way to know whether the final AAA is a
file or a "directory", so Wget treats AAA as a file inside /content/.
With -np, /content/ then becomes the limit, and /content/BBB/ falls
within it.
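To make the trailing-slash rule concrete, here is a minimal Python sketch of how a no-parent limit can be derived from a start URL. It is an illustration only, not Wget's actual code; the function names parent_limit and allowed are made up for this example.

```python
from urllib.parse import urlsplit


def parent_limit(start_url: str) -> str:
    """Return the directory prefix that a no-parent crawl must stay under.

    Hypothetical helper: the "directory" is everything in the path up to
    and including the last slash.
    """
    path = urlsplit(start_url).path or "/"
    return path[: path.rfind("/") + 1]


def allowed(start_url: str, candidate_url: str) -> bool:
    """True if candidate_url is on the same host and under the limit."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
        return False
    return (cand.path or "/").startswith(parent_limit(start_url))


bbb = "http://site.com/content/BBB/bbb.files"
# Without the slash, the limit is /content/, so BBB is in scope.
print(parent_limit("http://site.com/content/AAA"), allowed("http://site.com/content/AAA", bbb))
# With the slash, the limit is /content/AAA/, so BBB is excluded.
print(parent_limit("http://site.com/content/AAA/"), allowed("http://site.com/content/AAA/", bbb))
```

Under these assumptions the two start URLs from the report give different limits, which matches the behavior described in the report.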
(One heuristic would work on _some_ servers, and we may implement it at
some point: if the server automatically redirects "AAA" to "AAA/", Wget
could assume that a trailing slash was intended to begin with, and
proceed accordingly.)
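The redirect heuristic could look roughly like the Python sketch below. This is an assumption about how such a check might be written, not something Wget does; the function name implied_directory and the list of redirect status codes are choices made for this example.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = (301, 302, 303, 307, 308)


def implied_directory(requested_url: str, status: int,
                      location: Optional[str]) -> str:
    """Return the URL to use as the crawl root after the first response.

    If the server redirected "AAA" to "AAA/" on the same host, return the
    slash-terminated URL; otherwise return the requested URL unchanged.
    """
    if status in REDIRECT_CODES and location:
        # The Location header may be relative, so resolve it first.
        target = urljoin(requested_url, location)
        req, tgt = urlsplit(requested_url), urlsplit(target)
        same_host = (req.scheme, req.netloc) == (tgt.scheme, tgt.netloc)
        if same_host and tgt.path == req.path + "/":
            return target
    return requested_url
```

A caller would pass in the status and Location header of the first response and use the returned URL when working out the no-parent limit. Servers that answer "AAA" directly with a 200 give no such hint, which is why this only helps on some servers.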
- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkpBIkEACgkQ7M8hyUobTrHHjgCfQsKQNUo1SxrYdUiLDj3at1Xi
ZZIAn1EdXtHgoJuTDXCSR9HnN9sVDbgM
=RJ90
-----END PGP SIGNATURE-----