Re: Recursive downloading of pages through the "action" attributes of the following "form" tags

bug-wget


From:	BERBAR Florian
Subject:	Re: Recursive downloading of pages through the "action" attributes of the following "form" tags
Date:	Mon, 15 May 2023 01:19:47 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

I can reproduce this issue with the latest version (1.21.4) using the following pages:


form.html:<html>
form.html:    <body>
form.html:        <form action="./post.html" >
form.html:            <input name="ff" type='text' />
form.html:            <input name='tt' type='submit' />
form.html:        </form>
form.html:    </body>
form.html:</html>

post.html:<html>
post.html:    <body>
post.html:        <a href="./link.html">link</a>
post.html:    </body>
post.html:</html>

link.html:<html>
link.html:    <body>
link.html:        <a href="./form.html">form</a>
link.html:    </body>
link.html:</html>

A basic recursive command downloads only form.html, whereas I expected all three pages to be downloaded.


wget-1.21.4$ ./src/wget -r http://127.0.0.1/form.html
--2023-05-15 01:08:55-- http://127.0.0.1/form.html
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 145 [text/html]
Saving to: '127.0.0.1/form.html'

127.0.0.1/form.html 100%[===================>]     145 --.-KB/s    in 0s

2023-05-15 01:08:55 (18.6 MB/s) - '127.0.0.1/form.html' saved [145/145]

FINISHED --2023-05-15 01:08:55--
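For comparison, here is a minimal Python sketch of a breadth-first crawl over condensed copies of the three pages above, held in memory (no network). It is not wget's implementation: LinkCollector, crawl and SITE are hypothetical names, and the crawler is assumed to treat a "form" tag's "action" attribute like an "a" tag's "href" when follow_form_action is enabled.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects link targets from <a href>, and optionally from <form action>."""

    def __init__(self, follow_form_action):
        super().__init__()
        self.follow_form_action = follow_form_action
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        # Hypothetical rule: treat a form's "action" target as a link to follow.
        elif tag == "form" and self.follow_form_action and attrs.get("action"):
            self.links.append(attrs["action"])


def crawl(pages, start, follow_form_action=True):
    """Breadth-first crawl over an in-memory {url: html} site; returns visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        order.append(url)
        collector = LinkCollector(follow_form_action)
        collector.feed(pages[url])
        collector.close()
        for link in collector.links:
            target = urljoin(url, link)  # resolves "./post.html" against the page URL
            # Skip targets outside the in-memory site and pages already queued.
            if target in pages and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# The three repro pages, condensed (closing tags written as </a>).
SITE = {
    "http://127.0.0.1/form.html":
        '<html><body><form action="./post.html">'
        '<input name="ff" type="text" /><input name="tt" type="submit" />'
        '</form></body></html>',
    "http://127.0.0.1/post.html":
        '<html><body><a href="./link.html">link</a></body></html>',
    "http://127.0.0.1/link.html":
        '<html><body><a href="./form.html">form</a></body></html>',
}

if __name__ == "__main__":
    print(crawl(SITE, "http://127.0.0.1/form.html", follow_form_action=False))
    print(crawl(SITE, "http://127.0.0.1/form.html", follow_form_action=True))
```

With follow_form_action=False the crawl stops at form.html, which matches the wget output above; with it enabled, all three pages are reached, and the link from link.html back to form.html is skipped as already seen.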

Regards,

Florian

On 4/17/23 21:22, BERBAR Florian wrote:

Hi folks,
I have a question about recursive downloading of web pages. When trying to download all pages of a website with the recursive option (--recursive) in wget 1.21, page processing does not seem to follow the "action" attributes of "form" tags.
- Is this the expected behavior?
- Is there a combination of options that downloads all pages of a website, including those reachable only through an "action" attribute?
Example with 3 HTML pages:
- Page 1 - form.html: HTML form whose "action" attribute points to "Page 2".
- Page 2 - post.html: HTML page with a link to "Page 3".
- Page 3 - link.html: HTML page without links.
I tried this command to download all three pages, but only "Page 1" was downloaded:
$ wget -r https://host/form.html


I also tried the "--follow-tags=form" option, but observed the same behavior.
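A possible workaround, sketched here under assumptions rather than as a wget feature: extract the "action" URLs from the saved page yourself and pass them to wget as extra starting points with its -i (--input-file) option. The script name extract_actions.py and the helper form_action_urls are hypothetical. Only actions that return a page on a plain GET request are useful to fetch this way, and forms on pages discovered later in the crawl are not covered.

```python
import sys
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collects the "action" attribute of every <form> tag."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def form_action_urls(html, base_url):
    """Return absolute form action URLs found in html, deduplicated, in order."""
    parser = FormActionParser()
    parser.feed(html)
    parser.close()
    urls = []
    for action in parser.actions:
        url = urljoin(base_url, action)  # relative actions resolve against the page URL
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Hypothetical usage: extract_actions.py SAVED_PAGE BASE_URL
    if len(sys.argv) != 3:
        print("usage: extract_actions.py SAVED_PAGE BASE_URL", file=sys.stderr)
    else:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for url in form_action_urls(fh.read(), sys.argv[2]):
                print(url)
```

For the example above, this could be chained as:
$ wget -r http://127.0.0.1/form.html
$ python3 extract_actions.py 127.0.0.1/form.html http://127.0.0.1/form.html > actions.txt
$ wget -r -i actions.txt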


Regards,

Florian
