Problem during using of GNU wget
From: Pawel Wojciech Glod
Subject: Problem during using of GNU wget
Date: Tue, 16 Jul 2024 06:52:14 +0000
Hello GNU support,
I am an employee of CERN and one of my tasks is web scraping the internal pages
of our organisation. To do this, I use wget to download the entire directory
structure of the website along with the HTML files.
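
For illustration, a recursive download of this kind is usually invoked along the lines of the sketch below. The flags are an assumption chosen from standard wget options, not the exact command that was run, and the helper name mirror_cmd is hypothetical. It only prints the command so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a recursive wget command
# for the given start URL. The flag selection is an assumption.
#   --recursive        follow links found in downloaded pages
#   --level=inf        no limit on recursion depth
#   --no-parent        never ascend above the starting directory
#   --page-requisites  also fetch images, CSS, etc. needed by each page
#   --convert-links    rewrite links so the local copy can be browsed
mirror_cmd() {
    printf 'wget --recursive --level=inf --no-parent --page-requisites --convert-links %s\n' "$1"
}

mirror_cmd "https://openlab.cern/"
```

Piping the printed line into sh would start the actual download.
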
I have a problem with websites whose top-level domain (TLD) is ".cern".
An example page is https://openlab.cern/.
According to our documentation, it does not require cookies or a session token.
Unfortunately, wget downloads only a single HTML file, containing the code of
the home page, instead of the whole site. Can you diagnose why this happens?
Perhaps the website has additional security features, or it does in fact
require a session token or cookies.
My second question concerns sites that do require cookies and a session token.
We have our own tool for obtaining these, but how should we handle the redirect
to a separate authentication page so that the wget command works correctly
after authentication? Which URL should be passed to wget?
I would appreciate a prompt reply.
Best regards, Pawel Glod
CERN, BE-CSS