bug-wget

Problem during using of GNU wget


From: Pawel Wojciech Glod
Subject: Problem during using of GNU wget
Date: Tue, 16 Jul 2024 06:52:14 +0000

Hello GNU support,

I am an employee of CERN and one of my tasks is web scraping the internal pages 
of our organisation. To do this, I use wget to download the entire directory 
structure of the website along with the HTML files.
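A minimal sketch of such a recursive download follows. The exact command used is not stated in this message, so the option set is an assumption. The `run` helper is hypothetical and only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of a recursive wget download. The option set is an assumption,
# not the command actually used in this report.
START_URL="https://openlab.cern/"   # example page named in this message

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# --recursive         follow links found in downloaded pages
# --level=inf         no limit on recursion depth
# --no-parent         never ascend above the start directory
# --adjust-extension  append .html to HTML files whose URL lacks that suffix
run wget --recursive --level=inf --no-parent --adjust-extension "$START_URL"
```

By default wget saves the result under a directory named after the host, mirroring the directory structure of the site.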

I have a problem with websites whose top-level domain (TLD) is ".cern".
An example page is https://openlab.cern/
According to our documentation, it does not require cookies or a session token.
Unfortunately, wget downloads only a single HTML file containing the code of
the home page, and none of the linked pages. Are you able to diagnose why this
is happening? Perhaps the website has additional security features, or it
requires a session token or cookies after all.
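One way to narrow this down is sketched below. Possible causes, none of them confirmed for this site, include robots exclusion rules, links that point to a different host name, and links that are generated by JavaScript. The log file name and the `run` helper are hypothetical, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Diagnostic sketch. File name and option choice are assumptions.
START_URL="https://openlab.cern/"   # example page named in this message
LOG_FILE="wget-debug.log"           # hypothetical log file name

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. One level of recursion, with debug output written to LOG_FILE
#    (--debug needs a wget build with debug support).
run wget --debug --recursive --level=1 --output-file="$LOG_FILE" "$START_URL"

# 2. The same download with robots exclusion disabled, to test that hypothesis.
run wget --recursive --level=1 --execute robots=off "$START_URL"
```

If the second command retrieves more pages than the first, robots exclusion is the likely cause. If neither does, the debug log shows which links wget found and why it skipped them.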

My second question concerns sites that do require cookies and a session token.
We have our own tool for obtaining them, but how do we handle the redirect to a
separate authentication page so that wget works correctly after
authentication? Which URL should be passed to wget?
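A minimal sketch of one possible setup follows, assuming the in-house tool can export its cookies to a Netscape-format cookies.txt file. The URL, the file name and the `run` helper are hypothetical. Here wget is given the URL of the page to download, not the authentication page: it follows HTTP redirects on its own, and with valid cookies the server may not redirect at all. Whether the login flow in question accepts cookies supplied this way is not confirmed.

```shell
#!/bin/sh
# Sketch of a cookie-based download. URL and file name are assumptions.
START_URL="https://example.org/protected/"   # hypothetical protected page
COOKIE_FILE="cookies.txt"                    # hypothetical export of the in-house tool

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# --load-cookies          send the cookies stored in COOKIE_FILE
# --save-cookies          write cookies back to COOKIE_FILE when wget exits
# --keep-session-cookies  include session cookies when saving
run wget --load-cookies="$COOKIE_FILE" \
         --save-cookies="$COOKIE_FILE" --keep-session-cookies \
         --recursive --level=inf --no-parent "$START_URL"
```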

I would appreciate a prompt reply.

Best regards, Pawel Glod
CERN, BE-CSS

