bug-wget

RE: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE"


From: michel . kempeneers
Subject: RE: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension
Date: Tue, 7 Jan 2020 01:28:17 +0100 (CET)

Hi Taylor, 

Thanks for your feedback.

- eBay:
You may have a point about eBay, I don't know.
But even if that is the case, it doesn't make sense to me.

- User agent:
I don't really know what this means, but yes, why not?
According to 

https://www.whatsmyua.info/

this would be my UA: 

Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/75.0.3770.80 Safari/537.36 

But when I add the following user-agent option

--user-agent="Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"

nothing much changes. 
The Wget file gains a few more lines (it now has around 2330, which is the same
line count as the heavy Chrome version; I forgot to point that out in my
previous message).
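
For reference, here is a minimal sketch of how the option above could be wrapped in a small script. It assumes a POSIX shell; the function name fetch_page, the example URL and the output file name are hypothetical placeholders, not taken from this thread. In a Windows batch file the same wget command would go on a single line.

```shell
#!/bin/sh
# User-agent string reported by whatsmyua.info in the message above.
UA='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36'

# fetch_page is a hypothetical helper name.
# $1 = URL to download, $2 = file to write the HTML to.
fetch_page() {
    wget --user-agent="$UA" --output-document="$2" "$1"
}

# Example call, commented out so that loading this file does not touch the network:
# fetch_page "https://example.com/" page.html
```

The quotes around "$UA" matter: the string contains spaces, so without them Wget would receive it as several separate arguments.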

- JavaScript:
I'm not a developer, alas. At best I can trick my way around a bit with batch
files.
So I'm bound hand and foot to whatever command-line freeware is out there.

So no real improvement, I'm afraid... 

M. 



From: "Taylor" <address@hidden>
To: "Michel Kempeneers" <address@hidden>
Cc: "bug-wget" <address@hidden>
Sent: Friday, 3 January 2020 14:16:05
Subject: Re: Downloading a web page's html code: Wget vs. Chrome's "Save Page
WE" extension



Wget to download the html code of this eBay page 
Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension 
Is there a reason why Wget only seems to find a minimal version of the code
(or maybe the correct question is: why is the html file that is
saved by that extension so much larger?)



Some ideas: 
- it is probably __NOT__ WGET that drops about 90% of the content
- eBay maybe explicitly supports Chrome ("Best viewed in any 
browser as long as it is the latest version of Chrome or IE") 
- user agent (try to set it to Chrome's value) 
- JavaScript (little to do, the extension can do
much JavaScript magic that WGET obviously can't)
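
A rough way to check the JavaScript idea is to count the script tags in the file Wget saved: Wget stores the HTML as the server sent it and does not run any of those scripts, whereas a browser extension saves the page after they have run. The sketch below assumes a POSIX shell with a grep that supports -o (GNU and BSD grep do); the function name count_script_tags and the file name page.html are hypothetical.

```shell
#!/bin/sh
# count_script_tags is a hypothetical helper name.
# $1 = path to a saved HTML file.
# Prints how many times an opening "<script" tag occurs (case-insensitive).
count_script_tags() {
    grep -o -i '<script' "$1" | wc -l | tr -d ' '
}

# Example call (page.html is a placeholder for the file Wget saved):
# count_script_tags page.html
```

A high count is only a hint that much of the page is built in the browser, not proof; it says nothing about what each script does.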

